Dingjie Song
Welcome! I am a research assistant affiliated with the CUHK-Shenzhen NLP group, under the guidance of Dr. Benyou Wang. I obtained my M.E. from the Software Institute and the Natural Language Processing Group at Nanjing University, advised by Dr. Xinyu Dai and Dr. Jidong Ge. Prior to this, I completed my B.E. at the Software Institute of Nanjing University.
Email: dingjiesong.cs@gmail.com
Links: Research Overview / Updates / Awards / Papers
Research Overview
My research interests are in Natural Language Processing, especially intelligent interactive systems and domain-specific LLMs, with a focus on the following directions:
- Multimodal LLM: [MileBench], [LongLLaVA], [TRIM], [MM-Detect]
- Medical LLM: [CMB], [HuatuoGPT-II]
- Multilingual LLM: [AceGPT]
- Task-oriented dialogue systems: [EPL, NLPCC 2023 Oral], [STAM, JCST 2023]
Updates
Nov 2024: MM-Detect released! MM-Detect is the first Data Contamination Detection Framework for MLLMs! More information can be found in the paper and on GitHub.
Sep 2024: TRIM released! TRIM is a simple yet effective Image Token Reduction Method for efficient MLLMs! More information can be found in the paper, on HuggingFace, and on GitHub.
Sep 2024: LongLLaVA released! LongLLaVA is the first MLLM with hybrid architecture that can handle up to 1000 images! More information can be found in the paper, on HuggingFace, and on GitHub. Ranked #2 Paper of the Day on HuggingFace Daily Papers.
July 2024: Two papers, MileBench and HuatuoGPT2, were accepted to the COLM'24 main conference!
April 2024: MileBench released! MileBench is a pioneering benchmark designed to rigorously test the MultImodal Long-contExt capabilities of MLLMs. More information can be found on the website, in the paper, on HuggingFace, and on GitHub.
March 2024: Two papers, CMB and AceGPT, were accepted to the NAACL'24 main conference!
Before 2024
**Nov 2023**: HuatuoGPT2 released! Try it out on the [demo](https://www.huatuogpt.cn/#/)! HuatuoGPT2 employs an innovative domain adaptation method to significantly boost its medical knowledge and dialogue proficiency, and achieves SOTA performance on several medical benchmarks, especially **surpassing GPT-4 in expert evaluations and the fresh medical licensing exams**. More information can be found in the [paper](https://arxiv.org/abs/2311.09774) and on [HuggingFace](https://huggingface.co/FreedomIntelligence/HuatuoGPT2-34B).
**Sep 2023**: We released AceGPT, which achieved **top performance** among open-source Arabic language models in benchmark tests. More information can be found in the [paper](https://arxiv.org/abs/2309.12053) and on [HuggingFace](https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat).
**Aug 2023**: [Check out our new paper](https://arxiv.org/abs/2308.08833), which benchmarks prevalent Medical LLMs on their medical knowledge and clinical diagnostic capabilities. More information can be found on the [website](https://cmedbenchmark.llmzoo.com/#home) and on [HuggingFace](https://huggingface.co/datasets/FreedomIntelligence/CMB).
**Jul 2023**: Started my journey at CUHK-Shenzhen as a research assistant under the guidance of [Benyou Wang](https://scholar.google.com/citations?user=Jk4vJU8AAAAJ).
**Jun 2023**: I defended my master's thesis and received my master's degree in software engineering. Thanks to all those who have supported me.
**Aug 2022 - Apr 2023**: Finished my internship with [Jiaxing Zhang](https://scholar.google.com/citations?user=ozXuhOUAAAAJ) on LLM SFT.
Papers
2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Dingjie Song*, Sicheng Lai*, Shunian Chen, Lichao Sun, Benyou Wang arxiv, Under Review, project page
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs Dingjie Song, Wenjun Wang, Shunian Chen, Xidong Wang, Michael Guan, Benyou Wang arxiv, Under Review, project page
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Dingjie Song*, Xidong Wang*, Shunian Chen, Chen Zhang, Benyou Wang arxiv, Under Review, project page, code and data
MileBench: Benchmarking MLLMs in Long Context Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang COLM 2024, project page, code and data
2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs Junying Chen, Xidong Wang, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang COLM 2024, project page, code and data
AceGPT, Localizing Large Language Models in Arabic Huang Huang*, Fei Yu*, Jianqing Zhu*, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu NAACL 2024, code and data
CMB: A Comprehensive Medical Benchmark in Chinese Dingjie Song*, Xidong Wang*, Guiming Hardy Chen*, Zhiyi Zhang* (*equal contribution), Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li NAACL 2024, project page, code and data
Episode-based Prompt Learning for Any-shot Intent Detection Dingjie Song*, Pengfei Sun* (*equal contribution), Yawen Ouyang, Zhen Wu, Xinyu Dai NLPCC 2023 Oral
2022
Self-Supervised Task Augmentation for Few-Shot Intent Detection Pengfei Sun, Yawen Ouyang, Dingjie Song, Xinyu Dai JCST 2022, code and data
Awards
- Outstanding Graduate Student, Nanjing University, 2022
- Yingcai Scholarship, Nanjing University, 2022
- Renmin Scholarship (People's Scholarship), Nanjing University, 2018-2021
- Third Runner-up in the 15th Citi Cup Financial Innovation Application Competition, Citigroup, 2019
- Second Runner-up in the 2019 "Chain to Future" University Blockchain Technology Application Competition, CCF, 2019
- Outstanding Student Leader of the Communist Youth League, Nanjing University, 2018-2019
Services
- Conference reviewer: EMNLP, ACL Rolling Review