---
base_model:
- meta-llama/Meta-Llama-3-8B
datasets:
- IgnoraZ/SynthQuestions
language:
- en
license: cc-by-4.0
library_name: transformers
pipeline_tag: text-generation
---

# Model Card for SynthQuestions

This is the model released with the paper **From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding**.

## Model Details

### Model Description

- **Model type:** Chat model
- **Language(s) (NLP):** English
- **License:** CC-BY-4.0
- **Finetuned from model:** LLaMA-3-8B
- **Finetuned with data:** the 1M-instruction dataset from `IgnoraZ/SynthQuestions`

For further details such as training hyper-parameters, please refer to our paper.

### Model Sources

- **Repository:** https://github.com/Ignoramus0817/SynthQuestions
- **Paper:** https://arxiv.org/abs/2506.03968

## How to Get Started with the Model

The model is released in Hugging Face format and can be served with common inference frameworks such as Transformers, vLLM, and SGLang. It was finetuned with a custom chat template rather than LLaMA-3's default one. **Please make sure to use the chat template provided in `tokenizer_config.json` during inference.** A minimal usage sketch is provided at the end of this card.

## Evaluation

### Alignment Benchmarks

| Model          | Arena Hard (WR%) | AlpacaEval 2.0 (LC%) |
| :------------: | :--------------: | :------------------: |
| SynthQuestions | 15.4             | 18.87                |

### Closed-form Benchmarks

| Model          | IFEval | MMLU  | ARC-C | GPQA | GSM8K | MATH  |
| :------------: | :----: | :---: | :---: | :--: | :---: | :---: |
| SynthQuestions | 57.05  | 65.79 | 63.92 | 30.3 | 70.53 | 22.71 |

## Citation

```
@misc{zhu2025realsyntheticsynthesizingmillions,
      title={From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding},
      author={Chiwei Zhu and Benfeng Xu and Xiaorui Wang and Zhendong Mao},
      year={2025},
      eprint={2506.03968},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.03968},
}
```

## Model Card Contact

Please contact tanz@mail.ustc.edu.cn.
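
## Usage Example

A minimal inference sketch with the Transformers library, assuming the repository ID `IgnoraZ/SynthQuestions-Llama-3-8B` as a placeholder (substitute the actual model ID of this card). The `apply_chat_template` call picks up the custom template stored in `tokenizer_config.json`, as required above.

```python
# Minimal inference sketch (placeholder model ID, illustrative generation settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IgnoraZ/SynthQuestions-Llama-3-8B"  # placeholder: replace with the actual repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain attributed grounding in one paragraph."}
]

# Uses the custom chat template shipped in tokenizer_config.json,
# not the default LLaMA-3 template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding and `bfloat16` above are illustrative defaults, not settings prescribed by the paper.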