A good model for benchmarks, but useless in daily use because it thinks too much.
It always thinks for minutes.
Yes, it thinks for a very long time...
Thanks for the comment.
To clarify, Nanbeige4-3B is primarily a research model aimed at exploring the capability boundaries of small-scale language models. Its strong performance isn't limited to benchmarks; we've also received positive feedback from real-world use cases, such as https://www.reddit.com/r/LocalLLaMA/comments/1pj3q4q/nanbeige43b_lightweight_with_strong_reasoning/
That said, we fully acknowledge the issue you raised: the current open-source 2511 release does not include explicit length control or length-based penalties, which can indeed lead the model to "overthink" at times. We recognize this as a clear area for improvement and are actively exploring techniques to encourage more efficient, context-appropriate reasoning in future versions.
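In the meantime, one possible inference-side workaround (not an official fix) is to cap the reasoning budget yourself and force-close the thinking block when it runs over. Below is a minimal sketch assuming the model is loaded through Hugging Face transformers and emits `<think> ... </think>` style reasoning; the repo id, budget value, and helper name are illustrative, not official recommendations.

```python
# Minimal sketch: cap the model's "thinking" budget at inference time.
# Assumes a <think> ... </think> reasoning format; the repo id and budget
# below are placeholders -- adjust them to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Nanbeige/Nanbeige4-3B"  # assumption: replace with the actual HF repo id
THINK_BUDGET = 1024                  # max tokens allowed for the reasoning segment

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_with_think_budget(prompt: str, max_answer_tokens: int = 512) -> str:
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # First pass: let the model reason, but only up to THINK_BUDGET new tokens.
    out = model.generate(**inputs, max_new_tokens=THINK_BUDGET, do_sample=False)
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False
    )

    # If the reasoning block closed on its own within the budget, return as-is.
    if "</think>" in completion:
        return completion

    # Budget exhausted: force-close the reasoning block and ask for the answer.
    forced = text + completion + "\n</think>\n"
    inputs = tokenizer(forced, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_answer_tokens, do_sample=False)
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

print(generate_with_think_budget("How many prime numbers are there below 50?"))
```

This only bounds latency per request; it does not make the reasoning itself more efficient, which is what the length-aware training we mentioned above is meant to address.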