Tool calling with reasoning parsing broken
While trying to use this model with vLLM 0.12.0 at BF16, tool calling is always broken: the reasoning is output as part of the message content, alongside a stray `</think>` tag. Maybe the model outputs it twice, maybe it's just a parser issue; I'm not sure and need to test it more carefully. Posting here in the hope that it's a minor issue or a typo in the parser plugin that has already been fixed but not uploaded.
Hello! I'm taking a look now. Would you be able to share your vLLM spin-up command and the ChatCompletions request where you saw this issue, so that I can test on a setup similar to yours? Thanks!
Hello! We took a look, and the reasoning content appearing twice is expected -- vLLM does output the reasoning content under both keys (`reasoning` and `reasoning_content`).
With an adequate token budget specified in the request (so that the model doesn't get cut off), I do see tool parsing working as expected with the spin-up command from the model card.
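For reference, here's a minimal sketch of the kind of request I tested with. The localhost URL, prompt wording, `max_tokens` value, and the `python_interpreter` tool schema are my assumptions, reconstructed from the response below, so adjust them to your setup:

```python
# Minimal sketch of a ChatCompletions request against the vLLM server.
# Assumed: server at localhost:8000; the python_interpreter schema is
# reconstructed from the tool call in the response below.
import requests

payload = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    "messages": [{"role": "user", "content": "Calculate 10 + 10 using tools."}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "python_interpreter",
            "description": "Execute a Python script and return the result.",
            "parameters": {
                "type": "object",
                "properties": {"script": {"type": "string"}},
                "required": ["script"],
            },
        },
    }],
    # An adequate completion budget so reasoning + tool call aren't cut off.
    "max_tokens": 2048,
}

msg = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload
).json()["choices"][0]["message"]

# vLLM returns the reasoning under both keys, so the apparent duplication
# in the raw JSON is expected, not a bug.
assert msg["reasoning"] == msg["reasoning_content"]
print(msg["tool_calls"])
```

This produces a response like the one below: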
{"id":"chatcmpl-b3f8d10c5f6a1def","object":"chat.completion","created":1765814802,"model":"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[{"id":"chatcmpl-tool-8487fce982d34ecc","type":"function","function":{"name":"python_interpreter","arguments":"{\"script\": \"10 + 10\"}"}}],"reasoning":"Okay, the user wants me to calculate 10 + 10 using tools. Let me check the available tools. There's a Python interpreter tool that can execute Python code. So I can use that to compute the sum. The function requires a script parameter. I should write the Python code as a string that adds 10 and 10. Let me make sure the syntax is correct. Then, I'll format the tool call correctly within the XML tags. The expected output should be 20. I need to structure the JSON properly, ensuring the \"script\" key is included with the correct Python code string.\n","reasoning_content":"Okay, the user wants me to calculate 10 + 10 using tools. Let me check the available tools. There's a Python interpreter tool that can execute Python code. So I can use that to compute the sum. The function requires a script parameter. I should write the Python code as a string that adds 10 and 10. Let me make sure the syntax is correct. Then, I'll format the tool call correctly within the XML tags. The expected output should be 20. I need to structure the JSON properly, ensuring the \"script\" key is included with the correct Python code string.\n"},"logprobs":null,"finish_reason":"tool_calls","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":289,"total_tokens":449,"completion_tokens":160,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
It's possible the completion token budget in your request is inadequate for the model, so it isn't able to output the full completion. Please feel free to share your ChatCompletions request and I can take a look. Thanks again!
As I can see at https://docs.vllm.ai/en/latest/features/reasoning_outputs/, the DeepSeek reasoning parser does not work with tool calls. That perfectly explains why my attempts to use this model with Qwen Code or Opencode fail: it works at first, but eventually the bits of reasoning left in the chat history make the responses incoherent. I'm ashamed to admit I don't know how to export a conversation example from these tools in full request-response format. Here is my YAML for llama-swap:
```yaml
nemotron-nano-30b:
  env:
    - VLLM_SLEEP_WHEN_IDLE=1
    - CUDA_DEVICE_ORDER=PCI_BUS_ID
    - CUDA_VISIBLE_DEVICES=0,1,2,3
    - VLLM_ATTENTION_BACKEND=FLASHINFER
    - OMP_NUM_THREADS=12
  cmd: |
    python3
      -m vllm.entrypoints.openai.api_server
      --model /mnt/data/llm-data/models/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
      --dtype auto
      --served-model-name "nemotron-nano-30b"
      --port ${PORT}
      --tensor-parallel-size 4
      --enable-expert-parallel
      --max-model-len 262144
      --gpu-memory-utilization 0.92
      --max-num-seqs 2
      --trust-remote-code
      --enable-auto-tool-choice
      --tool-call-parser qwen3_coder
      --reasoning-parser-plugin /mnt/data/llm-data/models/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nano_v3_reasoning_parser.py
      --reasoning-parser nano_v3
  ttl: 0
  checkEndpoint: /v1/models
```
Here is an example chat export from Opencode. You can see those `</think>` tags in the messages.
# Reviewing files Q&A for plan-mode read-only guidance
**Session ID:** ses_4dd28acceffeskKoi1rpKHTKbx
**Created:** 12/15/2025, 7:28:15 PM
**Updated:** 12/15/2025, 7:28:29 PM
---
## User
Hello! Please, review my files
---
## Assistant
Tool: bash
---
## Assistant
need to respond with appropriate tool usage. The user says "Hello! Please, review my files". They want review of their files. But we have no files listed. The environment shows no files. Possibly they have some files in the workspace? The system says Working directory: /. Is directory a git repo: no. No files listed. The user wants to review their files, but we have no context. We can ask clarifying question: what files? Or we can try to list files in home directory? The tool "list" expects an absolute path. Let's try listing /home maybe? But we don't know if that exists. Could be /workspace? Usually the working directory is / but maybe files are elsewhere. Let's try listing /workspace? Or maybe the user expects to have some files in the current directory? Since it's read-only, we can only list. Let's try listing / (root). Already attempted and got "zsh:1: command not found: list". Actually the tool name is "list"? In the developer instructions, they mention "list" tool to list files. But the error says "zsh:1: command not found: list". That suggests that the tool is not recognized as a bash command. Perhaps we need to use the "bash" tool with "list" command? Actually the "bash" tool executes arbitrary bash commands. So we can run "bash" with command "ls -la /". Or we can use the "list" tool directly? The tool definitions show "list" as a separate tool: it takes path and ignore patterns. So we should call the "list" tool, not bash "list". The error came because I typed "list /" directly in bash, which doesn't exist. Instead, I should use the "list" tool with parameters. So call functions/list with path "/" and maybe no ignore. Let's do that.
</think>
Tool: list
---
## Assistant
need to respond to the user who said "Hello! Please, review my files". We have listed the root directory contents. The user likely wants a review of their files, but we don't know which files they want reviewed. They haven't provided any specific files. Perhaps they want us to review some files that they have in the environment. Since we have no information about what files they want to review, we can ask clarifying question: "Which files would you like me to review?" Or we can suggest we can review the listed files. But the instruction says we must be concise, direct, no extra explanation unless needed. However we need to ask a clarifying question because we don't know which files. The user just says "review my files". Could be they have some hidden files somewhere. Perhaps they want us to review all files? That would be too broad. Better ask: "Sure, could you specify which files you'd like reviewed?" That is a short question.
But maybe we can proactively ask to see if there are any files in user's home directory? The environment's working directory is /. That suggests maybe the user's files are under /home? But we didn't see /home listed. Could be that there are hidden files. Let's try listing /home? Use list tool again with path /home.
</think>
Tool: list
---
## Assistant
have listed /home and see a user "gleb" with many files. The user might be "gleb"? The user said "Hello! Please, review my files". Possibly they want a review of their files under /home/gleb. Perhaps they want feedback on some code or configuration. Without specific instructions, we can ask which files they'd like reviewed, or offer to review all files under /home/gleb. Could also ask what type of review: code review, security, etc.
Given the instructions: we should be concise, direct, no extra explanation unless needed. But we need to ask a clarifying question to understand which files. So respond with a short question.
Thus: "Which files would you like me to review?"
</think>
Which files would you like me to review?
---
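Meanwhile, the only workaround I have is stripping the leaked reasoning on the client side before messages go back into the history. A rough sketch, assuming the leak always ends with the `</think>` tag as in the export above:

```python
# Rough workaround sketch: drop everything up to and including the last
# </think> tag from assistant messages before re-sending the history.
REASONING_END = "</think>"

def strip_leaked_reasoning(content: str) -> str:
    if REASONING_END in content:
        # Keep only the text after the last closing tag.
        return content.rsplit(REASONING_END, 1)[1].lstrip()
    return content

# Example with a message shaped like the export above:
leaked = 'Thus: "Which files..."\n</think>\nWhich files would you like me to review?'
print(strip_leaked_reasoning(leaked))
# -> Which files would you like me to review?
```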
Lol, a possibly related bugfix was merged half an hour ago: https://github.com/vllm-project/vllm/pull/30671
Hi @nephepritou,
Thank you for sharing the bugfix. If the error persists, could you try the --enable-auto-tool-choice option together with a larger max tokens value?
We updated the Model Card accordingly:
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/discussions/6/files
Updating vLLM to the latest nightly build helps; there are no more tool-calling issues. Speed is really good, about +15% over Qwen Coder 30B right from the start, and bottlenecked by the CPU. But accuracy with Qwen Code CLI is kind of terrible: it hallucinates a lot about a str_replace_editor tool that isn't present in Qwen Code CLI or Opencode. It looks like it was trained a lot on Claude Code.
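As a stopgap, I added a guard on my side that rejects tool calls whose name isn't in the declared tool list and feeds an error back as the tool result. A rough sketch; the tool names and error wording here are mine, not anything from Qwen Code or Opencode:

```python
# Sketch of a client-side guard against hallucinated tool calls such as
# str_replace_editor. The declared tool set below is an example -- use
# whatever tools your CLI actually exposes.
DECLARED_TOOLS = {"bash", "list", "read", "edit"}

def reject_unknown_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Return error 'tool' messages for calls to undeclared tools."""
    errors = []
    for call in tool_calls:
        name = call["function"]["name"]
        if name not in DECLARED_TOOLS:
            errors.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": (
                    f"Error: unknown tool '{name}'. "
                    f"Available tools: {sorted(DECLARED_TOOLS)}"
                ),
            })
    return errors
```

The idea is that the error result nudges the model into retrying with a real tool instead of looping on str_replace_editor.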
Hello, it shows:

```
Value error, invalid reasoning parser: nano_v3 (chose from { deepseek_r1,deepseek_v3,ernie45,glm45,granite,hunyuan_a13b,kimi_k2,minimax_m2,minimax_m2_append_think,mistral,olmo3,openai_gptoss,qwen3,seed_oss,step3 }) [type=value_error, input_value=StructuredOutputsConfig(b...able_in_reasoning=False), input_type=StructuredOutputsConfig]
```