
Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM-NVFP4 (UNCENSORED)

NVFP4 Quantized Version for NVIDIA Blackwell GPUs

Model Description

This is the NVFP4 (4-bit floating-point) quantized version of Ex0bit/Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM, created with NVIDIA TensorRT Model Optimizer (ModelOpt).

Model Size: ~18 GB (4-bit weights)

Requirements

The NVFP4 format is optimized for:

  • NVIDIA Blackwell GPUs (B100, B200, GB200)
  • TensorRT-LLM v0.17+

This format is NOT compatible with:

  • Consumer GPUs (RTX 3000/4000/5000 series)
  • llama.cpp
  • Standard PyTorch inference

Usage

With TensorRT-LLM

See NVIDIA's guide, Deploying NVIDIA Nemotron-3-Nano with TensorRT-LLM, for full deployment instructions.
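As a starting point, the checkpoint can be loaded through TensorRT-LLM's Python LLM API. This is a minimal sketch, assuming TensorRT-LLM v0.17+ and a Blackwell GPU; the prompt and sampling parameters are illustrative, and argument names may differ slightly between releases.

```python
# Sketch: serving this NVFP4 checkpoint with the TensorRT-LLM LLM API.
# Requires a Blackwell GPU (B100/B200/GB200) and TensorRT-LLM v0.17+.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="Ex0bit/Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM-NVFP4")

params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(
    ["Explain NVFP4 quantization in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```

For production serving, the same checkpoint can typically be exposed through TensorRT-LLM's OpenAI-compatible server instead of the in-process API.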

Quantization Details

  • Method: NVIDIA ModelOpt NVFP4_DEFAULT_CFG
  • Calibration: 512 samples from WikiText-2
  • Format: HuggingFace checkpoint with quantized weights
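For reference, the 4-bit E2M1 element format underlying NVFP4 is small enough to enumerate directly. The sketch below decodes all 16 codes (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1); the per-16-element FP8 (E4M3) block scale that NVFP4 applies on top is omitted for brevity.

```python
# Enumerate the representable values of the E2M1 element format used by NVFP4.
import itertools

def e2m1_value(sign, exp, mant):
    """Decode one E2M1 code: 1 sign, 2 exponent, 1 mantissa bit, bias 1."""
    if exp == 0:                          # subnormal: no implicit leading 1
        mag = mant * 0.5
    else:                                 # normal: implicit 1, biased exponent
        mag = (1 + mant * 0.5) * 2 ** (exp - 1)
    return -mag if sign else mag

values = sorted({e2m1_value(s, e, m)
                 for s, e, m in itertools.product((0, 1), range(4), range(2))})
print(values)
# → [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
#     0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Every quantized weight element is one of these 15 values, rescaled by its block's FP8 scale factor; calibration chooses the scales that best preserve the original weight distribution.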

Related Models

Author

Eric Elbaz (Ex0bit)

License

NVIDIA Open Model License

