
Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM-NVFP4 (UNCENSORED)

NVFP4 Quantized Version for NVIDIA Blackwell GPUs

Model Description

This is the NVFP4 (4-bit floating-point) quantized version of Ex0bit/Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM, created with NVIDIA TensorRT Model Optimizer (ModelOpt).

Model Size: ~18 GB (4-bit weights)

Requirements

The NVFP4 format is optimized for:

  • NVIDIA Blackwell GPUs (B100, B200, GB200)
  • TensorRT-LLM v0.17+

This format is NOT compatible with:

  • Consumer GPUs (RTX 3000/4000/5000 series)
  • llama.cpp
  • Standard PyTorch inference

Usage

With TensorRT-LLM

See NVIDIA's guide, Deploying NVIDIA Nemotron-3-Nano with TensorRT-LLM, for full deployment instructions.
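As a starting point, the checkpoint can be loaded through TensorRT-LLM's Python LLM API. This is a minimal sketch, assuming TensorRT-LLM v0.17+ and a Blackwell GPU; the prompt and sampling parameters are illustrative, and argument names may differ slightly between releases.

```python
# Sketch: serving this NVFP4 checkpoint with the TensorRT-LLM LLM API.
# Requires a Blackwell GPU (B100/B200/GB200) and TensorRT-LLM v0.17+.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="Ex0bit/Elbaz-NVIDIA-Nemotron-3-Nano-30B-A3B-PRISM-NVFP4")

params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(
    ["Explain NVFP4 quantization in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```

For production serving, the same checkpoint can typically be exposed through TensorRT-LLM's OpenAI-compatible server instead of the in-process API.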

Quantization Details

  • Method: NVIDIA ModelOpt NVFP4_DEFAULT_CFG
  • Calibration: 512 samples from WikiText-2
  • Format: HuggingFace checkpoint with quantized weights
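For reference, the 4-bit E2M1 element format underlying NVFP4 is small enough to enumerate directly. The sketch below decodes all 16 codes (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1); the per-16-element FP8 (E4M3) block scale that NVFP4 applies on top is omitted for brevity.

```python
# Enumerate the representable values of the E2M1 element format used by NVFP4.
import itertools

def e2m1_value(sign, exp, mant):
    """Decode one E2M1 code: 1 sign, 2 exponent, 1 mantissa bit, bias 1."""
    if exp == 0:                          # subnormal: no implicit leading 1
        mag = mant * 0.5
    else:                                 # normal: implicit 1, biased exponent
        mag = (1 + mant * 0.5) * 2 ** (exp - 1)
    return -mag if sign else mag

values = sorted({e2m1_value(s, e, m)
                 for s, e, m in itertools.product((0, 1), range(4), range(2))})
print(values)
# → [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
#     0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

Every quantized weight element is one of these 15 values, rescaled by its block's FP8 scale factor; calibration chooses the scales that best preserve the original weight distribution.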

Related Models

Author

Eric Elbaz (Ex0bit)

License

NVIDIA Open Model License

