---
tags:
  - internvl3.5
  - 8b
  - flash
  - nf4
  - 4bit
  - selective-quantization
license: other
---

# InternVL3_5-8B-Flash with Selective NF4 4-bit (LLM MLP only)

This repository contains a copy of the original Flash model in which the Qwen3 MLP FFN projections (`gate_proj`, `up_proj`, `down_proj`) have been replaced with bitsandbytes `Linear4bit` (NF4) layers. The attention projections, vision model, gating, and `lm_head` remain in FP16, and the Flash path stays enabled.

Load with:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor

# trust_remote_code is required: the modeling code ships with the repo
model = AutoModelForCausalLM.from_pretrained(
    '<repo>', trust_remote_code=True, device_map={'': 'cuda'}
)
tok = AutoTokenizer.from_pretrained('<repo>', trust_remote_code=True, use_fast=False)
proc = AutoProcessor.from_pretrained('<repo>', trust_remote_code=True)
```
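The selective scheme above quantizes only by module name: LLM FFN projections get NF4, everything else stays FP16. A minimal sketch of that selection predicate is below; the `language_model.` prefix and leaf names are assumptions based on the usual InternVL/Qwen naming scheme, so check them against `model.named_modules()` on your checkpoint before reusing this.

```python
# Hypothetical sketch of the layer-selection rule used for NF4 quantization.
# Assumption: the LLM lives under a "language_model." prefix, as is common
# for InternVL-style checkpoints.
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def should_quantize(module_name: str) -> bool:
    """True if this Linear would be swapped for bitsandbytes Linear4bit (NF4)."""
    if not module_name.startswith("language_model."):
        return False  # vision model and projector stay FP16
    leaf = module_name.rsplit(".", 1)[-1]
    # attention projections, gating, and lm_head are not in this set
    return leaf in MLP_PROJECTIONS
```

In a conversion script you would walk `model.named_modules()`, and for each `nn.Linear` where `should_quantize(name)` is true, build a `bitsandbytes.nn.Linear4bit(..., quant_type="nf4")` from its weights and swap it in on the parent module.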