---
tags:
  - internvl3.5
  - 8b
  - flash
  - nf4
  - 4bit
  - selective-quantization
license: other
---

# InternVL3_5-8B-Flash with Selective NF4 4-bit (LLM MLP only)

This repository contains a copy of the original Flash model in which the Qwen3 MLP FFN projections (`gate_proj`, `up_proj`, `down_proj`) have been replaced with bitsandbytes `Linear4bit` (NF4) layers. The attention projections, vision model, gating, and `lm_head` remain in FP16, and the Flash path stays enabled.

Load with:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor

# trust_remote_code is required: the modeling code ships with the repo
model = AutoModelForCausalLM.from_pretrained(
    '<repo>', trust_remote_code=True, device_map={'': 'cuda'}
)
tok = AutoTokenizer.from_pretrained('<repo>', trust_remote_code=True, use_fast=False)
proc = AutoProcessor.from_pretrained('<repo>', trust_remote_code=True)
```
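The selective scheme above quantizes only by module name: LLM FFN projections get NF4, everything else stays FP16. A minimal sketch of that selection predicate is below; the `language_model.` prefix and leaf names are assumptions based on the usual InternVL/Qwen naming scheme, so check them against `model.named_modules()` on your checkpoint before reusing this.

```python
# Hypothetical sketch of the layer-selection rule used for NF4 quantization.
# Assumption: the LLM lives under a "language_model." prefix, as is common
# for InternVL-style checkpoints.
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")


def should_quantize(module_name: str) -> bool:
    """True if this Linear would be swapped for bitsandbytes Linear4bit (NF4)."""
    if not module_name.startswith("language_model."):
        return False  # vision model and projector stay FP16
    leaf = module_name.rsplit(".", 1)[-1]
    # attention projections, gating, and lm_head are not in this set
    return leaf in MLP_PROJECTIONS
```

In a conversion script you would walk `model.named_modules()`, and for each `nn.Linear` where `should_quantize(name)` is true, build a `bitsandbytes.nn.Linear4bit(..., quant_type="nf4")` from its weights and swap it in on the parent module.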