---
tags:
- internvl3.5
- 8b
- flash
- nf4
- 4bit
- selective-quantization
license: other
---
# InternVL3_5-8B-Flash with Selective NF4 4-bit (LLM MLP only)
This repository contains a copy of the original InternVL3_5-8B-Flash model in which the Qwen3 LLM MLP (FFN) projections (`gate_proj`, `up_proj`, `down_proj`) are replaced by bitsandbytes `Linear4bit` layers using NF4 quantization. Attention projections, the vision model, gating, and `lm_head` remain in FP16, and the Flash path remains enabled.
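As an illustration of the selection rule described above, here is a minimal sketch of a predicate that picks out only the LLM MLP projections for quantization. The module names are assumptions based on common Qwen3/InternVL naming conventions, not read from this checkpoint:

```python
# Selective-quantization predicate (sketch): quantize only the LLM MLP
# FFN linears; keep attention, the vision tower, and lm_head in FP16.
MLP_PROJ = ("gate_proj", "up_proj", "down_proj")

def should_quantize(name: str) -> bool:
    """Return True only for LLM MLP FFN linears (by module name)."""
    # Skip the vision tower and the output head entirely.
    if name.startswith("vision_model") or name.endswith("lm_head"):
        return False
    # Match on the leaf module name, e.g. "...mlp.gate_proj" -> "gate_proj".
    return name.rsplit(".", 1)[-1] in MLP_PROJ

# Example names as they might appear in model.named_modules():
print(should_quantize("language_model.model.layers.0.mlp.gate_proj"))    # True
print(should_quantize("language_model.model.layers.0.self_attn.q_proj")) # False
print(should_quantize("vision_model.encoder.layers.0.mlp.fc1"))          # False
```

In an actual conversion pass, each module for which this predicate returns True would be swapped for a bitsandbytes `Linear4bit` with `quant_type="nf4"`; everything else is left untouched.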
Load with:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor

# trust_remote_code is required for the custom InternVL modeling code;
# device_map={'': 'cuda'} places the whole model on a single GPU.
model = AutoModelForCausalLM.from_pretrained('<repo>', trust_remote_code=True, device_map={'': 'cuda'})
tok = AutoTokenizer.from_pretrained('<repo>', trust_remote_code=True, use_fast=False)
proc = AutoProcessor.from_pretrained('<repo>', trust_remote_code=True)
```