Errors loading state_dict via transformers: size mismatch for down_proj
#2 opened by rustyjelly
I get the following error when trying to load the model via transformers:
```
RuntimeError: Error(s) in loading state_dict for Llama4TextExperts:
size mismatch for down_proj: copying a param with shape torch.Size([16, 4096, 5120]) from checkpoint, the shape in current model is torch.Size([16, 8192, 5120]).
```
To reproduce (run in the same directory as the cloned model repo):
```python
from transformers import Llama4ForConditionalGeneration

# Load from the current directory, letting accelerate place layers across devices
model = Llama4ForConditionalGeneration.from_pretrained(".", device_map="auto")
```
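To confirm where the half-sized tensor lives, the on-disk shape can be inspected without materializing the model. A quick sketch; the shard file name below is a placeholder for whatever file actually holds the expert weights:

```python
from safetensors import safe_open

# Print the stored shape of every down_proj tensor in one shard.
# "model-00001-of-000XX.safetensors" is a placeholder file name.
with safe_open("model-00001-of-000XX.safetensors", framework="pt") as f:
    for key in f.keys():
        if "down_proj" in key:
            print(key, f.get_slice(key).get_shape())
```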
It seems the checkpoint is sharded, and transformers only loads one shard's portion on a single GPU. That would explain why the stored tensor is half the expected size along dim 1 (4096 vs. 8192). I'm still exploring how to fix this and will update if I find a solution.
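If it really is two tensor-parallel shards, merging them back into a single checkpoint might work. A minimal sketch, assuming exactly two shards, that only the down_proj expert weights are split, and that the split is along dim 1; the file names are placeholders, not confirmed from the repo:

```python
import torch
from safetensors.torch import load_file, save_file

# Placeholder file names for the two assumed tensor-parallel shards
shard0 = load_file("consolidated.00.safetensors")
shard1 = load_file("consolidated.01.safetensors")

merged = {}
for name, t0 in shard0.items():
    t1 = shard1[name]
    if "down_proj" in name:
        # Each shard holds half of the intermediate dim:
        # [16, 4096, 5120] + [16, 4096, 5120] -> [16, 8192, 5120]
        merged[name] = torch.cat([t0, t1], dim=1)
    elif torch.equal(t0, t1):
        merged[name] = t0  # replicated parameter: keep a single copy
    else:
        raise ValueError(f"Don't know how to merge {name}: {t0.shape}")

save_file(merged, "model-merged.safetensors")
```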