Errors loading state_dict via transformers: size mismatch for down_proj
#2 opened by rustyjelly
I get the following error when trying to load the model via transformers:
```
RuntimeError: Error(s) in loading state_dict for Llama4TextExperts:
size mismatch for down_proj: copying a param with shape torch.Size([16, 4096, 5120]) from checkpoint, the shape in current model is torch.Size([16, 8192, 5120]).
```
To reproduce (run in the same directory as the cloned model repo):
```python
from transformers import Llama4ForConditionalGeneration

# Load from the current directory, letting accelerate place layers across devices
model = Llama4ForConditionalGeneration.from_pretrained(".", device_map="auto")
```
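To confirm where the half-sized tensor lives, the on-disk shape can be inspected without materializing the model. A quick sketch; the shard file name below is a placeholder for whatever file actually holds the expert weights:

```python
from safetensors import safe_open

# Print the stored shape of every down_proj tensor in one shard.
# "model-00001-of-000XX.safetensors" is a placeholder file name.
with safe_open("model-00001-of-000XX.safetensors", framework="pt") as f:
    for key in f.keys():
        if "down_proj" in key:
            print(key, f.get_slice(key).get_shape())
```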
It seems the checkpoint is sharded, and transformers only loads one shard's portion on a single GPU. That would explain why the stored tensor is half the expected size along dim 1 (4096 vs. 8192). I'm still exploring how to fix this and will update if I find a solution.
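If it really is two tensor-parallel shards, merging them back into a single checkpoint might work. A minimal sketch, assuming exactly two shards, that only the down_proj expert weights are split, and that the split is along dim 1; the file names are placeholders, not confirmed from the repo:

```python
import torch
from safetensors.torch import load_file, save_file

# Placeholder file names for the two assumed tensor-parallel shards
shard0 = load_file("consolidated.00.safetensors")
shard1 = load_file("consolidated.01.safetensors")

merged = {}
for name, t0 in shard0.items():
    t1 = shard1[name]
    if "down_proj" in name:
        # Each shard holds half of the intermediate dim:
        # [16, 4096, 5120] + [16, 4096, 5120] -> [16, 8192, 5120]
        merged[name] = torch.cat([t0, t1], dim=1)
    elif torch.equal(t0, t1):
        merged[name] = t0  # replicated parameter: keep a single copy
    else:
        raise ValueError(f"Don't know how to merge {name}: {t0.shape}")

save_file(merged, "model-merged.safetensors")
```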