why so slow
#7 opened 12 days ago by jasonZhang1
Reasoning content is output in 'content' when serving with vLLM and calling the OpenAI API, using --reasoning-parser glm45
1 comment · #6 opened 12 days ago by stephenmcconnachie
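This thread reports the chain of thought landing in `content` rather than in a separate reasoning field. Below is a minimal sketch of one way to check where the reasoning text ends up against a locally served vLLM OpenAI endpoint; the model ID, port, and prompt are illustrative assumptions, not taken from the thread.

```python
# Assumed server launch (not from the thread):
#   vllm serve GLM-4.7-Flash-FP8-Dynamic --reasoning-parser glm45
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="GLM-4.7-Flash-FP8-Dynamic",  # hypothetical model ID
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)

msg = resp.choices[0].message
# With a working reasoning parser, the chain of thought should appear in
# reasoning_content and only the final answer in content; the thread
# reports everything ending up in content instead.
print("reasoning_content:", getattr(msg, "reasoning_content", None))
print("content:", msg.content)
```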
Trying to serve with vLLM, got this error: ValueError: There is no module or parameter named 'model.layers.1.mlp.gate.e_score_correction_bias' in TransformersMoEForCausalLM
➕ 1 · 5 comments · #4 opened 18 days ago by firow2
vLLM nightly currently does not support Blackwell with this model
1 comment · #3 opened 20 days ago by 1anH
Severe Looping/Repetitive Output when using --kv-cache-dtype fp8 with GLM-4.7-Flash-FP8-Dynamic on vLLM
4 comments · #2 opened 20 days ago by ShelterW
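One rough way to probe for the looping this thread describes is to compare generations from a server started with and without `--kv-cache-dtype fp8` and flag heavily repeated output. The port, model ID, and repetition heuristic below are illustrative guesses, not from the thread.

```python
# Assumed launches to compare (not from the thread):
#   vllm serve GLM-4.7-Flash-FP8-Dynamic --kv-cache-dtype fp8  # reportedly loops
#   vllm serve GLM-4.7-Flash-FP8-Dynamic                       # baseline
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="GLM-4.7-Flash-FP8-Dynamic",  # placeholder ID from the thread title
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=512,
)

text = resp.choices[0].message.content or ""
# Crude repetition check: flag the output if any non-overlapping
# 30-character chunk occurs more than 5 times, which the reported
# severe looping would easily trigger.
chunks = [text[i:i + 30] for i in range(0, max(len(text) - 30, 0), 30)]
repeats = max((chunks.count(c) for c in set(chunks)), default=0)
print("suspected looping:", repeats > 5)
```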
dual 3090 inference
5 comments · #1 opened 20 days ago by evetsagg
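Running this model on two 3090s presumably means sharding it with tensor parallelism. Below is a minimal sketch using vLLM's offline API; the model ID, memory utilization, and context length are assumptions, not from the thread, and whether the model actually fits in 2×24 GB is for the thread to settle.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="GLM-4.7-Flash-FP8-Dynamic",  # placeholder model ID
    tensor_parallel_size=2,             # shard the weights across both 3090s
    gpu_memory_utilization=0.90,
    max_model_len=8192,                 # keep the KV cache within budget
)

params = SamplingParams(temperature=0.7, max_tokens=256)
print(llm.generate(["Hello!"], params)[0].outputs[0].text)
```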