why so slow
#7 opened 12 days ago by jasonZhang1
Reasoning content is output in 'content' when serving with vLLM and calling the OpenAI API, using --reasoning-parser glm45
1 comment · #6 opened 12 days ago by stephenmcconnachie
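This thread reports the chain of thought landing in `content` rather than in a separate reasoning field. Below is a minimal sketch of one way to check where the reasoning text ends up against a locally served vLLM OpenAI endpoint; the model ID, port, and prompt are illustrative assumptions, not taken from the thread.

```python
# Assumed server launch (not from the thread):
#   vllm serve GLM-4.7-Flash-FP8-Dynamic --reasoning-parser glm45
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="GLM-4.7-Flash-FP8-Dynamic",  # hypothetical model ID
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)

msg = resp.choices[0].message
# With a working reasoning parser, the chain of thought should appear in
# reasoning_content and only the final answer in content; the thread
# reports everything ending up in content instead.
print("reasoning_content:", getattr(msg, "reasoning_content", None))
print("content:", msg.content)
```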
Trying to serve with vLLM, got this error: ValueError: There is no module or parameter named 'model.layers.1.mlp.gate.e_score_correction_bias' in TransformersMoEForCausalLM
➕ 1 · 5 comments · #4 opened 18 days ago by firow2
vLLM nightly currently does not support Blackwell with this model
1 comment · #3 opened 20 days ago by 1anH
Severe Looping/Repetitive Output when using --kv-cache-dtype fp8 with GLM-4.7-Flash-FP8-Dynamic on vLLM
4 comments · #2 opened 20 days ago by ShelterW
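One rough way to probe for the looping this thread describes is to compare generations from a server started with and without `--kv-cache-dtype fp8` and flag heavily repeated output. The port, model ID, and repetition heuristic below are illustrative guesses, not from the thread.

```python
# Assumed launches to compare (not from the thread):
#   vllm serve GLM-4.7-Flash-FP8-Dynamic --kv-cache-dtype fp8  # reportedly loops
#   vllm serve GLM-4.7-Flash-FP8-Dynamic                       # baseline
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="GLM-4.7-Flash-FP8-Dynamic",  # placeholder ID from the thread title
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    max_tokens=512,
)

text = resp.choices[0].message.content or ""
# Crude repetition check: flag the output if any non-overlapping
# 30-character chunk occurs more than 5 times, which the reported
# severe looping would easily trigger.
chunks = [text[i:i + 30] for i in range(0, max(len(text) - 30, 0), 30)]
repeats = max((chunks.count(c) for c in set(chunks)), default=0)
print("suspected looping:", repeats > 5)
```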
dual 3090 inference
5 comments · #1 opened 20 days ago by evetsagg
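Running this model on two 3090s presumably means sharding it with tensor parallelism. Below is a minimal sketch using vLLM's offline API; the model ID, memory utilization, and context length are assumptions, not from the thread, and whether the model actually fits in 2×24 GB is for the thread to settle.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="GLM-4.7-Flash-FP8-Dynamic",  # placeholder model ID
    tensor_parallel_size=2,             # shard the weights across both 3090s
    gpu_memory_utilization=0.90,
    max_model_len=8192,                 # keep the KV cache within budget
)

params = SamplingParams(temperature=0.7, max_tokens=256)
print(llm.generate(["Hello!"], params)[0].outputs[0].text)
```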