Grok 2 is worse than Grok 1 but better than Grok 3. This had already been measured through the API; now we have measured the LLM itself, and the results are similar.
GLM keeps ranking higher with each new version. Nice trend!