How are LLMs ranked today: performance

The current way of evaluating LLM performance is to write a set of prompts, submit them to every LLM, and judge which answer is best. The scoring can be automated when the correct answer is known in advance (as in mathematics tests), or done manually by human evaluators (as in the Chatbot Arena).
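The automated case can be sketched as follows. This is a minimal illustration, not any real benchmark's harness: the prompts, the `model_a` / `model_b` stand-in functions, and the exact-match scoring rule are all assumptions made for the example. Real benchmarks call actual model APIs and usually use more forgiving answer matching.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical benchmark: (prompt, expected answer) pairs with known answers.
BENCHMARK: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("What is 3 * 5?", "15"),
    ("What is 10 - 7?", "3"),
]


def accuracy(model: Callable[[str], str], benchmark: List[Tuple[str, str]]) -> float:
    """Fraction of prompts where the model's answer exactly matches the expected one."""
    correct = sum(1 for prompt, expected in benchmark if model(prompt).strip() == expected)
    return correct / len(benchmark)


def rank_models(
    models: Dict[str, Callable[[str], str]],
    benchmark: List[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Score every model on the same prompts and sort from best to worst."""
    scores = {name: accuracy(fn, benchmark) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Stand-in "models" (assumptions): plain functions instead of real LLM calls.
def model_a(prompt: str) -> str:
    answers = {"What is 2 + 2?": "4", "What is 3 * 5?": "15", "What is 10 - 7?": "3"}
    return answers[prompt]


def model_b(prompt: str) -> str:
    return "4"  # always gives the same answer, so it is right only once


ranking = rank_models({"model_a": model_a, "model_b": model_b}, BENCHMARK)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Every model sees the same prompts, so the scores are directly comparable. The manual case replaces `accuracy` with human votes.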

  1. The Elo models of the Chatbot Arena

LMSYS Chatbot Arena and Leaderboard
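For readers unfamiliar with Elo ratings, here is a minimal sketch of the standard Elo update applied to pairwise votes between two models. The K-factor of 32, the starting rating of 1000, and the sample votes are assumptions chosen for illustration. This is the textbook formula, not a claim about the exact procedure the Chatbot Arena uses to compute its leaderboard.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if A's answer was preferred, 0.0 if B's was, 0.5 for a tie.
    k (assumed to be 32 here) controls how fast ratings move.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical votes: (model shown as A, model shown as B, score for A).
votes: List[Tuple[str, str, float]] = [
    ("model_a", "model_b", 1.0),
    ("model_a", "model_b", 1.0),
    ("model_a", "model_b", 0.0),
]

# Both models start from the same assumed rating.
ratings: Dict[str, float] = {"model_a": 1000.0, "model_b": 1000.0}
for a, b, score_a in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

print(ratings)
```

An upset (a win against a higher-rated model) moves the ratings more than an expected result, which is why the third vote costs `model_a` more points than each of its earlier wins earned.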

  2. The scored benchmarks

LiveBench

  3. The combined leaderboards

Why is a popularity index now needed?

LLM Popularity Index: June 2024 edition

Repartition of LLM usage in June 24th.svg

GPT leads Claude & Gemini by a fair margin

How should we interpret Llama 3's score?