LLM performance isn't static: models have good days and bad days. Performance can degrade or improve without warning. Developers often wonder: "Is it the model, or is it just me?"
CheckMyLLM provides real-time performance tracking from the developer community, helping you know which models are performing well right now, not just in theoretical benchmarks.
CheckMyLLM is a community-driven platform that tracks how LLMs are performing in real-world use. Think of it like Rotten Tomatoes, but for AI models.
Unlike traditional benchmarks that measure theoretical capabilities, we reveal day-to-day performance fluctuations based on actual developer experiences.
Rate models on a simple 1-5 scale based on your actual experience.
You can optionally rate specific dimensions:
- Accuracy – How correct and factual was the response?
- Speed – How fast did it respond?
- Helpfulness – How useful was it for your task?
You can also specify the task type (coding, writing, analysis, etc.) to help others understand how models perform for different use cases.
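To make the shape of a rating concrete, here is a minimal sketch of a single submission as a Python dataclass. The field names and validation are illustrative assumptions, not CheckMyLLM's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rating:
    """Hypothetical shape of one rating submission (not the real schema)."""
    model: str                         # e.g. "gpt-4o" (example model name)
    score: int                         # overall experience, 1-5
    accuracy: Optional[int] = None     # optional dimension, 1-5
    speed: Optional[int] = None        # optional dimension, 1-5
    helpfulness: Optional[int] = None  # optional dimension, 1-5
    task_type: Optional[str] = None    # e.g. "coding", "writing", "analysis"

    def __post_init__(self):
        # Enforce the 1-5 scale on the overall score and any optional dimension.
        for name in ("score", "accuracy", "speed", "helpfulness"):
            value = getattr(self, name)
            if value is not None and not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5")
```

Only `model` and the overall `score` are required; the dimensions and task type stay `None` unless you choose to provide them.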
You can rate each model up to 2 times per day. This helps us track changes throughout the day while preventing spam. If you rate the same model twice in one day, your latest rating replaces the earlier one.
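The replacement rule above can be sketched as a simple keyed store where the latest rating for a given user, model, and calendar day wins. This is an illustrative in-memory model under that assumption, not the platform's actual backend.

```python
from datetime import date
from typing import Optional

class DailyRatings:
    """Hypothetical store: the latest rating per (user, model, day) wins."""

    def __init__(self):
        self._store = {}  # (user, model, date) -> score

    def submit(self, user: str, model: str, score: int,
               day: Optional[date] = None) -> None:
        day = day or date.today()
        # Rating the same model again on the same day overwrites the earlier rating.
        self._store[(user, model, day)] = score

    def get(self, user: str, model: str, day: date) -> Optional[int]:
        return self._store.get((user, model, day))
```

Keying by day means yesterday's rating is untouched when you rate again today, which is what lets the platform track changes over time.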
Your ratings are aggregated to show community-wide trends: average scores, performance over time, ratings by task type, and more. This helps everyone make informed decisions about which model to use for their current task.
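As a rough sketch of that aggregation, assuming each anonymized rating is a `(model, task_type, score)` tuple (an assumption for illustration, not CheckMyLLM's data model):

```python
from collections import defaultdict
from statistics import mean

def aggregate(ratings):
    """Compute average score per model, and per (model, task_type)."""
    by_model = defaultdict(list)
    by_task = defaultdict(list)
    for model, task_type, score in ratings:
        by_model[model].append(score)
        by_task[(model, task_type)].append(score)
    # Round for display; individual ratings never leave this function.
    return (
        {m: round(mean(s), 2) for m, s in by_model.items()},
        {k: round(mean(s), 2) for k, s in by_task.items()},
    )
```

A real pipeline would also bucket by time to show performance trends, but the per-model and per-task averages above are the core of the community view.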
Model routing matters: knowing which model is performing well today is critical to making the right choice for your task.
Real-time community feedback helps you avoid frustration and get better results by choosing the model that's actually working well right now.
- All public stats are anonymous and aggregated
- We store your email only to prevent spam and abuse
- Your individual ratings are never shown publicly
- We don't share your data with third parties
- You can request deletion of your data at any time
Sign in with GitHub, Google, or Microsoft to start rating models and help the community track real-time LLM performance.