Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
OpenAI's new GPT-5 flagship failed half of my programming tests. Previous OpenAI releases have had just about perfect results. Now that OpenAI has enabled fallbacks to other LLMs, there are options.
XDA Developers on MSN
My local LLM replaced ChatGPT for most of my daily work
Local beats the cloud ...