Model.evaluate - Search News

Micro1 Shows Why AI’s Hardest Problem Is Evaluation, Not Intelligence

Micro1 is building the evaluation layer for AI agents providing contextual, human-led tests that decide when models are ready ...

Caura.ai Introduces PeerRank: A Breakthrough Framework Where AI Models Evaluate Each Other Without Human Supervision

TEL AVIV, Israel, Feb. 4, 2026 /PRNewswire/ -- Caura.ai today published research introducing PeerRank, a fully autonomous evaluation framework in which large language models generate tasks, answer ...

Forbes

Why Human Evaluation Matters When Choosing The Right AI Model For Your Business

As enterprises increasingly integrate AI across their operations, the stakes for selecting the right model have never been higher and many technology leaders lean heavily on standard industry ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

SiliconANGLE

Databricks expands tools for governing and evaluating AI agents

Databricks Inc. today announced a series of updates to its flagship artificial intelligence product, Agent Bricks, aimed at improving governance, accuracy and model flexibility for enterprise AI ...

Fierce Healthcare

OpenAI pushes further into healthcare with release of HealthBench to evaluate AI models

OpenAI, the maker of ChatGPT, released an open-source benchmark designed to measure the performance and safety of large language models in healthcare. The large data set, called HealthBench, goes ...

TechCrunch

OpenAI partner says it had relatively little time to test the company’s o3 AI model

An organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, Metr, suggests that it wasn’t given much time to test one of the company’s ...

FedScoop

US AI Safety Institute taps Scale AI for model evaluation

Scale AI founder and CEO Alexandr Wang testifies during a House Armed Services Subcommittee on Cyber, Information Technologies and Innovation hearing about artificial intelligence on July 18, 2023, in ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results