Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.