Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.