October 18, 2023

Cataloguing LLM evaluations

The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.


