Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.