Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.