Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.