Cataloguing LLM evaluations
The paper proposes a taxonomy of the LLM evaluation landscape comprising five categories: General Capabilities, Domain Specific Capabilities, Safety and Trustworthiness, Extreme Risks, and Undesirable Use Cases.