Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, these models are fine-tuned to refuse to generate such content.