Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values…
