Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, these models are fine-tuned to avoid generating such content.