Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the
internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, …