Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, these models are fine-tuned to refuse to generate such content.