Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. Considerable effort has therefore gone into making these AI systems better aligned with human values.