Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, …