Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values, …