Universal and Transferable Adversarial Attacks on Aligned Language Models
Large language models (LLMs) are typically trained on massive text corpora scraped from the
internet, which are known to contain a substantial amount of objectionable content. In an attempt to make AI systems better aligned with human values.