Large language model applications for evaluation: Opportunities and ethical implications
Original article for “New Directions for Evaluation” by Cari Beth Head, Paul Jasper, Matthew McConnachie, Linda Raftree, Grace Higdon.
Abstract:
Large language models (LLMs) are a type of generative artificial intelligence (AI) designed to produce text-based content. LLMs use deep learning techniques and massive data sets to understand, summarize, generate, and predict new text. LLMs caught the public eye in early 2023 when ChatGPT (the first consumer-facing LLM) was released. LLM technologies are driven by recent advances in deep-learning AI techniques, in which language models are trained on extremely large volumes of text from the internet and then re-used for downstream tasks with limited fine-tuning required. They offer exciting opportunities for evaluators to automate and accelerate time-consuming tasks involving text analytics and text generation. We estimate that over two-thirds of evaluation tasks will be affected by LLMs in the next five years. Use-case examples include summarizing text data, extracting key information from text, analyzing and classifying text content, writing text, and translation. Despite these advances, the technologies pose significant challenges and risks. Because LLMs are generally trained on text from the internet, they tend to perpetuate biases (racism, sexism, ethnocentrism, and more) and to exclude non-majority languages. Current tools like ChatGPT have not been developed specifically for monitoring, evaluation, research, and learning (MERL) purposes, which may limit their accuracy and usefulness for evaluation. In addition, technical limitations and challenges with bias can lead to real-world harm. To overcome these technical challenges and ethical risks, the evaluation community will need to work collaboratively with the data science community to co-develop tools and processes and to ensure the application of quality and ethical standards.