This data set helps researchers spot harmful stereotypes in LLMs

“I hope that people use [SHADES] as a diagnostic tool to identify where and how there might be issues in a model,” says Talat. “It’s a way of knowing what’s missing from a model, where we can’t be confident that a model performs well, and whether or not it’s accurate.”

To create the multilingual dataset, the team recruited native and fluent speakers of languages including Arabic, Chinese, and Dutch. They translated and wrote down all the stereotypes they could think of in their respective languages, which another native speaker then verified. Each stereotype was annotated by the speakers with the regions in which it was recognized, the group of people it targeted, and the type of bias it contained.

Each stereotype was then translated by the participants into English, a language spoken by every contributor, before being translated into additional languages. The speakers then noted whether the translated stereotype was recognized in their language, resulting in a total of 304 stereotypes related to people’s physical appearance, personal identity, and social factors like their occupation.

The team is due to present its findings at the annual conference of the Nations of the Americas chapter of the Association for Computational Linguistics in May.

“It’s an exciting approach,” says Myra Cheng, a PhD student at Stanford University who studies social biases in AI. “There’s a good coverage of different languages and cultures that reflects their subtlety and nuance.”

Mitchell says she hopes other contributors will add new languages, stereotypes, and regions to SHADES, which is publicly available, leading to the development of better language models in the future. “It’s been a massive collaborative effort from people who want to help make better technology,” she says.
