
Representation Matters in AI-Generated Images

Media Inquiries
Heidi Opdyke
Mellon College of Science
Peter Kerwin
University Communications & Marketing
A prompt for Mexican dancers from an AI image generator produced the strange ballerinas at left; a new CMU-designed filter makes the image more appropriate and realistic, right. Image credit: Zhixuan Liu, Jean Oh, et al. 2024. SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation.

Results from artificial intelligence image generators can range from appropriate to downright offensive — particularly for cultures that aren’t well represented in the internet’s data. 

An international team led from Carnegie Mellon University used the Pittsburgh Supercomputing Center’s Bridges-2 system and input from several different cultures to develop Self-Contrastive Fine-Tuning (SCoFT), an effective approach for retraining a popular image generator so that it can generate equitable images for underrepresented cultures.

Jean Oh

A research team led by Jean Oh, associate research professor at CMU’s Robotics Institute, is working on how to make generative AI models aware of the diversity of people and cultures.

“We wanted to use visual representation as a universal way of communication between people around the world,” said Oh. “We started generating images about Korea, China and Nigeria. We immediately observed that the popular foundation models are clueless about the world outside the U.S. If we redraw the world map based on what these models know it will be pretty skewed.”

Toward this goal, her team developed a novel fine-tuning approach and, thanks to an allocation from the NSF’s ACCESS project, used PSC’s Bridges-2 supercomputer to train new models and run sets of experiments to verify the performance of the proposed approach.

Bridges-2 enhances AI image generation

At one point, scientists developing the AI approaches underlying image generation thought that more available data would generate better results. Models trained on the internet, though, didn’t quite turn out that way.

Deep-learning AIs learn by brute force, beginning by making random guesses on a training dataset in which humans have labeled the right answers. As the computer makes good or bad guesses, it uses these labels to correct itself, eventually becoming accurate enough to test on data for which it isn’t given the answers. For the task of generating images from text requests, an AI tool called Stable Diffusion is an example of the state of the art, having been trained on data drawn from LAION-5B, a dataset of 5.85 billion image-text pairs.
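The guess-and-correct loop described above can be illustrated with a toy example. The sketch below is hypothetical and vastly simpler than Stable Diffusion: it assumes a one-variable linear model, made-up data generated from the rule y = 2x + 1, and plain gradient descent. It starts from random parameter guesses, uses the labeled answers to correct itself step by step, and is then scored on held-out examples it never trained on.

```python
import random

# Hypothetical toy illustration (not Stable Diffusion or SCoFT):
# learn the rule y = 2x + 1 from labeled examples, starting from random guesses.

def make_data(n, seed):
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]  # each input comes with its "right answer"

def train(data, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial guess
    for _ in range(steps):
        # Average gradient of the squared error over the labeled training set
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y  # how wrong the current guess is
            gw += 2 * err * x
            gb += 2 * err
        # Nudge the parameters in the direction that reduces the error
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

def mean_squared_error(w, b, data):
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

train_set = make_data(100, seed=1)
test_set = make_data(20, seed=2)  # held-out examples; labels used only for scoring
w, b = train(train_set)
print(f"w={w:.3f}, b={b:.3f}, test MSE={mean_squared_error(w, b, test_set):.2e}")
```

Real image generators learn billions of parameters rather than two, but the principle is the same: the quality of what the model learns depends on the examples it is corrected against, which is why a dataset that underrepresents some cultures produces skewed results.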

But ask Stable Diffusion for a picture of a modern street in Ibadan, Nigeria, and it creates something that looks more like a Westerner’s negative stereotype. Other images may be less obviously offensive. In some ways that’s worse, because such images are harder to identify as problematic.

To improve on this, the RI team recruited people from five cultures to curate a small, culturally relevant dataset. Although this Cross-Cultural Understanding Benchmark (CCUB) dataset had only an average of about 140 text-to-image pairs for each culture, it allowed the team to retrain Stable Diffusion to teach it to generate images portraying each culture more accurately with less stereotyping when compared to the baseline model. The team also added the same fine-tuning step to images generated by the popular GPT-3 AI image generator.

Bridges-2 proved ideal for the work. PSC’s flagship system offers powerful graphics processing units (GPUs) well suited to image- and pattern-recognition tasks, and an architecture designed to move large amounts of data through the computer efficiently, without logjams. This enabled the scientists to fine-tune the AI in progressive steps that significantly improved how 51 people from the five cultures rated the resulting images. Their SCoFT method improved the judges’ perception of how well the images matched the text query and represented their cultures, and it reduced the images’ offensiveness.

The team will present a paper on their work at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) in June.


The Pittsburgh Supercomputing Center is a joint computational research center with Carnegie Mellon University and the University of Pittsburgh. PSC provides university, government and industrial researchers with access to several of the most powerful systems for high performance computing, communications, and data storage available to scientists and engineers nationwide for unclassified research.
