New Data Reveals Political Bias in LLMs

Arctotherium:

On February 19th, 2025, the Center for AI Safety published “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs” (website, code, paper). In this paper, they show that modern LLMs have coherent and transitive implicit utility functions and world models, and they provide the methods and code to extract them. Among other findings, they find that larger, more capable LLMs have more coherent and more transitive preferences. Transitivity means that preferring A over B and B over C implies preferring A over C.
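To make the transitivity property concrete, here is a minimal sketch. It assumes preferences are recorded as a set of strict pairwise (winner, loser) pairs. The function name `transitivity_violations` is hypothetical, and this is not the paper's code or method. It only checks raw triples for the A > B, B > C, therefore A > C pattern.

```python
from itertools import permutations


def transitivity_violations(prefers):
    """Return sorted (a, b, c) triples where a > b and b > c but a > c is not recorded.

    `prefers` is a set of (x, y) tuples meaning "x is strictly preferred to y".
    A triple counts as a violation whenever (a, c) is absent, which covers both
    outright cycles (c > a) and comparisons that were never recorded.
    """
    # Collect every item that appears in any comparison.
    items = {x for pair in prefers for x in pair}
    violations = []
    # Check every ordered triple of distinct items.
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            violations.append((a, b, c))
    # Sort so the output does not depend on set iteration order.
    return sorted(violations)


consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
print(transitivity_violations(consistent))
print(transitivity_violations(cyclic))
```

A fully transitive preference set yields an empty list. A rock-paper-scissors style cycle yields one violation per rotation of the cycle. Counting violations like this is one simple way to think about "more transitive", though the paper's own coherence metrics may be defined differently.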

Figure 16, which shows how GPT-4o values the lives of people from different countries, is especially striking. According to this plot, GPT-4o values the lives of Nigerians at roughly 20 times the lives of Americans. This result comes from running the paper's “exchange rates” experiment over the “countries” category using the “deaths” measure.