Authoritarianism Could Poison AI
The artificial intelligence (AI) revolution is gathering steam, poised to transform industry and change how we access information. The power of AI rests on training it with massive volumes of human-created data. The world’s most famous chatbot, ChatGPT, was trained on billions of texts from around the world, which unlock its human-like capabilities.
The all-encompassing nature of AI models—taking in vast quantities of information available over the Internet—has recently run up against issues of copyright and fair use. But an even bigger question lies at the heart of AI—how is the data available to it affected by censorship and propaganda?
Put another way, AI faces an authoritarian data problem. Understanding how Internet authoritarianism is impacting AI—and how AI could even, perversely, enhance authoritarianism—is a critical task for shaping the technology’s future.
Social scientists must work diligently to address the influence of authoritarian regimes and formulate a democratic vision for AI governance.
Authoritarian Data Pollution
The age of AI coincides with rising authoritarianism across the planet. Four in five people on Earth live in countries that Freedom House considers less than fully free. By controlling what is disseminated across their online space, authoritarian regimes inevitably influence large swathes of the Internet, which may then be used to train AI.
AI models trained predominantly on English-language sources have been shown to generally reflect popular opinion in Western democracies. But in sophisticated authoritarian systems like China and Russia, the Internet is shaped by human and algorithmic censors that delete content, manipulate search results, prevent reporting on certain topics, and ban accounts deemed subversive.
Even in less advanced authoritarian systems, governments still constrict their online space by limiting access to the Internet, especially during times of upheaval. Moreover, citizens may fear punishment for sharing anti-regime content, and self-censor in response. All of this limits the types of content available online—including content available to AI.
Authoritarian states not only restrict online data production; they also create content that can leave its mark on AI training data. Propaganda long predates the Internet, but it now spreads in new ways. Authoritarian regimes fund automated bots and human “astroturfers” to parrot regime talking points while mimicking genuine online discourse. Because content produced this way can be difficult to distinguish from free exchange, programmers are left unable to keep it out of AI training data.
The effects on AI trained in these countries’ national languages may be profound. More than nine in ten Chinese speakers are located in mainland China, and eight in ten Russian speakers are in Russia. Therefore, AI trained on text in these languages must rely largely on data from online spaces managed by authoritarian governments.
The effects of these governments’ curation of online information reach even chatbots developed in the democratic world. Wikipedia, a treasure trove of data for training AI, is banned in China. There, a government-backed alternative called Baidu Baike boasts over 27 million entries, roughly twenty times the 1.3 million on Chinese-language Wikipedia. Encyclopedic data available in Chinese tilts decisively in favor of government-approved sources.
Even ChatGPT, which is trained on a limited amount of non-English-language data, is affected by the imbalance between government and non-government sources. Asked in Chinese why Mao Zedong was a great leader, the chatbot offered more praise for the 20th-century strongman than when asked the same question in English. The results are even more skewed in other AI models that take in more non-English-language data.
Another Tool in the Authoritarian Toolkit
For authoritarian regimes themselves, AI presents both challenges and opportunities. AI trained on content from free and open online spaces in the democratic world threatens authoritarian censorship regimes. China, for example, has banned ChatGPT for fear it could open a backdoor through the Great Firewall of China.
But data from the democratic world could also make AI a more effective censorship tool. Training AI to detect and remove subversive content using data from countries without Internet censorship could improve the efficiency of censorship programs under authoritarian regimes. Ironically, because these regimes censor their own Internet, they have little domestic subversive content to train such systems on, but they can gather it from the democratic world.
In other words, while authoritarian repression limits the information available to programmers in the democratic world, democratic openness could inadvertently improve authoritarians’ ability to censor their citizens.
A Democratic AI Vision
AI promises to be one of the most influential innovations of our time. But unlike previous technological game changers, it derives its effectiveness from data that is social in nature. This weaves politics deeply into the fabric of AI, including the politics of propaganda and repression.
Addressing the influence of authoritarianism on AI requires more than a technical fix. Changes to algorithms, computational power, or data input volumes are insufficient to solve the authoritarian data problem; social science and democratic principles must also shape how AI is governed. One early experiment in this direction is “alignment assemblies”: hybrid convenings of ordinary citizens who, through deliberation, surface public preferences that are then used to help train AI systems.
It is up to social scientists to deepen our understanding, both quantitatively and qualitatively, of how propaganda and repression are influencing AI training data, and to articulate ways to mitigate this influence. This work will have profound consequences for the trajectory of AI.
Eddie Yang is a Ph.D. candidate in political science at UC San Diego, and a 2023-24 Dissertation Fellow at the UC Institute on Global Conflict and Cooperation (IGCC).
Global Policy At A Glance
Global Policy At A Glance is IGCC’s blog, which brings research from our network of scholars to engaged audiences outside of academia.