China’s Science and Technology Ecosystem in Focus: Introducing the China Policy Document Navigator
IGCC’s China Science, Technology, Innovation, and Industrial Policy (STIIP) project has spent the last three years shining a light on developments in China’s scientific enterprise as the country positions itself to compete with the United States for leadership in strategic technologies that will determine the course of the 21st century. The project culminates with the launch of the China Policy Document Navigator, a first-of-its-kind database that gives researchers curated access to a treasure trove of Chinese policy documents that have become increasingly difficult to otherwise access.
IGCC’s Paddy Ryan talks with project leaders Barry Naughton and Young Yang about why the data portal is needed, what they’ve learned from it so far, and its role in advancing academic and U.S. government research during a pivotal time in Chinese science, technology, and innovation policy.
What prompted the creation of the Policy Document Navigator?
Barry Naughton: We’re living in a period in which technological progress is extremely rapid, and technological competition between the United States and China is increasingly intense. This project was designed to increase our understanding of how China’s government develops and executes its priorities with respect to technology and—more broadly—industrial policy.
The first step in this project was to understand the explicit policies that Beijing adopts and promulgates. That’s a bigger challenge than it might seem at first—after all, the Chinese Communist Party isn’t the most transparent organization in the world. So to accomplish this initial goal, it became clear that we needed to create a database of government policy documents, which led to the Policy Document Navigator.
What makes an integrated database like the Policy Document Navigator key to this objective?
Barry Naughton: First off, China’s a big and complex country. There are many different levels of government that are pursuing myriad different policies—and those policies aren’t necessarily easy to access.
Second, tensions between the United States and China have increased. Because of that, a lot fewer documents have been publicly released, and even some that have been released are being taken down from the web.
Taken together, the purpose of this database is to make sure that we preserve public access to this diverse range of industrial and technology policies.
Young Yang: As Barry alluded to, China researchers face enormous challenges when it comes to transparency. Our goal, therefore, is to provide as complete, transparent, and accurate a record as possible.
But government control isn’t the only thing that impedes transparency. Another thing we’ve had to consider is that China’s bureaucratic system is quite fragmented. That results in different policies being implemented at various levels of government—and that fragmentation makes it difficult to keep track of how overall policy priorities are implemented.
Integrated data sets help us to trace the diffusion of these various policy priorities—not only from Beijing to local governments, but also horizontally from one province to another.
What does the database contain and how were these documents collected?
Young Yang: In short, the database is the most comprehensive and structured collection of Chinese science, technology, and innovation policies from 2011–22. The database covers national, provincial, and prefectural levels across three Five-Year Plan cycles.
The database contains three main sets of documents. The first are the flagship science, technology, and innovation (STI) Five-Year Plans, totaling 95 at the national and the provincial levels, and 525 at the prefecture level.
Second are the 23 strategic emerging industry and future industry plans which focus on frontier sectors like artificial intelligence (AI), quantum technology, and advanced manufacturing.
And third are the more than 46,000 (and counting) documents covering additional STI-related policies, including regulations, guidelines, and program announcements, collected from national ministries down to municipal science bureaus.
We collect these documents in two ways. For the most high-profile plans, we hand curate them to ensure accuracy and completeness. For the broader set of documents, we engage in the large-scale scraping of Chinese government websites. We then use a large language model and custom typing model to filter, tag, and classify these documents by issue, date, and content.
All this together means that, for the first time, you can accurately search and filter through China’s fragmented STI policymaking across time and levels of government. The database allows China analysts to review who is making policy, when major policy shifts happen, and which topics are prioritized. This helps researchers uncover how Beijing’s big strategic goals translate into concrete and localized action.
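To make the idea of searching tagged documents across time and levels of government concrete, here is a minimal sketch in Python. It is not the Navigator's actual schema or interface: the `PolicyDoc` fields, the `search` function, and every sample record are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyDoc:
    # Hypothetical record layout; the real database fields may differ.
    title: str
    issuer: str
    level: str          # "national", "provincial", or "prefectural"
    issued: date
    topics: frozenset   # topic tags assigned during classification


def search(docs, *, level=None, topic=None, start=None, end=None):
    """Return documents matching every filter that is not None, oldest first."""
    results = []
    for doc in docs:
        if level is not None and doc.level != level:
            continue
        if topic is not None and topic not in doc.topics:
            continue
        if start is not None and doc.issued < start:
            continue
        if end is not None and doc.issued > end:
            continue
        results.append(doc)
    return sorted(results, key=lambda d: d.issued)


# Made-up sample records, for illustration only.
SAMPLE = [
    PolicyDoc("Example national AI plan", "Hypothetical ministry",
              "national", date(2017, 7, 1), frozenset({"ai"})),
    PolicyDoc("Example provincial AI action plan", "Hypothetical province",
              "provincial", date(2018, 3, 1), frozenset({"ai", "healthcare"})),
    PolicyDoc("Example provincial quantum program", "Hypothetical province",
              "provincial", date(2019, 5, 1), frozenset({"quantum"})),
]

hits = search(SAMPLE, level="provincial", topic="ai", start=date(2017, 1, 1))
print([d.title for d in hits])
```

Keeping each filter optional mirrors the point made above: a user can start from a broad question (everything tagged with a topic) and narrow by level of government or date range as the research question sharpens.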
In the spring, IGCC shared the databases publicly for the first time at an event in Washington, D.C. Who are the intended users, what will they be able to do, and what kind of feedback are you getting so far?
Barry Naughton: The project was funded by U.S. government research money, so naturally, priority access goes to our partners in the State Department, the Office of the U.S. Trade Representative, the Treasury, and the Department of Commerce, among other agencies.
A wide range of federal and congressional offices are trying to improve their access to original Chinese sources with an eye toward improving the quality of China policy between U.S. administrations. There’s a heightened focus on China in Washington, but policy remains very much in flux. For that reason, this kind of information is especially valuable in government.
Having said that, this is a publicly accessible database which we have structured to be open to many different kinds of users, whether in government, academia, or elsewhere. It’s set up so that users can define their own research questions and approach the data in the manner most suited to their objectives.
Are there any initial insights you can share about what you’ve learned so far putting the data portal together?
Barry Naughton: The first key takeaway concerns the recent and newsworthy advances of AI in China, epitomized by the January release of DeepSeek-R1. We see in our policy database that China in fact began to prioritize AI research about 10 years ago. So China’s success with large language models isn’t just a one-off; it’s the product of a long-term commitment to research in this area.
Second, we find evidence that Chinese science policymakers are pushing researchers towards more fundamental core scientific issues, such as genetic research and quantum communication and computing. And finally, we see an emphasis from officials on accountability and oversight of China’s researchers. They’re really working to make sure that scientists follow the priorities laid down by government institutes.
Young Yang: I’d add that the portal shows different patterns of technology prioritization in China, particularly for AI and quantum. These two emerging technologies diffuse rather differently in China’s innovation system, and the data allows us to make clear comparisons between the two.
In quantum computing, we’ve identified 5,000 policy documents in total across all 31 provinces from 2011 to 2022. The topic distribution shows a heavy emphasis on management and financing from policymakers, as well as research and development (R&D) strategies. The upshot is that this sector is very specialized: provinces that have high research capacity like Beijing, Jiangsu, and Anhui are home to major quantum research centers, and it appears that quantum computing is being treated as a strategically guided initiative, but with provinces taking the lead locally.
By contrast, we see in AI much faster and more widespread policy adoption. There has been a major surge in locally published policy documents beginning in 2017 after the release of the central government’s New Generation Artificial Intelligence Development Plan. Unlike quantum, it appears that AI-related policies are spread across a wider range of topics, not just science and R&D, but even agriculture and healthcare, which means provinces can tailor AI strategies to their industrial strengths. This suggests that Chinese policymakers view AI as a transformative general-purpose technology that can be rapidly localized across many different types of provincial economies.
In short, this portal shows us that AI policies diffuse broadly and quickly, whether a national plan is announced or not, while quantum computing policies remain more concentrated and tied to specific research hubs. This level of granularity, both over time and across geographies, was impossible to track before we had a comprehensive database like this.
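One simple way to compare how broadly two topics diffuse is to count, year by year, how many provinces have issued at least one document on each. The sketch below assumes records reduced to `(province, year, topic)` tuples; the function name, the record shape, and all sample values are hypothetical and do not reflect the project's actual data or method.

```python
def cumulative_adopters(records, topic):
    """Map each year to the number of distinct provinces that have issued
    at least one document on `topic` by that year."""
    first_year = {}
    for province, year, doc_topic in records:
        if doc_topic != topic:
            continue
        # Keep the earliest year in which this province published on the topic.
        if province not in first_year or year < first_year[province]:
            first_year[province] = year
    if not first_year:
        return {}
    last_year = max(year for _, year, _ in records)
    years = range(min(first_year.values()), last_year + 1)
    return {y: sum(1 for fy in first_year.values() if fy <= y) for y in years}


# Made-up records, for illustration only: (province, year, topic).
RECORDS = [
    ("Province A", 2017, "ai"), ("Province B", 2017, "ai"),
    ("Province C", 2018, "ai"), ("Province D", 2018, "ai"),
    ("Province A", 2019, "ai"),
    ("Province A", 2016, "quantum"), ("Province A", 2018, "quantum"),
    ("Province B", 2019, "quantum"),
]

ai_curve = cumulative_adopters(RECORDS, "ai")
quantum_curve = cumulative_adopters(RECORDS, "quantum")
print(ai_curve)       # {2017: 2, 2018: 4, 2019: 4}
print(quantum_curve)  # {2016: 1, 2017: 1, 2018: 1, 2019: 2}
```

In this toy data the AI curve reaches four provinces within two years while the quantum curve stays at one or two, which is the shape of contrast described above: fast, wide adoption versus concentration in a few hubs.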
China’s next Five-Year Plan, which will cover 2026–30, is expected to be released in the fall. What are you anticipating?
Barry Naughton: We anticipate continuity—it’s expected that the government will further focus scientific research on a core suite of critical technologies that lie at the very center of competition between China and the United States. We can say with a very high degree of confidence that AI and advanced materials will continue to be a focus. We expect the emphasis on genetics and clinical research to continue to increase. Quantum communications and optoelectronics are key areas to watch.
But regardless of what we anticipate, the real advantage of having this data set is that, when the plan comes out, it will enable us to react quickly should our simple expectations turn out to be not quite right. It will be interesting in the coming year to use this database to systematically analyze and uncover what’s really different in the newest plans as they emerge.
So, how can researchers access the database?
Young Yang: All public users will have to register an account on our website. The site will direct you to a login page, with a link to register if you don’t have an account. After registration, we will send an activation link to the email address associated with the account, and after clicking on that link, you will use your username and password to get into the public database.
If you want access to the whole data set, we require you to send an email to IGCC with your name, institution, and research purpose. After reviewing this information, we grant access to the entire database.
The portal is now live and can be accessed here.