Much Ado About Data: How America and China Stack Up

Matt Sheehan:

Analysts often cite the amount of data in China as a core advantage of its artificial intelligence (AI) ecosystem compared to the United States. That’s true to a certain extent: 1.4 billion people + deep smartphone penetration + 24/7 online and offline data collection = a staggering amount of data.

But the reality is far more complex, because data is not a single-dimensional input into AI, something that China simply has “more” of. The relationship between data and AI prowess is analogous to the relationship between labor and the economy. China may have an abundance of workers, but the quality, structure, and mobility of that labor force are just as important to economic development.

Likewise, data is better understood as a key input with five different dimensions—quantity, depth, quality, diversity, and access—all of which affect what data can do for AI systems.

What follows is a framework for analyzing the comparative advantages of countries and companies across the five dimensions, with the aim of bringing more precision to comparisons of how America and China stack up. This is, however, just one framework, and I welcome critiques and suggestions on how to quantitatively measure each of these dimensions.

Why Does Data Matter To AI Systems?

Before getting to the five dimensions, a detour into data’s role in AI systems is in order.

Advances in AI have given computers superhuman pattern-recognition skills: the ability to wade through oceans of digital data, spotting thousands of hidden patterns or correlations between inputs and outcomes. AI systems then use those correlations to make inferences or predictions, “learning” how to perform a task based on the examples they have seen in the data.