
The Hunt for Open Data in China

BY Rebecca Chao | Wednesday, September 11 2013

No data in this stack of hay. (Perry McKenna/flickr)

Like water and oil, ‘open data’ and ‘China’ take a bit of engineering if you want them to mix. When human rights advocate Xu Zhiyong demanded officials disclose their wealth, he was not met with understanding or even indifference; he was thrown in jail like many truth-seekers before him.

Rather than ask for information, a group of young techies are surreptitiously going out and finding it, despite the risks of digging too deep.

Cui Anyong is among those who dare to envision a future in China where information is open. While studying new media at City University of Hong Kong, he coordinated a crowdsourced translation of the Data Journalism Handbook into Chinese. He also organized China’s first open data journalism meetup at the beginning of September. It sprouted from the Beijing chapter of Hacks/Hackers, a global initiative to bring the media together with civic-minded technologists.

“People always say there is no open data in China,” said Cui. “Actually, there is.” For starters, he explains, the government publishes water quality, air quality and earthquake data in near real time, roughly every four hours. It operates three open data sites: Beijing Data, Data Shanghai and the National Bureau of Statistics. The data is still “very limited,” says Cui, but he believes “it is the beginning” of something more.

Despite the short supply, Cui uses some ingenuity to draw meaning from jumbled and ill-formatted datasets. His first project aims to show a correlation between industrial pollution and cancer villages, a link that the government has long denied.

In some regions of China, like the small rural community of Xinlong that sits next to an industrial park for chemical plants and manufacturers in Yunnan province, the water sometimes runs red and yellow, crops turn black and an alarming number of villagers have some form of cancer or other serious health problems. Officials, particularly at the local level, have turned a blind eye. In February, the Communist Party reprimanded the Ministry of Environmental Protection for using the term "cancer village" in a report, and the ministry later apologized for "making a mistake." Party leaders then sent memos to provincial governments, warning them not to use the term "cancer village."

Cui seeks to find an undeniable link between pollution and health by overlaying water quality and health data on a map. He has finished inputting the water data but is still working on adding a layer for health. He aims to complete his online app by next month.

Cui's water map is still in its initial stages.

Scraping the Web

Thanks to the Web, open data is not limited to government sites. “The development of the Internet industry is really good in China so lots of people scrape data from Weibo and the news websites,” says Cui. “You can find lots of articles talking about how to scrape or how to visualize the information flow on Weibo.”

In one instance, a group of doctoral students mapped outbreaks of the H7N9 bird flu. In May 2009, reporter Deng Fei built a cancer village map using Google. It has nearly 50 locations, though according to some reports there are as many as 400 in China. Deng also initiated a data-driven campaign last winter on the micro-blogging platform Weibo, asking users to share photos of pollution. It went viral.

The best-known project is Danger Maps, which received funding from Alibaba Group, China's eBay. The map uses a mix of open data and crowdsourcing to plot the location of industrial facilities and allows users to search for polluting factories near their homes.
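A "factories near my home" search of the kind described above can be sketched in a few lines of Python. This is only an illustration, not Danger Maps' actual implementation, which the article does not describe: the facility names, the coordinates and the `facilities_near` helper are hypothetical placeholders, and the distance is computed with the standard haversine formula.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def facilities_near(home, facilities, radius_km):
    """Return (distance_km, name) pairs within radius_km of home, nearest first."""
    hits = []
    for facility in facilities:
        distance = haversine_km(home[0], home[1], facility["lat"], facility["lon"])
        if distance <= radius_km:
            hits.append((distance, facility["name"]))
    return sorted(hits)

# Hypothetical placeholder records, not real facilities.
FACILITIES = [
    {"name": "Plant A", "lat": 31.23, "lon": 121.47},
    {"name": "Plant B", "lat": 31.30, "lon": 121.50},
    {"name": "Plant C", "lat": 39.90, "lon": 116.40},
]

if __name__ == "__main__":
    home = (31.23, 121.47)  # hypothetical home location
    for distance, name in facilities_near(home, FACILITIES, radius_km=20):
        print(f"{name}: {distance:.1f} km")
```

A real service would more likely query a spatial index or database than loop over every record, but the filter-by-radius idea is the same.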

In one sense, environmental maptivism has flourished because the government has not yet sought to shut it down. In fact, the Communist Party has allowed for some debate and criticism about China's environmental problems. In October 2012, three thousand people rallied in Ningbo against the expansion of a chemical plant, and the government later agreed to halt the project.

Liu Yan, founder of the hackerspace Xindanwei, recently coordinated the first government-led climate change hackathon-type event (she refrained from calling it a full-on hackathon because the Communist Party finds the term's connotations a bit too liberal). It marks a significant shift in the government's attitude towards open data, explains Liu.

"This is the first time that the government is providing all this data to the start-up and creative community and is working together with them," she says. "Also, top researchers from all over China are providing insight and knowledge. I was super excited because for start-ups, this is really a very important and unique opportunity to be connected to these data sets."

Opening Open Data

Even when data is available, it may sit there like canned food, requiring a special utensil to "open it," or in other words, to turn it into a readable format.

San Francisco-based data design analyst Bu Shujian says that in 2011, the Chinese government published its readings as JPEG files, which made it impossible for her to scrape the information and see how far off Beijing's readings were from those generated by the U.S. Embassy. Luckily, a colleague of hers offered to write a script to capture images of the online data every few hours and convert the images into data points. The effort was worthwhile and provided some unexpected findings.

“The results show that the [Chinese] government data is pretty accurate,” Bu said, which surprised her. She thinks the hype around China's inaccurate pollution ratings may have been spurred by the media, which pays attention to time stamps that reveal particularly large differences between the two reports when in fact, there is not much of a difference overall.

Bu's pollution charts.
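The distinction Bu draws, between a few time stamps with large gaps and the difference overall, can be made concrete with a small Python sketch. It is not her analysis: the `compare_readings` helper is hypothetical and the readings below are made-up placeholder values, chosen only to show how a single outlier can coexist with a small typical difference.

```python
from statistics import mean, median

def compare_readings(official, embassy):
    """Compare two {timestamp: reading} series on the timestamps they share."""
    shared = sorted(set(official) & set(embassy))
    diffs = {t: embassy[t] - official[t] for t in shared}
    abs_diffs = [abs(d) for d in diffs.values()]
    # The timestamp with the largest gap is the one most likely to make headlines.
    worst = max(shared, key=lambda t: abs(diffs[t]))
    return {
        "n": len(shared),
        "mean_abs_diff": mean(abs_diffs),
        "median_abs_diff": median(abs_diffs),
        "worst_time": worst,
        "worst_diff": diffs[worst],
    }

# Made-up placeholder readings keyed by time of day, not real measurements.
OFFICIAL = {"08:00": 150, "12:00": 160, "16:00": 155, "20:00": 170}
EMBASSY = {"08:00": 152, "12:00": 158, "16:00": 195, "20:00": 172, "00:00": 180}

if __name__ == "__main__":
    for key, value in compare_readings(OFFICIAL, EMBASSY).items():
        print(f"{key}: {value}")
```

In this toy data the largest gap is far bigger than the median gap, which is the pattern Bu describes: judging the two sources by their most divergent time stamp overstates how much they disagree.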

Still, Chinese citizens have long mistrusted pollution data and use the brown, exhaust-filled air as proof that the government underreports air quality levels.

This mistrust comes in part from the government's efforts to close down competing sources of information. In June of 2012, China asked "other" governments to refrain from releasing their air quality data, a request experts believe was directed at the U.S. Embassy.

Renaud explains that overall, most of the data the Chinese government releases is difficult to corroborate. “There’s no way to check it,” he says. Consequently, he’s been looking into creating data-generating tools. “How can we design very cheap hardware, like sensors and data fields to produce data?” he asks. “That’s why there has been a push for open source hardware, to create devices that will generate data.”

Sophia Lin is the creator of the Make+ studio in Shanghai where developers and designers gather regularly to build open source hardware. Like a nesting doll, it was a hackerspace within a hackerspace, initially contained within developer David Li's Xin Che Jian, the first hackerspace in China.

"I thought, how about if I get some artists to collaborate with technologists who may not know much about art," explains Lin. "Maybe some innovative ideas could come out of it." Today, the collaboration has produced products like an air quality reader that can easily be assembled with materials purchased from the e-commerce giant, Taobao. At the heart of maker movements is widening access to tools traditionally available only to experts and governments.

Staying Within the Lines

One unfortunate side effect of open data's growing popularity is that the government begins to pay attention, leading to crackdowns.

Take the “human flesh search engine,” for example, a method for bringing down corrupt leaders using information found on the Internet. Netizens comb the Web for incriminating pictures, public statements and records: a list of properties under a relative's name, or a snapshot of an official driving an expensive car or wearing a Rolex. While the Web-based sleuthing worked at first, the government quickly tightened its controls on public information, particularly at the local level.

Cui says that he too must tread carefully: “Water quality is very sensitive in China because the coasts bring in a lot of development for local governments.” But among young journalists and civic hacking enthusiasts like Cui and Bu, there is a longing to see data journalism in China reach a level comparable to that of the West. The New York Times' "Snow Fall" is very popular among Chinese journalists, says Cui.

Asked if he would venture beyond environmental issues and dig into the financial records of high level corrupt officials, Cui exclaims, “I would not go to that length. It is still too dangerous.”

Personal Democracy Media is grateful to the Omidyar Network for its generous support of techPresident's WeGov section.

Editor's Note: This article has been revised to reflect a correction made on September 17, 2013. The original version of this article misstated the university from which Cui Anyong graduated. He went to City University of Hong Kong, not Hong Kong University.