Disruptions: Data Without Context Tells a Misleading Story

Google’s Flu Predictor overestimated how many people had the flu this flu season. (Photo: Erik S. Lesser/European Pressphoto Agency)

Several years ago, Google, aware of how many of us were sneezing and coughing, created a fancy equation on its Web site to figure out just how many people had influenza. The math works like this: people’s location + flu-related search queries on Google + some really smart algorithms = the number of people with the flu in the United States.

So how did the algorithms fare this wretched winter? According to Google Flu Trends, at the flu season’s peak in mid-January, nearly 11 percent of the United States population had influenza.

Yikes! Take vitamins. Don’t leave the house. Wash your hands. Wash them again!

But wait. According to an article in the science journal Nature, Google’s disease-hunting algorithms were wrong: their results were nearly double the estimates from the Centers for Disease Control and Prevention, which put the coughing and sniffling peak at 6 percent of the population.

Kelly Mason, a public affairs spokeswoman for Google, said the company’s Flu Trends site was meant to be only one source in addition to the C.D.C. and other flu surveillance methods. “We review and potentially update our model each season,” she said.

Scientists have a theory about what went wrong, as well.

“Several researchers suggest that the problems may be due to widespread media coverage of this year’s severe U.S. flu season,” Declan Butler wrote in Nature. Then add social media, which helped news of the flu spread faster than the virus itself.

In other words, Google’s algorithm was looking only at the numbers, not at the context behind the searches.

In today’s digitally connected world, data is everywhere: in our phones, search queries, friendships, dating profiles, cars, food, reading habits. Almost everything we touch is part of a larger data set. But the people and companies that interpret the data may fail to apply background and outside conditions to the numbers they capture.

“Data inherently has all of the foibles of being human,” said Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation at Columbia University. “Data is not a magic force in society; it’s an extension of us.”

Society has encountered similar situations for centuries. In the 1600s, Dr. Hansen said, an early census was recorded in England as the Great Plague of London killed tens of thousands of Britons. To calculate the spread of the disease, officials started recording every christening and death in the city. And although this helped quantify the mortality rate, it also created other problems. There was now an astounding collection of statistical information for scientists to review and understand, but it took time to develop systems that could accurately assess the information.

Now, as we enter a world of big data, we have to learn how to apply context to these numbers.

Dr. Hansen said the problem of data without context could be summed up in a quote from the playwright Eugène Ionesco: “Of course, not everything is unsayable in words, only the living truth.”

I experienced this firsthand in the spring of 2010, when I was an adjunct professor at New York University teaching graduate students in the Interactive Telecommunications Program.

I created a class called “Telling Stories With Data, Sensors and Humans,” with the goal of determining whether sensors and data could become reporters and collect information. Students built little electronic contraptions with $30 computers called Arduinos, and attached several sensors, including ones that could detect light, noise and movement.

We wondered if we could use these sensors to determine whether students used the elevators more than the stairs, and whether that changed throughout the day. (Esoteric, sure, but a perfect example of a computer sitting there taking notes, rather than a human.)

We set up the sensors in some elevators and stairwells at N.Y.U. and waited. To our delighted surprise, the data we collected told a story, and it seemed that our experiment had worked.

As I left campus that evening, one of the N.Y.U. security guards who had seen students setting up the computers in the elevators asked how our experiment had gone. I explained that we had found that students seemed to use the elevators in the morning, perhaps because they were tired from staying up late, and switch to the stairs at night, when they became energized.

“Oh, no, they don’t,” the security guard told me, laughing as he assured me that lazy college students used the elevators whenever possible. “One of the elevators broke down a few evenings last week, so they had no choice but to use the stairs.”

E-mail: bilton@nytimes.com

A version of this article appeared in print on 02/25/2013, on page B6 of the New York edition with the headline: Data Without Context Tells a Misleading Story.

Monday, 10 June 2013