Jason Hong picks up his new Android smartphone. He can’t resist. Never mind that he’s a respected professor of computer science. He’s a kid when it comes to gadgets and games. He’s taking a quick break. Perfect time to pull up that new batch of apps he downloaded. Hong clicks open the blackjack app. As he flips through a few hands, his mind wanders to some recent privacy research he read. He hops back to check the app’s list of permissions—what he allowed the software to access when he installed it. “That’s weird,” he thinks. For some reason, this simple card game is collecting his location information. Even for a tech-savvy academic, the unknowing divulgence of personal information is a surprising and unsettling realization.
That was 2008. Google had entered the app market not long before with its Android app store and sent a few complimentary phones to the researchers at the Carnegie Mellon Human-Computer Interaction Institute (HCII). One landed in Hong’s office, which was—and still is—littered with Nerf guns, Legos, toy cars, and his prized games. Fitting for the slight 38-year-old with round glasses who even today looks like one of his students, not a PhD. (His brother still dares him to sit with the students on the first day of class and remark, “Hey, I hear this guy’s really tough!”) But all that doesn’t obscure how seriously he and his team take their work and concern for privacy and security.
Hong was raised in South Carolina by Taiwanese parents—“just a good ol’ Southern boy,” he laughs—and showed an early gift for math and science. He joined HCII to continue working on privacy research he’d begun in graduate school. “We knew privacy for these mobile and sensor-based environments was a problem even then, but relatively little work had been done to address it,” he says. Most privacy research had focused purely on the computer itself, such as developing stronger encryption techniques to improve data security. At CMU, though, Hong became fascinated with a “human-centered” approach to the problem, like that of his collaborator Norman Sadeh, director of the Mobile Commerce Lab and another early privacy researcher. Some of Hong’s first work with Sadeh, for instance, involved the reasons people fall for phishing scams, those fraudulent emails sent to uncover personal information.
“We are currently entering a third age of computing,” Hong contends. The first stage was the development of computation. The second was the rise of communication—think the Web, Wikipedia, and social networking. Now we’re entering the stage some call ubiquitous computing. Sensors, mobile devices, and embedded computing are so entrenched in our daily lives, from cars to thermostats to lighting, that we’re scarcely aware.
Is ignorance bliss? Maybe it shouldn’t be.
“We’re entering this third age, and we haven’t solved the privacy problems of the first two,” he says. “It’s not even clear if we can solve them. And now we have all these new challenges, particularly with these commodity smartphones and the number of sensors they have. Light sensors, accelerometers, cameras. Not to mention what they know about you—who you’ve been calling, location data, geotags on your pictures. Privacy may be the most difficult problem for this third age.”
The conundrum of the blackjack game on his new phone that day in 2008 “got the hamsters running,” Hong remembers. Later that afternoon, two graduate students arrived in his office to discuss their ongoing projects, but Hong interrupted with his discovery. “I didn’t know it was this bad,” recalls Shahriyar Amini, now a fifth-year electrical and computer engineering doctoral student. “I would never have imagined that a card game would be using my location information.” They sat around Hong’s round table mulling the disturbing issue. How many apps were doing this? What other information were they collecting? And most importantly—why? A new research project was born.
First, they needed an approach. There were countless apps out there, and the numbers were growing daily. Previous app research had used only automated approaches, but these techniques would often flag legitimate uses—such as a navigation app using location—as problems. The team clearly needed human input. But they would have to find a way to gather data so that they could eventually examine public reaction—on a very large scale and in a workable time frame.
A week later, Amini popped into Hong’s office. He had a novel idea for the project—crowdsourcing. It was something they were all familiar with—a method of using humans to accomplish small tasks that computers can’t manage, like recognizing images or unusual text. Existing Web sites such as Amazon’s Mechanical Turk make accessing such a “crowd” fairly simple. They could post questions on the site, pay small amounts for answers, and hopefully achieve a much larger and quicker response than with traditional survey methods. Moreover, by posting individual questions and aggregating results, they could avoid a lengthy questionnaire that nobody would want to read, let alone answer. “Our plan was to use crowdsourcing to understand people’s perception regarding the privacy implications of using an app,” Amini explains.
The problem was that crowd tasks generally involved reading text, editing, and the like. Nobody had attempted questioning the public on a technical issue. And even if it could be done, how could they measure something as subjective as “privacy risk”? So they continued to refine their ideas and look for funding, eventually supplied by the National Science Foundation, Google, and the Army Research Office. Two new members joined the team to add a second perspective—understanding users’ privacy concerns. They were Norman Sadeh and Jialiu Lin, a computer science doctoral student. Intrigued by the subject, Lin discussed her new research with her classmates. As she mentioned the personal data these apps were collecting, Lin was struck by their reactions. They were all as surprised as the team had been. It hit her. Why not use this?
As Hong puts it, “If people expect an app to do something, like Google Maps using location data, it’s like informed consent. If people don’t expect something, like a game using your contact list, there’s a mismatch. We can measure people’s level of surprise—their expectations—as one way of measuring privacy risk.”
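Hong’s expectation-mismatch idea can be sketched as a simple scoring rule: ask crowd workers whether they expect a given app to use a given resource, and treat the fraction who are surprised as the privacy-risk signal for that app-permission pair. The sketch below is purely illustrative—the data, the function name, and the two-label response scheme are assumptions, not the team’s actual instrument:

```python
from collections import Counter

def surprise_score(responses):
    """Fraction of crowd workers surprised that an app uses a resource.

    responses: list of "expected" / "surprised" labels from crowd workers.
    A score near 0 suggests informed consent (people expect the access);
    a score near 1 flags a mismatch worth investigating.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    return counts["surprised"] / total if total else 0.0

# Hypothetical crowd answers for two (app, permission) pairs:
maps_location = ["expected"] * 18 + ["surprised"] * 2   # Maps using location
game_contacts = ["expected"] * 3 + ["surprised"] * 17   # a game reading contacts

print(surprise_score(maps_location))   # low: people expect a map to use location
print(surprise_score(game_contacts))   # high: a game reading contacts is a mismatch
```

The appeal of the measure is that it needs no technical judgment from the crowd—only the everyday intuition of whether an access “makes sense” for that app.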
The team began by examining the top 100 apps to learn what information they were accessing. In a tedious, by-hand operation, they determined that 56 of the 100 were using what could be considered sensitive information, including a phone’s unique device ID, location data, and contact list.
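That by-hand tally amounts to checking, for each app, whether its permission list touches any resource the team deemed sensitive. In code, with hypothetical permission lists standing in for the real top-100 data, the bookkeeping looks like this:

```python
# Resources treated as sensitive in this sketch (the study flagged
# device ID, location data, and contact lists, among others).
SENSITIVE = {"device_id", "location", "contacts"}

# Hypothetical permission lists for a handful of apps.
apps = {
    "blackjack":  {"device_id", "location", "internet"},
    "flashlight": {"device_id", "location"},
    "notepad":    {"internet"},
}

def uses_sensitive(permissions):
    """True if any requested permission is considered sensitive."""
    return bool(permissions & SENSITIVE)

flagged = [name for name, perms in apps.items() if uses_sensitive(perms)]
print(flagged)  # the blackjack game and flashlight are flagged; notepad is not
```

The hard part of the real study was not this tally but determining, by hand, what each app actually accessed—which is why the team later looked to automate the scan.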
Consider this: There are more than 1 billion smartphones in use across the globe. On average, Americans spend 127 minutes each day on their smartphones, using 41 apps. There are more than 1 million apps available. An app displays its permissions—the information it will access—the millisecond prior to download. Do you, like everyone else, glance and hit the button?
Here’s what Hong’s team determined: While you’re slinging away those Angry Birds, they’re collecting your device ID and location. While that lifelike waterfall from Backgrounds HD Wallpapers is lighting up your screen, it’s gathering your device ID and contact list. And while your Brightest Flashlight app is helping you move through that dark hallway, it’s grabbing your device ID and location. Often, these apps, particularly the free versions, are sharing this personal data with multiple online marketers, who are likely refining their targeted ads, increasing revenue for everyone. But then, that’s only conjecture.
“We don’t fully know what they’re trying to do,” admits Hong. “We know data is being sent to these companies but don’t know 100% what’s going on. We do have lots of guesses. Perhaps they’re trying to infer what zip code you live in, or where you work, for example, to then infer more demographics about you to tailor their ads.”
“Research at CMU has shown how much you can infer just by looking at someone’s location,” adds Sadeh, “such as what church you attend, what medical conditions you may have, your political affiliation, and more. Now there are close to 130 permissions—a lot of sensitive functionality—available to developers. It essentially opens the door for abuse.”
The research team next posted their questions to an online crowdsourcing Web site and within two weeks had results. Good ones. Lin brought them to Hong and Sadeh at their biweekly meeting. Hong was surprised—the results exceeded the team’s expectations. The crowd was not only able to answer the questions but could provide valuable answers to academic research.
As for how the crowd answered: They were surprised and uncomfortable with every app that collected personal information without obvious reason. The apps that engendered the most surprise were: Brightest Flashlight (ID, location); Toss It game (ID, location); Angry Birds game (ID, location); Talking Tom virtual pet (ID); HD Wallpapers (ID, contacts); Dictionary.com (ID, location); Mouse Trap game (ID); Horoscope (ID, location); Shazam music (ID, location); and Pandora Internet Radio (ID, contacts).
And as shocked as people were with one app tracking them, they were probably unaware of a potentially bigger problem. Advertisers have made things so simple for app developers—just download a package and collect your share of the revenues—that developers often sign up with several ad networks, sharing your information with each.
Unfortunately, there’s more to be concerned about. With a few major advertising networks controlling the majority of the market, most people have multiple apps collecting and sending their information to the same few entities. The data can potentially be aggregated, allowing for an uncomfortably clear picture of a user. “They can actually combine all the information they gather to form a kind of life history of your cell phone, inferring where you live, your workplace, where your children go to school,” says Lin.
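The aggregation Lin describes works because records sent by different apps carry the same device ID, so an ad network can join them trivially. A minimal sketch of that joining step, with entirely made-up records and place labels, shows how a rough daily itinerary falls out even though no single app saw the whole picture:

```python
from collections import defaultdict

# Hypothetical records sent by different apps to the same ad network.
# Each carries the phone's device ID, the key that links them together.
records = [
    {"device_id": "A1", "app": "flashlight", "hour": 8,  "place": "school"},
    {"device_id": "A1", "app": "blackjack",  "hour": 13, "place": "office"},
    {"device_id": "A1", "app": "horoscope",  "hour": 22, "place": "home"},
    {"device_id": "B7", "app": "blackjack",  "hour": 9,  "place": "gym"},
]

# Group every sighting by device ID.
profiles = defaultdict(list)
for r in records:
    profiles[r["device_id"]].append((r["hour"], r["place"], r["app"]))

# Sorting one device's sightings by hour yields a rough daily timeline.
timeline = sorted(profiles["A1"])
print(timeline)
```

Each app alone reported a single moment; joined on the device ID, the records outline where the phone’s owner spends the day—the “life history” Lin warns about.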
“What’s more concerning,” adds Sadeh, “is when you start putting together information across populations of people and mining this data, you can also identify, for instance, social relationships and much more.”
“And the problem is only growing,” notes Amini. “There are so many apps, more and more users. The market is growing so quickly. Every six months we get a new figure.”
The team is exploring potential solutions. Amini is developing software that automatically scans apps to determine the information accessed and quickly posts results to the crowd for reaction. Hong notes the potential of a Web site that could give users, in real time, a simple app privacy rating—and, unlike permissions, before they hit the download button. Lin mentions the hope of using machine-learning approaches to discover patterns that could be generalized to the entire app market. And Sadeh notes the potential of clustering users and their differing preferences.
Hong is also concerned with a more pervasive consequence. He’d like to not only protect people’s privacy, but also to preserve their comfort with technology. “If people are really worried about these things,” he says, “it could blunt adoption of very promising kinds of technologies that could really benefit all of humanity in so many ways.”
“In the long term,” he adds, “this problem will require, first, some legislative action, like limitations on what data is collected and for what purpose. Second, we need to raise public awareness. Third, we want to help developers better understand what they’re doing and how to make the right choice. And finally, we want to help the end user by providing information to make better choices. It’s going to take a combination of at least these things together to solve this problem.” Hong pauses. “Well, probably more like manage.”
Recently, Hong presented the team’s research to a series of West Coast companies, for even as the Silicon Valley giants gather ad revenue, they’re concerned with the public’s comfort level and this new method of measuring it. He noticed that as he reached the privacy portion, members of the audience began playing with their phones. After one such talk, a woman approached him. “I uninstalled those apps while you were speaking,” she said. “And so did a lot of others.”
Melissa Silmore (TPR’85) is a Pittsburgh-based freelance writer and a regular contributor to this magazine.