Searching for Privacy

Lorrie Cranor, Noah Smith & Norman Sadeh

As you search the web, do you wonder how those sites are searching you? New research led by Carnegie Mellon University's Norman Sadeh, professor of computer science, aims to help users quickly make sense of lengthy, confusing website privacy policies.

"This is a project that spans technology (such as machine learning, natural language processing and formal methods), human-computer interaction, user modeling, cognitive and behavioral science, and law," Sadeh noted. In the best CMU tradition, it is an interdisciplinary effort that brings together colleagues from across Carnegie Mellon's schools and law school researchers from Fordham and Stanford universities.

"People are increasingly aware that information about them is being collected, used, shared and recombined in all sorts of ways," Sadeh said. "But they have no practical way of finding out about these practices and making informed decisions. Hardly anybody reads privacy policies, and when they do, they usually can't answer even the most trivial questions about them."

Crowdsourcing will first help identify the privacy issues that matter most to users, a challenging task in itself. Because policies are so numerous and change so often, the team will then use natural language processing to routinely scan the web for policy fragments relevant to those issues. Crowdworkers can then analyze these shorter passages to answer specific questions.

"Instead of asking a crowdworker to read five pages of a privacy policy, we might be able to show them just five sentences and ask them what they say (or don't say) about a particular question," Sadeh said. "And we suspect that in a number of cases, we might eventually be able to figure out the answer automatically, using machine learning."

The 3½-year, $3.75 million Usable Privacy Policy Project is sponsored by the National Science Foundation through its Secure and Trustworthy Cyberspace (SaTC) program.

The team hopes to have results in as little as two years, and a major browser vendor has already expressed strong interest in incorporating the findings into a browser plug-in. The same techniques could also be applied to the privacy policies of mobile apps.

Sadeh credits CMU with helping him advance his research.

"My colleagues at other universities often tell me how envious they are of our ability here at CMU to assemble interdisciplinary teams such as this one," Sadeh said. "Most people have come to realize how critical the ability to conduct cross-disciplinary research is to progress in many areas. At CMU, this is part of who we are and have been for many years."
