Carnegie Mellon University helps you control your privacy

Managing your personal data in the digital age has become almost impossible. But Carnegie Mellon University has found that a combination of natural language processing, privacy preference modeling, machine learning, crowdsourcing and privacy interface design may make the impossible possible.

With the advent of the internet age, controlling your personal data rapidly became impractical without cutting both internet and mobile cords. As internet of things (IoT) and big data technologies proliferate, the ways your data can be collected and processed are expanding even more quickly. And that, potentially, puts users, government and business on a collision course where privacy is concerned.

Getting your arms around privacy in the digital age is no simple feat. One of the key issues is the ubiquitous privacy policy on websites, says Norman Sadeh, professor in the School of Computer Science at Carnegie Mellon University (CMU), director of CMU's Mobile Commerce Lab, founder and chairman of Wombat Security Technologies, and former chief scientist of the European Union's initiative in ecommerce. Sadeh is also co-director of CMU's master's program in privacy engineering, which focuses on engineering new technology products with privacy considerations baked in from the beginning, reflecting the tradeoffs between functionality and privacy.

"Website privacy policies have become a de facto standard to address expectations of "notice and choice" on the Web," Sadeh says. "The idea is that users should be able to obtain information about the practices of websites they visit, and be able to make meaningful decisions on that basis. In practice, however, users are known to never read privacy policies."

Even when users do read privacy policies, they often struggle to understand what they read, Sadeh says. Mobile apps are frequently even more impenetrable. Sadeh notes that users often have between 50 and 100 mobile apps, each with controls for between three and five privacy permissions — if the apps even expose the permissions they're granted.
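To see why, consider the arithmetic. Here's a minimal sketch in Python using the ranges Sadeh cites; the figures are illustrative, not measurements:

```python
# Back-of-the-envelope arithmetic: how many individual privacy
# decisions does a typical user face? The ranges mirror those Sadeh
# cites: 50-100 apps, each with 3-5 privacy permissions.
apps_low, apps_high = 50, 100
perms_low, perms_high = 3, 5

print(f"Low end:  {apps_low * perms_low} permission settings to review")
print(f"High end: {apps_high * perms_high} permission settings to review")
# Low end:  150 permission settings to review
# High end: 500 permission settings to review
```

Hundreds of individual toggles, spread across dozens of apps, each buried in a different settings screen.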

"Do you know what your settings are for your mobile apps on your cell phone? Nobody has any idea," Sadeh says. "It's overwhelming."

It's complicated (and inconsistent)

"The reason privacy is challenging is that not everyone feels the same way about disclosing information to different entities for different purposes," he adds. "If everyone felt the same way, it would be easy. People are more complex than that. We have diverse preferences with regard to information we're willing to share."

Organizations have attempted to address the issue in the past. In 2002, the World Wide Web Consortium (W3C) unveiled the Platform for Privacy Preferences Project (P3P), a protocol through which websites would declare the intended use of the information they collected from users. In 2010, the U.S. Federal Trade Commission called for a "do not track" system. Soon after, the major browsers — Mozilla Firefox, Microsoft Internet Explorer, Apple Safari, Google Chrome and Opera — all added support for Do Not Track (DNT) headers that requested web applications disable tracking of individual users. The W3C is still working to standardize DNT.
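DNT itself is remarkably simple: a single HTTP request header. Here's a minimal sketch of what any HTTP client sends when the setting is on, using Python's popular requests library and a placeholder URL:

```python
import requests

# A browser with Do Not Track enabled adds one header to every
# request; the value "1" signals the user's opt-out of tracking.
response = requests.get(
    "https://example.com",   # placeholder URL
    headers={"DNT": "1"},
)
print(response.status_code)
```

Whether the receiving server honors that header is entirely voluntary, which is the crux of the problem.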

Sadeh notes that while these projects aimed to develop machine-readable formats to convey websites' data practices, many website operators have proved reluctant to embrace them. Moreover, there is no legal requirement to implement P3P or to honor DNT.

In 2012, when it released Windows 8 to manufacturing, Microsoft announced that DNT would be enabled by default in Internet Explorer 10 as part of the operating system's "Express Settings" set-up option. The result was a furor from advertising companies, who argued DNT should be opt-in rather than enabled by default.

As a result, the Digital Advertising Alliance (DAA) trade association told its members they need not honor DNT: "The DAA does not require companies to honor DNT signals fixed by the browser manufacturers and set by them in browsers. Specifically, it is not a DAA Principle or in any way a requirement under the DAA Program to honor a DNT signal that is automatically set in IE10 or any other browser. The Council of Better Business Bureaus and the Direct Marketing Association will not sanction or penalize companies or otherwise enforce with respect to DNT signals set on IE10 or other browsers."

In 2015, Microsoft announced that DNT would no longer be enabled by default in Windows 10, but it would provide users clear instructions for turning on the feature.

Taking privacy out of websites' hands

Sadeh says that a practical framework for empowering users to control their privacy can't rely on the cooperation of website operators. Instead, the power must shift to the users themselves, and a combination of natural language processing (NLP), machine learning, privacy preference modeling, crowdsourcing and privacy interface design may be exactly what the doctor ordered.

Sadeh is involved with two projects aimed at making that happen: the Usable Privacy Policy Project and the Personalized Privacy Assistant Project.

The Usable Privacy Policy Project is a Frontier project funded under the National Science Foundation's Secure and Trustworthy Cyberspace (SaTC) initiative. It brings together Carnegie Mellon University, Fordham Law School's Center on Law and Information Policy (CLIP) and Stanford Law School's Center for Internet and Society (CIS).

"The goal of the project is to see to what extent we might use computers to semi-automatically extract information from privacy policies — extract, in fact, answer those key questions that users really care about and provide information back to users through browser plug-ins that will make this information very easy for users to digest," Sadeh explains. "To do this, we're combining machine learning, natural language processing and crowdsourcing technologies. We've been able to show that despite prior research results suggesting that users often do not understand privacy policies, it is possible, by bringing together enough crowd workers, to very often extract meaningful answers to key questions."

The Personalized Privacy Assistant Project is a CMU-funded effort to create intelligent agents capable of learning the privacy preferences of their users over time, using machine learning. These agents would be capable of semi-automatically configuring many settings and making many privacy decisions on their users' behalf. The privacy assistants could alert users about practices the users may not feel comfortable with and even occasionally nudge users to reconsider the implications of some of their privacy decisions. In the case of mobile apps, Sadeh says the assistant could even recognize when an app is leveraging user data without asking permission to do so.

Sadeh notes the project consists of multiple research strands. It is driven by user-centered design processes that translate personal privacy preference models, transparency mechanisms and dialog primitives into the assistant's functionality.
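As a rough illustration of the machine-learning strand (the features, decisions and decision-tree model below are all hypothetical; the project's actual models are more elaborate), an assistant could learn from a handful of a user's past allow/deny choices and suggest settings for new permission requests:

```python
from sklearn.tree import DecisionTreeClassifier

# Each past decision: (app category, permission, purpose) -> allow/deny.
# Categories are encoded as small integers for brevity; a real system
# would use much richer features.
APP = {"navigation": 0, "game": 1, "social": 2}
PERM = {"location": 0, "contacts": 1, "microphone": 2}
PURPOSE = {"core feature": 0, "advertising": 1}

past_decisions = [
    ((APP["navigation"], PERM["location"], PURPOSE["core feature"]), "allow"),
    ((APP["game"], PERM["location"], PURPOSE["advertising"]), "deny"),
    ((APP["social"], PERM["contacts"], PURPOSE["core feature"]), "allow"),
    ((APP["game"], PERM["microphone"], PURPOSE["advertising"]), "deny"),
]
X = [features for features, _ in past_decisions]
y = [decision for _, decision in past_decisions]

assistant = DecisionTreeClassifier().fit(X, y)

# A new app requests a permission; the assistant suggests a setting
# (and could nudge the user to confirm rather than deciding silently).
request = [(APP["social"], PERM["location"], PURPOSE["advertising"])]
print(assistant.predict(request))  # likely: ['deny']
```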

Businesses benefit, too

In the end, Sadeh says the research is also likely to pay dividends for businesses. While people have diverse preferences with regard to the information they're willing to share, those preferences often fall into a relatively small number of buckets.

"There's lots of correlation. You can effectively leverage these correlations to organize users in different buckets," he says. "For developers, it means that if you're going to be looking at whether you're going to be collecting a given piece of information, we could actually tell you how many people in your target population of users are likely to be uncomfortable with that."

That information, he notes, could help an organization decide whether it's worthwhile to develop new functionality.