Privacy and AI: Data Scientists Need to Care; Government Regulation Could Result in Fewer Data Sources

[Figure: Issue importance ratings. Data privacy is the highest-rated issue, expected to impact over 65 percent of people within 10 years. Source: Future of Humanity Institute, University of Oxford.]

By AI Trends Staff

Privacy: the Future of Humanity Institute at Oxford rates it as the most severe problem we will face over the next 10 years.

Data scientists should care because more data means better models, and less data means less accurate ones. The value we bring to the table will be directly reduced if government regulation takes many of our data sources off the table, suggests a report in Data Science Central.

However, “privacy” has become what Marvin Minsky called a “suitcase word”: it carries such a variety of meanings that it can refer to many different experiences. After all, who doesn’t want more privacy? Similarly, “freedom,” “democracy,” and “safety” are used so broadly that, if asked, people respond from their own personal points of view, which is not necessarily what any survey is seeking to discover.

Targeted ads were seen as an inappropriate use of personal data by 51 percent of respondents to a survey conducted by YouGov. The public wants to know exactly how they are being tracked.  

The use of ad blockers against targeted ads is on the rise. HubSpot found these to be the top six reasons:

  • annoying/intrusive (64%)
  • disrupt what I’m doing (54%)
  • create security concerns (39%)
  • better page load time/reduced bandwidth use (36%)
  • offensive/inappropriate ad content (33%)
  • privacy concerns (32%)

These responses suggest that frequency and intrusiveness drive resistance, not necessarily concern over the privacy of the data used.

GDPR Costly to Business In First Year 

After a year in effect, the GDPR in the European Union is being seen as a drag on business. A recent review by the Center for Data Innovation found a laundry list of unintended consequences:

  • Negatively affects the EU economy and businesses.
  • Drains company resources.
  • Hurts European tech startups.
  • Reduces competition in digital advertising.
  • Is too complicated for businesses to implement.
  • Fails to increase trust among users.
  • Negatively impacts users’ online access.
  • Is too complicated for consumers to understand.
  • Is not consistently implemented across member states.
  • Strains resources of regulators.

Companies are spending an average of $1.8 million each on compliance. An estimated 30 percent of previously available news and information services have been withdrawn from the market because they could not comply.

GDPR Conflicts with Promise of Machine Learning

The GDPR seems to conflict directly with some of the promise of machine learning applied to data analysis, according to a recent account in Towards Data Science.

For example, under the “scope of applicability” section, the GDPR states that the regulation applies to “all data about EU citizens that could potentially identify a data subject.” Can any system that processes data about people be categorically excluded from the purview of the GDPR?
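As an illustration of why exclusion is hard, a team might screen incoming columns against known potential identifiers before processing. This is a hypothetical sketch, not a compliance tool: the field list and function names are invented, and real identifiability depends on context, not a fixed list.

```python
# Hypothetical sketch: flag columns that "could potentially identify a
# data subject" before processing. The identifier list is illustrative.
POTENTIAL_IDENTIFIERS = {"name", "email", "ip_address", "device_id", "postcode"}

def identifying_columns(columns):
    """Return the sorted subset of columns that match known identifiers."""
    return sorted({c.lower() for c in columns} & POTENTIAL_IDENTIFIERS)

cols = ["Email", "purchase_total", "ip_address"]
print(identifying_columns(cols))  # ['email', 'ip_address']
```

Note that even "purchase_total" could identify someone when combined with other fields, which is exactly why a simple checklist cannot settle the question of scope.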

Exploratory data analysis (EDA) is one of the most valuable activities for moving a data science project forward, often leading to innovation and interesting insights. Yet the GDPR states that a data processor should not use the data for any purpose beyond the original intent without securing further consent from the data subject. At best, this would significantly slow exploratory efforts.
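The purpose-limitation constraint can be pictured as a gate in front of any analysis. The sketch below is hypothetical (the registry, dataset name, and purposes are all invented) and only illustrates why open-ended EDA collides with consent that was collected for a narrower purpose.

```python
# Hypothetical sketch of GDPR-style purpose limitation: data may only be
# used for purposes the data subjects originally consented to.
CONSENT_REGISTRY = {
    # dataset -> purposes covered by the original consent (illustrative)
    "eu_customer_orders": {"billing", "fraud_detection"},
}

def check_purpose(dataset: str, purpose: str) -> bool:
    """Return True only if the stated purpose is covered by consent."""
    return purpose in CONSENT_REGISTRY.get(dataset, set())

# Open-ended exploration is blocked; the original purpose is allowed.
print(check_purpose("eu_customer_orders", "exploratory_analysis"))  # False
print(check_purpose("eu_customer_orders", "billing"))               # True
```

In practice the "new purpose" path would mean going back to data subjects for fresh consent, which is the slowdown the article describes.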

The GDPR explicitly bars data subjects from being subjected to the outcomes of purely automated decision making. That, however, is the essence of many mainstream AI/ML use cases, and taken literally the clause implies that one cannot apply ML to data about EU citizens. Does showing an ad while someone browses a site constitute being subject to automated decision making? Many such issues are open to interpretation. A saving grace may be that the prohibition does not apply if the automated decision making is (a) necessary for meeting contractual obligations or needed for legal reasons, or (b) explicitly consented to by the data subject.
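The (a)/(b) exceptions amount to a precondition check before any automated decision is applied to a person. The following is a hypothetical sketch of that gating logic; the function names and the stand-in "model" are invented for illustration and do not represent legal advice or any real system.

```python
# Hypothetical sketch: run an automated decision only when one of the
# exceptions described above applies; otherwise route to human review.
def may_automate(contractual: bool, legal: bool, consented: bool) -> bool:
    """True if contractually necessary, legally required, or consented to."""
    return contractual or legal or consented

def score_applicant(features, contractual=False, legal=False, consented=False):
    if not may_automate(contractual, legal, consented):
        raise PermissionError("no exception applies; route to human review")
    return sum(features) > 10  # stand-in for a real ML model

print(score_applicant([5, 7], consented=True))  # True
```

The hard part the article points to is not the gate itself but deciding which everyday ML uses, such as ad targeting, even count as decisions that need it.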

Read the source articles in Data Science Central and Towards Data Science.