Best Permission Practices Sought for Data Collection in AI Research

AI researchers in healthcare are more cognizant of whether the data they need was collected in accordance with the best permission practices. (GETTY IMAGES)

By John P. Desmond, AI Trends Editor

AI researchers in healthcare are refining ways to ensure data they work with has been obtained with proper permissions, including from patients.

This becomes more challenging as smartphone apps asking for medical information become more popular, and consumers may click through agreement pages without reading the fine print.

Google, for example, has built its portfolio primarily on a 15-to-35-year-old consumer market, and now wants to consider targeting an older demographic. “Now they just want to go out to the retirement communities and start collecting data from residents to figure out how they can pitch their product to that demographic,” stated Camille Nebeker, an associate professor at the UC San Diego medical school, in a recent account in Bloomberg Law.

But how tech companies have historically collected information and what data researchers need for studies can be disconnected. Nebeker has studied data from AliveCor’s Kardia device, which detects abnormal heartbeats, to improve the health of aging patients. The data was collected in a way that meets the requirements for studying human subjects, known as the Common Rule (45 C.F.R. 46).

Large datasets are needed to train machine learning models. Getting clearance to use the data in a research center can be complicated. “The concern is that the data are being used without the originators of the content agreeing to the use,” stated Susan Gregurick, associate director for data science and director of the Office of Data Science Strategy at the National Institutes of Health, to Bloomberg.

Beware of Unexpected Risks in Data Science/AI Research

An exploratory workshop on Privacy and Health Research in a Data-Driven World was recently held by the Office for Human Research Protections (OHRP), a unit of the federal agency HHS. Dr. Jerry Menikoff, director of the OHRP, presented on “Unexpected Forms of Risk in Data Science/Artificial Intelligence Research.”

He described the Cambridge Analytica and Facebook episode, in which data from Facebook users who thought they were taking a personality quiz wound up in a database, along with data on all their Facebook friends, that was sold to political campaigns in efforts to influence voters. “No academic research was ever published as a result of this research,” Dr. Menikoff noted.

The experience prompted Dr. Menikoff to produce a list of “hallmarks of a research ethics scandal,” things for practitioners of ethical research to watch out for:

  • Metrics jumping between domains, e.g., psychiatry to social media profiles to electoral data,
  • Research that is exempt under Common Rule for narrow technical reasons,
  • Blurred lines between academic and commercial research,
  • Use of Application Program Interface (API) tools intended for commercial and advertising purposes to gather data for academic research,
  • Abuse of mTurk workers (workers accessed through an Amazon crowdsourcing mechanism),
  • Deceptive/opaque recruiting tactics for human subjects – a strong signal of unethical research,
  • Predictive population models as research output become tools for intervention in individual lives, and
  • Downstream effects nearly impossible to imagine because the models are highly portable and far more valuable than the actual data.

Working Group on AI Seeks to Bridge Computer Science and Biomedical Research

The NIH, part of HHS, has a working group on AI charged with building a bridge between the computer science and biomedical communities; generating training that combines the two subjects for research; understanding how career paths in the new AI economy may look different; identifying the major ethical considerations; and making suggestions. Its AI Working Group Update was issued in December 2019.

Among the group’s recommendations: support flagship data generation efforts; publish criteria for ML-friendly datasets; design and apply “datasheets” and “model cards” for biomedical ML; develop and publish consent and data access standards for biomedical ML; and publish ethical principles for the use of ML in biomedicine.

The direction in data collection for AI research is away from scandal and towards best practices.

Read the source articles in Bloomberg Law, information on the Common Rule (45 C.F.R. 46), the account in Privacy and Health Research in a Data-Driven World and the AI Working Group Update from the NIH unit.