Maneesh Goyal, COO of Mayo Clinic Platform, believes strongly in patient privacy, but not in the form it usually takes in healthcare: data de-identified under the HIPAA Safe Harbor method.
“Many organizations will take patient data and de-identify it, and once it’s de-identified, it’s no longer considered HIPAA data,” Goyal said in a recent interview. “We think this is interesting, but not enough to protect patient data because, especially when you have more and more computing, you can really figure it out.”
He went on to explain how Mayo Clinic Platform protects privacy in the broader context of its Orchestrate platform, a data platform through which biopharmaceutical and medical technology companies can use Mayo Clinic Platform data and combine it with high-quality research and core laboratory expertise to accelerate their own drug discovery and clinical development programs. On February 11, the Rochester, Minnesota-based health system announced that Orchestrate will now give researchers access to standardized real-world cancer data from Mayo Clinic and participating Mayo Clinic Platform Connection partners.
So how does Mayo Clinic ensure patient privacy, especially given that this data is made available to outside users such as pharmaceutical and medical technology companies? And why is it important to do it this way?
“The way we’ve approached de-identification is not just to remove all the things that would be identifiers, but to actually change them,” he said, giving an example from his own medical history. “So our tools go in there and replace it with a fictitious person, but they leave the clinical notes there. And then we do a date change, a random date change of the entire clinical record. So God forbid I was in a car accident on one date, and it’s public information, now you move it away from that date. And then I’m no longer identifiable.”
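The two steps Goyal describes, replacing identifiers with a fictitious person and shifting the whole record by a random date offset, can be illustrated with a minimal sketch. This is not Mayo Clinic's tooling: the `Encounter` type, the `deidentify` function, the surrogate name, and the 365-day shift window are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class Encounter:
    """Hypothetical stand-in for one entry in a clinical record."""
    patient_name: str
    visit_date: date
    note: str


def deidentify(encounters, surrogate_name, rng, max_shift_days=365):
    """Swap the real name for a surrogate and shift every date by one offset."""
    # A single non-zero offset for the whole record (an assumption here)
    # keeps the intervals between visits intact while hiding the real dates.
    offset = timedelta(days=rng.choice([-1, 1]) * rng.randint(1, max_shift_days))
    result = []
    for enc in encounters:
        result.append(replace(
            enc,
            patient_name=surrogate_name,
            visit_date=enc.visit_date + offset,
            # The clinical note is kept; only the name inside it is replaced.
            note=enc.note.replace(enc.patient_name, surrogate_name),
        ))
    return result


record = [
    Encounter("Jane Roe", date(2023, 3, 1), "Jane Roe seen after car accident."),
    Encounter("Jane Roe", date(2023, 3, 15), "Follow-up for Jane Roe."),
]
shifted = deidentify(record, "Alex Doe", random.Random(0))
```

Because the same offset is applied to every entry, the two visits in `shifted` are still 14 days apart, so the clinical timeline survives even though neither date matches the real one. Real clinical notes would need far more than a plain string replacement to catch every identifier.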
Mayo Clinic holds about 100 petabytes of structured and unstructured EHR data, of which about 28 petabytes have been anonymized, Goyal said. The unstructured data in clinical notes matters because it captures the provider’s rationale for a diagnosis or other decision. All of the anonymized data is stored in a “cloud container.”
“And then we create a container and that data never leaves that container, and that has now stood the test of the US regulatory system,” Goyal explained, adding that the approach has also satisfied regulators in other countries. “So when we provide access, we do it in a sandbox that’s in our controlled environment. No individual patient record is visible. We verify everything that comes out of the system. So no data leaves our control.”
This is known as a clean room environment, Goyal said. Another popular privacy-preserving approach to data access is “federated learning,” which at Mayo Clinic applies to health system partners that have joined the Mayo Clinic Platform, such as Hospital Israelita Albert Einstein in Brazil.
“Federated learning is basically sending the question to all these different data sets and then getting the aggregated answer. But each of those environments has to support this closed container, and no one has access to the central area where all the information is located,” Goyal said.
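The pattern Goyal outlines, sending one question to each data set and combining only the aggregated answers, can be sketched as follows. Note that this shows a federated aggregate query rather than federated model training, and it does not reflect Mayo Clinic's actual interfaces: the `Site` class, its `local_counts` method, and the sample rows are hypothetical.

```python
class Site:
    """Hypothetical partner site; row-level data stays inside this object."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows  # each row: {"diabetic": bool, "responded": bool}

    def local_counts(self, diabetic):
        """Answer the question locally and return only aggregate counts."""
        subgroup = [r for r in self._rows if r["diabetic"] == diabetic]
        responders = sum(1 for r in subgroup if r["responded"])
        return responders, len(subgroup)


def federated_response_rate(sites, diabetic):
    """Send the same question to every site and combine the aggregates."""
    responders = total = 0
    for site in sites:
        site_responders, site_total = site.local_counts(diabetic)
        responders += site_responders
        total += site_total
    # No patients in the subgroup anywhere means there is no rate to report.
    return responders / total if total else None


sites = [
    Site("site_a", [
        {"diabetic": True, "responded": True},
        {"diabetic": True, "responded": False},
        {"diabetic": False, "responded": True},
    ]),
    Site("site_b", [
        {"diabetic": True, "responded": True},
        {"diabetic": False, "responded": False},
    ]),
]
diabetic_rate = federated_response_rate(sites, diabetic=True)
non_diabetic_rate = federated_response_rate(sites, diabetic=False)
```

The caller only ever sees per-site counts, never individual rows. A production system would also need safeguards this sketch omits, such as suppressing counts from very small subgroups.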
This setup lets pharmaceutical companies run computational jobs, train AI models, or simply gain a better understanding of a target disease. For example, they can ask questions like “find the course of disease” or, separately, “how did this medication work in diabetics versus non-diabetics?”
Other uses are possible as well, and they go to the heart of the money wasted in clinical development. Clinical trials must be repeatable, and in the past the only way to know whether a trial was repeatable was to run it. Many trials failed or proved not to be repeatable for reasons such as an incorrect sample size or flawed trial design, and pharmaceutical companies found out only after time, effort and money had already been invested.
Now, with Mayo Clinic Orchestrate, pharmaceutical companies can create synthetic versions of clinical trials to see if the results are repeatable, for example, in a much larger patient population.
“So one way our pharmaceutical partners are using this is to validate their trial hypothesis,” Goyal said. “Our approach is to use real data from real people and get as much data as possible into a single repository so we can do a synthetic trial with real data. You can actually say, ‘Is this going to work? Do we have enough patients in a large non-patient population to do the trial the way I’ve envisioned?’”
But it’s not just about querying data, training an AI model, or validating a hypothesis. Goyal explained that Orchestrate is about consolidating a fragmented R&D process into a single comprehensive platform. For example, if a pharmaceutical company wants to conduct a trial on inflammatory bowel disease and comes to Mayo Clinic to recruit patients, the process with Orchestrate would work as follows.
“So they identify a set of patients. We can do this with our de-identified data. We bring in an IBD specialist from Mayo Clinic, develop a cohort of patients, then do an IRB and quickly recruit them to do additional tissue sample collection,” Goyal said. “The power of this now is to take that tissue sample within our own infrastructure, do all the profiling, that is, genetic pathology, proteomic, epigenetic, profiling of that against the patient’s longitudinal data to put it back into the clinical record in a de-identified form, and then hand it over to our pharmaceutical partners and say, now it’s your playing field to invent and identify the targets that are going to be important for your condition.”
Access to Orchestrate is by subscription, he said.
Photo: Claudio Ventrella, Getty Images
