Medical Content Management Concepts And A Roadmap To Support Clinical Decision Support Software In A Global Marketplace

By Charles Alcorn – Head of Data Science at Roche Molecular Systems

This post is adapted from a talk by Charles Alcorn – Head of Data Science at Roche Molecular Systems and describes medical content management concepts for the field of Oncology.



Roche is a pharmaceutical and diagnostics company that develops and produces pharmaceuticals and diagnostic services, many of which are related to cancer. Several years ago, Roche decided to launch a division responsible for developing clinical decision support software. This is software that is used at the point of care in the delivery of care to patients. What you find if you go into the typical oncology office is that there's a lot of data, but the data is not very well organized. Often that data is fragmented and not readily available, so there's a real opportunity in that space to improve the integration and use of that data.

The other thing that's happened with oncology over the past several years is that there's been a proliferation of new therapies and new tests and new opportunities for patients. Now we can identify genetic markers that are predictive of whether or not a patient's going to respond to a specific therapy. So this is one field of medicine where the amount of information has increased remarkably over time. Compared to the past (when in most cases the diagnosis of cancer meant a bad outcome) there's an opportunity now to manage patients, prolong their life, and to manage them based on their unique characteristics. 

As a consequence of that though there's a lot of information that has to be managed. As data scientists, we do this already – we create reference tables, we map data, we look for benchmark information, we do that informally as part of our data science analysis and development work. But when it comes to supporting applications, that process has to be more formalized because these applications are released on a regular basis and doctors are waiting for them.

Another challenge that we've had at Roche is that we do business in basically all the countries in the world. You can imagine that treatment practices in Germany are going to be very different from those in the United States or in Saudi Arabia. So another thing that we need is software that reflects the local marketplace – when a doctor logs on to the software, it's not giving them recommendations for the United States; it's giving them recommendations for their specific market. I'm going to give you some examples of that during the course of this talk.

First, let’s look at some background on medical content management. To begin with, it extracts existing knowledge, so we're not manipulating or changing the knowledge. This is data that's in published literature (e.g., clinical trial XYZ said that this drug in combination with this drug was more effective than an alternative scenario). We heard yesterday from Bloomberg some examples of extracting information from financial reports – essentially we're doing the same thing. We’re pulling from the medical literature and published information. So that's one part of medical content management.

The second part is that we're building a database of knowledge. One of the things I mentioned in my abstract was that if an oncologist had to read all the literature that comes out every day or every week, they would basically be spending all their time reading. So we're taking that knowledge and information and putting it in a common data set that can be easily accessed. That’s a key part of our goal.

The third item here is that the software is driven by product design and functionality across all of our products. We have a mutation profiler application that reports mutation information, therapies, and what the response was. Physicians can access that information and can potentially identify a unique combination of drugs for their particular patient. We have a tumor board application that integrates all the data from the different sources. I showed it earlier – imaging, genetic test results, past therapy results, cancer stage, patient characteristics. It brings all that data together and applies clinical or treatment guidelines to manage that patient.

The last two items here relate to the fact that this is becoming a regulated space. We heard from the finance industry yesterday about the increase in regulations. The Food and Drug Administration is starting to regulate clinical software – they want to know what your processes are for managing information and using it. This is because they want to make sure that the information is reliable, accurate, and not biased in any way. So the process that's used for medical content management is important for that purpose.

Let me give some examples. Firstly, here in the middle we have medical content management, medical information, and medical knowledge. We're actually looking for the medical knowledge here – we know there's a lot of genetic tests available, we know a lot of information has been generated, but the real question is what's the relationship between those genetic tests and patient treatment and outcomes. So we're really focused in on the knowledge part and that's what we're actually extracting and putting in our database. 

I gave some examples here on the left: updates to guidelines or treatment pathways, drug safety warnings, practice-specific guideline modifications, reimbursement and payment, extended indications (you may hear in the news on occasion that drug XYZ, which we used for lung cancer a year ago, has now been approved to treat ovarian cancer – that’s an extended indication), and extended treatment pathways (different patients that would benefit from that drug). These are some of the triggers that we're looking at in terms of managing our system.

In terms of our architecture and design, we have all of our applications sitting here: oncology workflow, clinical trial matching, guidelines. This is the back-end part of our architecture. We have a lot of manual data curation, which is actually extracting the data from the electronic health record. What you find with electronic health records is that physicians often put in notes but don't use standardized fields. Part of our data enrichment is actually going on at the patient level: extracting that data and putting it in a structured environment to support these various applications. Sitting down here is medical content management, and in many ways it supplies the rules and the dimensions behind these applications. I'll give you an example in a moment as to how that would work.




We have a team that’s dedicated to daily monitoring. Let’s say new biomarkers have been developed and approved. Biomarkers are used to test a patient for a genetic mutation. In this case, let’s say that genetic mutation is related to therapy response. If it was approved on Monday, we can look at that as a trigger or an event. Over on this side here we have our terminology service, NLP taxonomies, data curation requirements, our common data model, and finally our apps. We want that event to flow through and impact these various areas. If there's a new code associated with that biomarker, we want to add that to our reference table so that we can capture that information moving forward. If there's a data curation requirement, we want to add that to our list: we're going to look for that biomarker from now on in the EHR. If you have that biomarker available, you want to report that to the physician, so you have to have a structured field to show that the test was performed, here's the result, and potentially here's the indicated treatment for that particular biomarker. So that's how an event flows through our system.
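To make that flow concrete, here is a minimal Python sketch of one event reaching three of the downstream areas. It is only an illustration: the class names, fields, and the marker, code, and drug values are all made up for the example and are not how our system is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class ContentEvent:
    """A content trigger, e.g. a newly approved biomarker (fields are illustrative)."""
    biomarker: str
    code: str
    indicated_therapy: str


@dataclass
class ContentStore:
    """Toy stand-ins for the downstream areas an event has to reach."""
    terminology: dict = field(default_factory=dict)      # code -> biomarker name
    curation_targets: set = field(default_factory=set)   # what curators look for in the EHR
    report_fields: dict = field(default_factory=dict)    # structured fields shown in the apps


def propagate(event: ContentEvent, store: ContentStore) -> list:
    """Apply one event to every downstream area and return the areas that changed."""
    updates = []
    if event.code not in store.terminology:
        store.terminology[event.code] = event.biomarker
        updates.append("terminology")
    if event.biomarker not in store.curation_targets:
        store.curation_targets.add(event.biomarker)
        updates.append("curation")
    if event.biomarker not in store.report_fields:
        store.report_fields[event.biomarker] = {
            "performed": None,
            "result": None,
            "indicated_therapy": event.indicated_therapy,
        }
        updates.append("report_field")
    return updates


store = ContentStore()
event = ContentEvent(biomarker="MARKER_X", code="X-001", indicated_therapy="DRUG_A")
print(propagate(event, store))  # first pass touches all three areas
print(propagate(event, store))  # replaying the same event changes nothing
```

Making the propagation idempotent is a deliberate choice in this sketch: the same approval can show up in more than one source, and replaying it should not create duplicate entries.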

I mentioned earlier the global variety in health care. There are many treatment guidelines (pathways for treatment – if X do Y, if Z do ABC). They’re not a recipe or a cookbook, but they're a guideline that a physician can use based upon different characteristics. These protocols have been developed over time to make healthcare more consistent with less variability. But these are the guidelines for different parts of the world. This is the US; this is the UK; this is Europe. You can see the format of these varies substantially. We extract this information using NLP tools so that when a doctor logs on in the US to use our guidelines app, they're seeing the US guidelines. Similarly, in the UK we’re ingesting and incorporating their guidelines. These are detailed guidelines that in the US were available in a PDF file. It was about 200 pages long, and none of the physicians ever went to look at it because the average time that a physician has to treat a patient is about 10 or 15 minutes. So we took all of this text-based information and put it into a decision tree with different components that they can point and click on a desktop. It's a much better way for them to navigate the information.
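A decision tree selected by market can be sketched in a few lines of Python. Everything in this example is a placeholder: the questions, the therapy names, and the two trees are invented to show the mechanics and are not clinical guidance or content from any real guideline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Either a question with yes/no branches or a leaf carrying a recommendation."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    recommendation: Optional[str] = None


# Hypothetical, heavily simplified trees keyed by market code.
GUIDELINES = {
    "US": Node(
        question="biomarker_positive",
        yes=Node(recommendation="consider THERAPY_A"),
        no=Node(recommendation="consider STANDARD_THERAPY"),
    ),
    "UK": Node(
        question="biomarker_positive",
        yes=Node(
            question="stage_iv",
            yes=Node(recommendation="consider THERAPY_B"),
            no=Node(recommendation="refer to tumor board"),
        ),
        no=Node(recommendation="consider STANDARD_THERAPY"),
    ),
}


def recommend(market: str, patient: dict) -> str:
    """Walk the tree for the clinician's market using boolean patient attributes."""
    node = GUIDELINES[market]
    while node.recommendation is None:
        # A missing attribute is treated as "no" in this sketch.
        node = node.yes if patient.get(node.question, False) else node.no
    return node.recommendation


patient = {"biomarker_positive": True, "stage_iv": False}
print(recommend("US", patient))
print(recommend("UK", patient))
```

The same patient gets a different answer depending on the market the physician logs in from, which is the behavior described above.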

The frequency of updates is another factor. In the U.S., the National Cancer Institute updates its guidelines every six months. Europe updates periodically, while Germany and the UK see less frequent updates. So another aspect of medical content management is how frequently the information is updated and when it's available.

Another issue I mentioned earlier is safety warnings – we don't want to make recommendations to physicians if a safety warning was issued last week by the FDA. We're looking for and extracting this information as it becomes available so that if a physician picks a drug, that safety information is reflected on a real-time basis for that particular provider or that particular drug. The reality is that after drugs have been approved there can be many more safety warnings that occur during the life of that drug.
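As a rough sketch of that lookup, assume each warning is stored as a drug name, an issue date, and a summary. The record layout, the drug names, and the dates below are invented for illustration.

```python
from datetime import date

# Hypothetical warning records: (drug, issued_on, summary).
WARNINGS = [
    ("DRUG_A", date(2019, 9, 2), "placeholder warning text"),
    ("DRUG_B", date(2018, 1, 15), "placeholder warning text"),
    ("DRUG_A", date(2017, 6, 30), "placeholder warning text"),
]


def warnings_for(drug, as_of, warnings=WARNINGS):
    """Return the warnings for `drug` issued on or before `as_of`, newest first."""
    hits = [w for w in warnings if w[0] == drug and w[1] <= as_of]
    return sorted(hits, key=lambda w: w[1], reverse=True)


# When a physician picks DRUG_A, show every warning known as of that day.
for _, issued_on, summary in warnings_for("DRUG_A", date(2019, 9, 10)):
    print(issued_on.isoformat(), summary)
```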

Payment is another factor. Payment reimbursement varies by country. This is another piece of information that we have to include – we can't recommend a drug if it's off the payment schedule for that particular country.
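One simple way to picture this is a per-country filter applied to candidate drugs before anything is shown to the physician. The country codes, drug names, and schedule contents here are hypothetical.

```python
# Hypothetical reimbursement schedules: country code -> set of covered drugs.
REIMBURSED = {
    "US": {"DRUG_A", "DRUG_B", "DRUG_C"},
    "DE": {"DRUG_A", "DRUG_C"},
}


def reimbursable(candidates, country, schedules=REIMBURSED):
    """Keep only candidates on the country's schedule, preserving their order."""
    covered = schedules.get(country, set())
    return [drug for drug in candidates if drug in covered]


print(reimbursable(["DRUG_B", "DRUG_A"], "US"))
print(reimbursable(["DRUG_B", "DRUG_A"], "DE"))
```

In this sketch a country with no schedule on file returns no candidates at all, on the assumption that recommending nothing is safer than recommending something that may not be paid for.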

Lastly I mentioned drug extensions. With cancer, those have been occurring quite frequently. A drug might have been originally developed for let's say lung cancer and then got extended use in ovarian and breast cancer as new clinical trial data becomes available about the drug’s efficacy. This is another factor that we incorporate into our medical content management.

I want to talk a little bit about our process. We've developed curation tools to extract the data. Part of our curation process is NLP, part of it is manual. We have experts reading the literature and capturing that data. A big issue there is quality control – we don't want to have incomplete or inaccurate information presented, so we have check processes to verify it.
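A check process of that kind could look something like the following Python sketch. The required field names and the rule that a reviewer must differ from the curator are assumptions made for the example, not a description of our actual checks.

```python
# Hypothetical fields every curated record is expected to carry.
REQUIRED = ("drug", "indication", "source", "curated_by")


def qc_issues(record, required=REQUIRED):
    """Return a list of problems; an empty list means the record passes."""
    issues = [f"missing {name}" for name in required if not record.get(name)]
    curator = record.get("curated_by")
    # Assumed rule: a record should not be reviewed by the person who curated it.
    if curator and curator == record.get("reviewed_by"):
        issues.append("reviewer must differ from curator")
    return issues


record = {
    "drug": "DRUG_A",
    "indication": "INDICATION_X",
    "source": "",
    "curated_by": "curator_1",
    "reviewed_by": "curator_1",
}
print(qc_issues(record))
```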



Charles Alcorn - Head of Data Science at Roche Molecular Systems, speaking at Data Science Salon Miami 2019.


We also have unstructured content sources, such as the literature and various other bodies. We also have public and commercial sources. These are all extracted using a combination of automatic and manual processes to build our database, which gets updated weekly.

So these are our content sources: regulators, guidelines, literature, reimbursement. We thought about electronic health records as a source so we could delegate some of this responsibility. The reality is electronic health records are very busy with pre-admission and getting claims – it's more of an operational function and less of a knowledge application. They're a potential client for our knowledge database, but they're mainly involved with the day-to-day operations of a practice.

And here are our clients and applications. We're also considering developing an authoring tool so that the actual practice could go in and make modifications specific to their environments. If practice XYZ has a special formulary for this drug they would be able to go in and make it higher priority compared to the other ones that are available.
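Since the authoring tool is still only under consideration, the following is just one way the idea could work: a practice supplies overrides that re-rank drugs already in the shared list. The function name, the priority scheme, and the drug names are all hypothetical.

```python
def rank_drugs(base_priority, practice_overrides=None):
    """Rank drugs by priority (lower number first), applying practice overrides.

    Overrides may re-rank drugs that are already in the base list,
    but they cannot introduce new drugs in this sketch.
    """
    merged = dict(base_priority)
    for drug, priority in (practice_overrides or {}).items():
        if drug in merged:
            merged[drug] = priority
    # Ties are broken alphabetically so the order is stable.
    return sorted(merged, key=lambda drug: (merged[drug], drug))


base = {"DRUG_A": 2, "DRUG_B": 1, "DRUG_C": 3}
print(rank_drugs(base))
print(rank_drugs(base, {"DRUG_C": 0}))  # practice XYZ prefers DRUG_C
```

Restricting overrides to drugs already in the base list keeps the practice-level edits from bypassing the curated content.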

We organized our process around people, process, and technology. We thought about what technology we need, what the processes are, and who the people are. For this we have medical data science, data curation, and engineering – that forms the entire medical content management team. We have products sitting outside of it too. These are people who are product managers for the software. They're mainly responsible for developing use cases. You can see the different qualities or areas of expertise that we're looking for in the team members in this group. In terms of data science processes, we define metrics, we maintain taxonomies, and we engage in quality monitoring for the data. The manual data curators that I mentioned go here – they have to have expert knowledge, they capture content, and they provide updates. This sort of roadmap made it pretty easy. Roche is a huge company and there are a lot of players; when people see this, they sort of get on board and get behind it.

And finally, let’s talk about technology. What’s most important here is the engineering, the software that captures data: it takes text, disaggregates it, and turns it into fields – that’s our harvester software. These are the applications that dissect those guidelines. We also have authoring software where we delegate some of the responsibility back to the practice or the physician.
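To give a feel for what "text into fields" means, here is a deliberately tiny Python sketch. It uses a single regular expression as a stand-in for the NLP tooling, and the pattern, function name, and sample sentences are all invented for the example rather than taken from the harvester itself.

```python
import re

# Matches lines shaped like "If <condition>, then <action>." (the comma and period are optional).
RULE = re.compile(
    r"^\s*if\s+(?P<condition>.+?)\s*,?\s+then\s+(?P<action>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def harvest(text):
    """Turn each 'if ..., then ...' line into a dict of fields; skip other lines."""
    rows = []
    for line in text.splitlines():
        match = RULE.match(line)
        if match:
            rows.append({
                "condition": match.group("condition"),
                "action": match.group("action"),
            })
    return rows


sample = """If biomarker X is positive, then consider therapy A.
This sentence is background and is ignored.
if stage is IV then refer to tumor board"""

for row in harvest(sample):
    print(row)
```

Real guideline text is far less regular than this, which is why the process described above combines NLP with manual curation and quality checks.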

So to summarize Medical Content Management: updates must be processed with priority, the content and knowledge management is iterative, the updates require specialist knowledge, and a single update may trigger several updates downstream.

