Blogs.csc.com

IDMP: Inside Documents Majority of Pieces

2015-08-11

At this year’s DIA e-Regulatory and Intelligence conference, one message was delivered over and over again about IDMP: Only 20-40% of the information that will go into IDMP will be found in existing structured databases. Where is the rest? In unstructured sources, mainly documents used as part of the marketing application, but over time labeling changes, manufacturing records and quality systems will contribute a substantial piece of the puzzle.

For structured data, integration of disparate databases has a pretty well-documented cost: either “ETL” (Extract, Transform, and Load) or full direct data exchange. The former is generally cheaper, but has to be repeated every time the data changes. The latter is often more difficult, because if information isn’t in the same format the integration layers need to do a lot of translating, but it pays off in the long run, assuming the volume of translated data is high enough. If you only have a couple of products in a small number of markets, the price of repeated transfers or manual updates may be justified.

The unstructured documents are even more challenging. For instance, a label may specify an expected adverse experience for a drug, such as headache, nausea or photosensitivity. One word, such as “photosensitivity” in a sentence of a document, could become multiple MedDRA terms in the Clinical Particulars of the IDMP information tree. If later safety information lets you change that (congratulations, your drug doesn’t have the same adverse effects as others in its class!), the fact that “photosensitivity” is now missing from the document is very hard to detect, and it is even harder to know whether its removal should retire some or all of the MedDRA terms.
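To make the problem concrete, here is a minimal Python sketch of the comparison a reviewer would otherwise do by eye. Everything in it is an assumption for illustration: the mapping table, the function names, and especially the coded terms, which are placeholders and not actual MedDRA entries.

```python
import re

# Hypothetical mapping from label wording to coded terms. The coded terms are
# placeholders for illustration only, not actual MedDRA entries.
LABEL_TO_CODED_TERMS = {
    "headache": ["CODED-HEADACHE"],
    "nausea": ["CODED-NAUSEA"],
    "photosensitivity": ["CODED-PHOTOSENS-SKIN", "CODED-PHOTOSENS-EYE"],
}


def find_label_terms(text):
    """Return the set of known label terms that appear as whole words in the text."""
    lowered = text.lower()
    return {
        term
        for term in LABEL_TO_CODED_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def coded_terms_to_review(old_text, new_text):
    """Map each label term dropped between two versions to the coded terms it fed."""
    dropped = find_label_terms(old_text) - find_label_terms(new_text)
    return {term: LABEL_TO_CODED_TERMS[term] for term in sorted(dropped)}


old_label = "Expected adverse experiences include headache, nausea or photosensitivity."
new_label = "Expected adverse experiences include headache or nausea."

review = coded_terms_to_review(old_label, new_label)
print(review)
```

The sketch only flags which coded terms need a second look. Deciding whether some or all of them should actually be removed is still a judgment call for a person.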

The obvious (cough cough) solution is Structured Authoring! Use XML to construct your documents, where all the important facts are tagged with appropriate markers, and every bit of information in them can be queried. That will merely require that every kind of document listed in the chart above has a standard XML-based structure created for it, that authors are trained in the use of structured authoring tools, and that you have developed a time machine so that 20-year-old market authorizations will have already been translated into that format.

Putting aside the cynicism (really, if we don’t start now, we’ll still be in the same boat in 20 years), it isn’t necessary that every document have every fact tagged… just the pieces needed to describe the products, organizations, and substances related to the marketed and investigational products that go into IDMP.
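As a rough idea of what “just the pieces needed” could look like, the Python sketch below tags two facts in an otherwise ordinary paragraph and pulls them back out. The element and attribute names (label, fact, type) are invented for this example and do not come from any published authoring standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimally tagged label fragment. Only the facts that feed
# IDMP are wrapped in <fact> elements; the rest stays as plain prose.
LABEL_XML = """
<label product="Nosuchacin 100mg Tablet">
  <section title="Package">
    <p>Supplied by <fact type="LabelerOrganizationName">Ted's Imports</fact>
    in a <fact type="PackageDescription">30-pill bottle</fact>.</p>
  </section>
</label>
"""


def extract_facts(xml_text):
    """Return a dict of fact type -> tagged text for every <fact> in the document."""
    root = ET.fromstring(xml_text.strip())
    return {
        fact.get("type"): "".join(fact.itertext()).strip()
        for fact in root.iter("fact")
    }


facts = extract_facts(LABEL_XML)
print(facts)
```

Because the untagged prose is left alone, an author could adopt something like this one fact at a time instead of restructuring whole documents up front.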

The less-obvious solution is Big Data computing: parse through everything and identify likely sources for all the information, which then gets mapped into the IDMP structure. CSC’s experts are exploring how best to do this: it will take some time to mature, and will certainly be less accurate than mapping everything in XML documents, but will be much less labor intensive, and will be useful on the existing back file.

For now, though, we need a lower-tech way to do this – like the California Gold Rush’s forty-niners – while we prepare the industrial mining version of Big Data technology to extract every last nugget of information. We have the tools today, but using them together is still not straightforward:

Prospecting: Documents in an indexed repository can be searched for critical terms, or browsed down trees of documents if there is a well-organized repository, such as the marketing and clinical trial applications. CSC’s FirstDoc provides tools for document search.

Staking Claims: Important information must be marked so that it can be found and extracted. Annotation tools such as OpenAnnotate can be used to highlight relevant text, with a standard phrase such as “This is the Nosuchacin 100mg Tablet Labeler Organization Name” or “This is the package description for the 30-pill bottle for Nosuchacin 100mg Tablet” entered into the annotation box. Another option would be to add information to the Comments fields in the FirstDoc repository, e.g., “Page 30 has the Nosuchacin 100mg Tablet Labeler Organization Name: Ted’s Imports.” Custom annotation tools under consideration would help browse through the RIM database and identify exactly where the data should go. The key here is that documents contributing to your IDMP knowledge base should have fences around them saying “Changing this document will affect the registered product information!” That way, a change to a document gets propagated into the IDMP records.
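A standard phrasing only pays off if something can read it back. The Python sketch below parses comments written in the “Page 30 has the ...” pattern from the example above. The list of field names, the Claim type, and parse_comment are all assumptions made for illustration, and nothing here relies on FirstDoc or OpenAnnotate features.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical list of field names the comment convention is allowed to use.
KNOWN_FIELDS = ["Labeler Organization Name", "Package Description"]

# Expected shape: "Page <n> has the <product> <field>: <value>."
COMMENT_PATTERN = re.compile(
    r"^Page (?P<page>\d+) has the (?P<product>.+?) (?P<field>"
    + "|".join(re.escape(name) for name in KNOWN_FIELDS)
    + r"): (?P<value>.+?)\.?$"
)


class Claim(NamedTuple):
    page: int
    product: str
    field: str
    value: str


def parse_comment(comment: str) -> Optional[Claim]:
    """Parse one repository comment; return None if it does not follow the convention."""
    match = COMMENT_PATTERN.match(comment.strip())
    if match is None:
        return None
    return Claim(int(match["page"]), match["product"], match["field"], match["value"])


claim = parse_comment(
    "Page 30 has the Nosuchacin 100mg Tablet Labeler Organization Name: Ted's Imports."
)
print(claim)
```

Comments that drift from the convention come back as None, which is a cheap way to spot claims that were staked carelessly.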

Extracting Nuggets: For now, copying and pasting text is about all there is, but the same custom tools that identify where the data should go are expected to be able to populate that data too (although, as noted above, translation and expansion of data may still be needed).

Registering and Protecting Claims: Unlike the forty-niners, we don’t have to worry about bandits claiming to be Federales stealing our information out from under us, but we do need to be able to find our way back. The search tools in FirstDoc will help find comments with the right terms, and tools can be used to search for annotations in the documents themselves. On top of that, your Regulatory Information Management (RIM) system, such as TRS Tracker, can be used to point at each document related to the records associated with the products. Tracker makes it quite simple to add a link from any object in the system to a document in a repository.
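The bookkeeping behind “finding our way back” can be pictured as a lookup from each document to the record fields that depend on it. The Python sketch below is a toy version of that idea. ClaimRegistry, the document IDs, and the field labels are all hypothetical, and this is not how TRS Tracker or FirstDoc store their links.

```python
from collections import defaultdict


class ClaimRegistry:
    """Toy index from a document ID to the record fields that were sourced from it."""

    def __init__(self):
        self._fields_by_document = defaultdict(set)

    def register(self, document_id, record_field):
        """Record that a field in the product records was taken from a document."""
        self._fields_by_document[document_id].add(record_field)

    def affected_by(self, document_id):
        """List the record fields to re-check when the given document changes."""
        return sorted(self._fields_by_document.get(document_id, ()))


registry = ClaimRegistry()
# Hypothetical document IDs and field labels.
registry.register("DOC-001", "Nosuchacin 100mg Tablet / Labeler Organization Name")
registry.register("DOC-001", "Nosuchacin 100mg Tablet / Package Description")
registry.register("DOC-002", "Nosuchacin 100mg Tablet / Clinical Particulars")

print(registry.affected_by("DOC-001"))
```

When a document is revised, a lookup like this tells you which pieces of the registered product information need a second look.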

With a combination of locating documents, identifying relevant text, and ensuring it can be found and updated, the rough territories of your unstructured data can be turned into productive sources of product knowledge. Let our consultants help build your IDMP solution.