Semantic methods improving immediate and long-term value for tranSMART

Berkeley, CA; Boston, MA; Amsterdam, Netherlands - October 21, 2015

Robert Stanley, CEO of IO Informatics, Inc. ("IOI"), presented "Adding a semantic layer to ETL for data normalization, automation and cross-instance alignment" at the tranSMART Annual Meeting hosted in Amsterdam at the Netherlands Cancer Institute (www.transmartfoundation.org/2015-annual-meeting).

"We are happy to have been asked to present our work on utilizing our Sentient™ semantic data harmonization platform in order to address challenges and deliver on opportunities related to data ETL (Extract, Transform and Load) for immediate and long-term interoperability within the tranSMART system," stated Mr. Stanley. "Using tranSMART as a common translational database schema suggests an improved ability for different groups to collaborate around shared translational research data. Realizing this promise is not quite so trivial. Even with traditional curation and ETL of data into tranSMART, the promise of collaboration across tranSMART instances is not guaranteed. Traditional curation methods address many of the issues but, even with normalized data in tranSMART, collaboration across tranSMART instances will not be available 'out of the box.'

"For example, if data within one tranSMART instance is normalized to a different set of standards relative to data within another instance, the promise of collaborative integration across groups is still not realized. This is a risk for longer-term inefficiency and disillusionment."

The solution created by IOI applies agile semantic software and data modelling methods to get data normalized, connected, loaded into tranSMART and ready for collaboration:

  • Getting data in - automated ETL functionality with provenance, alerting, indication of preferred and alternate labels and other functionality makes it easier to get data into tranSMART reproducibly and in an automated manner.
  • Making sure it is curated and harmonized - purpose-built data modelling, normalization and integration methods provide for curation rules that grow as new data are added to any system. Semantic technologies are designed to apply inference, ontologies and dictionaries to make it easier to get data harmonized (e.g., mapped to common terms and synonyms) and integrated (e.g., meaningfully connected) within tranSMART, to support precise and accurate queries.
  • Agility for aligning or adjudicating between different formal data standards - if data are normalized under different standards and nomenclature, semantic technologies are designed to rapidly re-align them for collaboration. This makes it possible to more quickly harmonize data from two different instances of tranSMART, to realize the longer-term goals of efficient collaboration and addition of new data to the system.
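
As a rough illustration of the harmonization idea described above (mapping alternate labels and synonyms to a preferred term, so that records from two differently normalized instances line up), a minimal Python sketch might look like the following. This is not IOI's Sentient implementation; the vocabulary, field names and records are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: hypothetical vocabulary and records,
# not taken from IOI Sentient or from tranSMART itself.

# Preferred label -> known alternate labels (synonyms, abbreviations).
PREFERRED_LABELS = {
    "Alzheimer's disease": ["AD", "Alzheimer disease", "Alzheimers"],
    "Parkinson's disease": ["PD", "Parkinson disease"],
}


def build_lookup(vocabulary):
    """Index every preferred and alternate label, case-insensitively."""
    lookup = {}
    for preferred, alternates in vocabulary.items():
        for label in [preferred, *alternates]:
            lookup[label.casefold()] = preferred
    return lookup


def harmonize(records, field, lookup):
    """Rewrite `field` to its preferred label, keeping the original value.

    Returns the harmonized records plus the labels that could not be
    mapped, so they can be flagged for curation instead of being
    silently changed or dropped.
    """
    harmonized, unmapped = [], []
    for record in records:
        raw = record[field]
        preferred = lookup.get(raw.strip().casefold())
        if preferred is None:
            unmapped.append(raw)
            harmonized.append(dict(record))
        else:
            # Keep the source label as simple provenance.
            harmonized.append({**record, field: preferred, "source_label": raw})
    return harmonized, unmapped


LOOKUP = build_lookup(PREFERRED_LABELS)

# Two hypothetical instances that were normalized to different conventions.
instance_a = [{"subject": "A-001", "diagnosis": "AD"}]
instance_b = [
    {"subject": "B-017", "diagnosis": "alzheimer disease"},
    {"subject": "B-018", "diagnosis": "ALS"},
]

aligned_a, unmapped_a = harmonize(instance_a, "diagnosis", LOOKUP)
aligned_b, unmapped_b = harmonize(instance_b, "diagnosis", LOOKUP)
```

After this step, the first record of each instance carries the same preferred diagnosis label, while the unrecognized label from the second instance is returned separately for review. A production system would typically draw its labels from published ontologies rather than a hand-written dictionary.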

The IOI team, under the direction of Jason Eshleman, Ph.D. (Director of Informatics), has addressed each of the challenges above for the tranSMART system as part of award-winning research teams, demonstrating how IOI Sentient has aligned data from different sources and applied automated ETL to get clean, useful data into tranSMART and prepared for immediate research and longer-term collaboration.

IOI participated in the tranSMART Cross Neurodegenerative Disease Datathon this past summer in Boston.

