Great Expectations Case Study:
How Komodo Health uses GX to safeguard their data pipelines with quality checks and UAT verification
UAT testing on every scheduled run and checks on data ingestion and processing pipelines
About Komodo Health
Komodo Health is a healthcare technology startup that provides data-driven software applications that empower its customers to improve patient care and reduce the burden of disease. The company’s Healthcare Map delivers patient-level insights by dynamically analyzing the broadest array of data across patients, practitioners, and health systems. The engineering team at Komodo Health uses a range of data engineering tools, including Spark, Snowflake, Airflow, Amazon EMR, and AWS Glue.
The Challenge
Komodo Health offers its customers several software products built on top of the company's Healthcare Map, which streams over 15 million daily clinical encounters from myriad data sources containing relevant information on a patient's diagnosis, recent procedures, and therapies. Pulse, one of these products, delivers precise and actionable alerts on clinical activity at the optimal phases of engagement with healthcare providers, helping life sciences companies communicate with clinicians at timely moments based on the patients currently in their care.
What makes Pulse both interesting and challenging is that each customer cares about a different disease area. Each disease has its own clinical pathways and patient journey nuances, so it manifests through different clinical signals, which are often difficult to pinpoint within a massive source of clinical data. Each alert type therefore requires careful implementation and configuration, in close collaboration with customers and Komodo’s clinical innovations and medical teams. Because the alerting solution is primarily used to find clinicians treating patients with rare and oncological therapies, the accuracy of the data used to create the alerts is crucial: incorrect data can trigger unnecessary and misleading alerts.
How Komodo Health Uses Great Expectations
Great Expectations for User Acceptance Testing
Komodo Health uses Great Expectations as part of its User Acceptance Testing (UAT) workflow to address data quality issues in the Pulse alerting pipelines. In addition to writing standard unit tests during pipeline development, developers now use Jupyter notebooks to add data quality checks in the form of Expectation Suites. The pipeline output is validated against those Expectation Suites on every scheduled run, which verifies adherence to the UAT requirements in production and safeguards the pipelines before their outputs are sent to Komodo Health’s customers. This helps Komodo ensure that its product can scale effectively as it adds more customers.
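The idea of encoding UAT requirements as a named suite of checks that runs against every pipeline output can be sketched in plain Python. This is a minimal illustration only: it does not call the Great Expectations library, the helper names merely mirror the style of its expectation names, and the columns (`clinician_id`, `alert_type`) and allowed alert types are hypothetical rather than taken from Komodo Health's pipelines.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


def expect_column_values_to_not_be_null(column: str) -> Check:
    """Build a check that passes when no row has a missing value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def expect_column_values_to_be_in_set(column: str, allowed: set) -> Check:
    """Build a check that passes when every value in `column` is in `allowed`."""
    return lambda rows: all(row.get(column) in allowed for row in rows)


def validate(rows: List[Row], suite: Dict[str, Check]) -> Dict[str, bool]:
    """Run every check in the suite and report pass/fail per requirement."""
    return {name: check(rows) for name, check in suite.items()}


# Hypothetical UAT requirements for an alert output table.
uat_suite: Dict[str, Check] = {
    "clinician_id is never null": expect_column_values_to_not_be_null("clinician_id"),
    "alert_type is a configured type": expect_column_values_to_be_in_set(
        "alert_type", {"new_diagnosis", "therapy_start"}
    ),
}


def run_uat(rows: List[Row]) -> bool:
    """Return True only if the output satisfies every UAT requirement."""
    return all(validate(rows, uat_suite).values())
```

A scheduled job could call `run_uat` on its output and withhold delivery when it returns `False`, which is the safeguarding behavior described above.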
Enterprise Use of Great Expectations
Great Expectations is also used by several other teams at Komodo Health to ensure data quality as the company ingests and organizes data for multiple end-user products. For example, it guards the upstream data ingestion and processing pipelines, which pre-process third-party claims data in batches, and prevents data issues from propagating to downstream pipelines. Komodo ingests tens of millions of claims per day and processes data on over 320 million US patients overall, so Great Expectations helps provide the confidence to manage data across different stages of processing.
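Stopping bad data from propagating downstream amounts to a gate in front of each batch. The sketch below shows one way such a gate could look in plain Python; it is not the Great Expectations API and not Komodo Health's implementation, and the field names (`claim_id`, `billed_amount`) and rules are assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Claim = Dict[str, Any]


def batch_problems(batch: List[Claim]) -> List[str]:
    """Return a list of human-readable problems found in one batch."""
    problems: List[str] = []
    if not batch:
        problems.append("batch is empty")
    for i, claim in enumerate(batch):
        # Hypothetical rule: every claim needs an identifier.
        if claim.get("claim_id") is None:
            problems.append(f"row {i}: missing claim_id")
        # Hypothetical rule: billed amounts must be non-negative numbers.
        amount = claim.get("billed_amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append(f"row {i}: invalid billed_amount")
    return problems


def gate_batches(
    batches: Iterable[List[Claim]],
) -> Tuple[List[List[Claim]], List[Tuple[List[Claim], List[str]]]]:
    """Split batches into those safe to pass downstream and those to quarantine."""
    passed: List[List[Claim]] = []
    quarantined: List[Tuple[List[Claim], List[str]]] = []
    for batch in batches:
        problems = batch_problems(batch)
        if problems:
            quarantined.append((batch, problems))
        else:
            passed.append(batch)
    return passed, quarantined
```

Only the `passed` batches would continue to downstream pipelines; each quarantined batch keeps its list of problems so the issue can be investigated at the stage where it was introduced.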
Komodo Health worked closely with the Great Expectations core engineering team as part of a Superconductive Consulting Services partnership, which led to several major feature enhancements. To enable Great Expectations in Komodo’s ecosystem, the Great Expectations team developed a special build that supports PySpark jobs deployed to the AWS Glue and Amazon EMR cloud services.
Learn more about Komodo Health here
We would like to thank the team at Komodo Health for their support in creating this case study!
- 15 million clinical encounters per day
- 320 million total US patients
- Tens of millions of claims per day
Components
- Spark
- Snowflake
- Airflow
- Amazon EMR
- AWS Glue
- PySpark