The novel digitization of healthcare data and the compelling opportunity for technology

The U.S. healthcare industry is undergoing a rapid technological transformation underpinned by the recent digitization of healthcare data. Historically, the industry has operated in an offline fashion (e.g., paper records, phone calls, faxes), and the utilization of digitized healthcare data was largely limited to insurance claims. Today, healthcare generates ~30% of the world’s data, and its data volume is projected to grow faster than that of any other industry.[1] At the same time, technology adoption within healthcare remains low; as evidence, consider that the financial services industry spends ~3x as much on software per year as the healthcare industry, despite being only one-third its size.[2]

In our view, the proliferation of healthcare data is principally the result of three factors:

1. Ubiquitous adoption of electronic medical record (EMR) platforms

2. Growing availability of genomic data

3. Increased use of wearable healthcare devices 

The first followed the 2009 HITECH Act, which provided incentives for healthcare providers to purchase EMR technology platforms; accordingly, as of 2019, ~96% of healthcare providers had adopted an EMR, up from only ~8% in 2008.[3] 

The second is a derivative of the dramatic reduction in genomic sequencing costs over the past two decades – from ~$95M per genome in the early 2000s to only ~$500 today.[4] The resultant proliferation of genomic data has had a transformative impact on the medical field and led to a myriad of advances in treatment and MedTech, particularly around understanding how an individual’s DNA contributes to varying health and disease outcomes. 

Finally, the growing use of wearable healthcare devices (e.g., smart watches, blood glucose meters, etc.) has resulted in troves of real-time, patient-generated data that is increasingly being used for real-time patient monitoring and intervention applications, as well as in clinical trials to both expand patient accessibility and improve data capture. 

Until recently, all three of the aforementioned types of healthcare data existed either in an offline format (e.g., paper records) or essentially not at all (e.g., genomic and wearables data). Note that there are several other types of healthcare data consistent with this trend, including, but not limited to, lab, medical imaging, and social determinants of health (SDOH) data – all of which are important and have their own idiosyncrasies. In our view, however, drivers #1–3 outlined above are the most notable and encompass the broadest array of healthcare data; accordingly, this piece focuses principally on those three.

Driven by unsustainable growth in U.S. healthcare expenditures, coupled with a growing need to improve health outcomes, the healthcare industry has reached a profound inflection point. Against this backdrop, we have strong conviction that numerous category-defining, franchise technology companies will be built that utilize healthcare data to address the industry’s most pressing pain points: increasing drug discovery and development productivity, improving diagnostic quality and care coordination, driving operational efficiencies, and improving the overall patient experience – all vectors that also improve patient outcomes. In our view, these technology platforms have the opportunity to drive a compelling ROI for industry stakeholders across a myriad of use cases and applications.

Having said all of that, there are several foundational considerations that render healthcare data uniquely difficult to utilize. While the complete list is rather long, here are some of the more notable roadblocks:

1. Healthcare data exists in silos generally organized by data type (e.g., clinical records, insurance claims, genomic, lab, imaging, pharmacy, etc.)

2. Custodians of one type of data are often unwilling to share it with other industry stakeholders (e.g., a provider with clinical data vs. an insurer with claims)

3. There is no universally adopted enterprise master patient index (EMPI) that can be used to pair datasets at the patient level, so any single data source presents an incomplete picture (a challenge illustrated in the sketch following this list)

4. HIPAA compliance and other regulatory considerations heavily restrict data access, sharing, and utilization rights

5. Different data formats and connectivity standards introduce added complexity and friction in terms of data sharing (though recent industry initiatives, such as the FHIR interoperability standard, are helping)

6. ~80% of healthcare data is unstructured (e.g., free-text notes, images, etc.), rendering it difficult, if not impossible, to use in its current form[5]
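To make roadblocks #1–#3 more concrete, below is a minimal, purely illustrative sketch of why pairing patient records across silos is so brittle without a shared identifier. All record fields and values are hypothetical, and production EMPI systems rely on far more robust probabilistic linkage:

```python
# A toy illustration of deterministic patient matching across two hypothetical
# silos (clinical records vs. insurance claims). Real-world linkage must also
# cope with typos, name changes, moves, and HIPAA-compliant de-identification.
from dataclasses import dataclass

@dataclass(frozen=True)
class PatientRecord:
    first_name: str
    last_name: str
    dob: str        # ISO date, e.g., "1984-07-02"
    zip_code: str

def match_key(record: PatientRecord) -> tuple:
    # Naive composite key: normalized name + DOB + ZIP. Without a shared
    # patient identifier, any discrepancy in these fields breaks the match.
    return (record.first_name.lower(), record.last_name.lower(),
            record.dob, record.zip_code)

def link_records(clinical: list, claims: list) -> list:
    # Pair each clinical record with a claims record, where one matches.
    claims_index = {match_key(r): r for r in claims}
    return [(c, claims_index.get(match_key(c))) for c in clinical]

clinical = [PatientRecord("Ada", "Lovelace", "1984-07-02", "94105"),
            PatientRecord("Grace", "Hopper", "1979-12-09", "10001")]
claims = [PatientRecord("ada", "lovelace", "1984-07-02", "94105"),
          PatientRecord("Grace", "Hoper", "1979-12-09", "10001")]  # one typo

for clinical_rec, claims_rec in link_records(clinical, claims):
    status = "matched" if claims_rec else "unmatched"
    print(status, clinical_rec.first_name, clinical_rec.last_name)
```

A single transcription error (“Hoper”) is enough to sever the link, which is why real-world patient matching leans on fuzzy, probabilistic techniques and why the absence of a shared EMPI remains such a persistent roadblock.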

In our view, these challenges, coupled with the growing volume and diversity of healthcare data sources, present a unique opportunity for technology companies to deliver significant value to the healthcare industry. We sub-segment the technology companies that benefit from this theme into four buckets:

1. Infrastructure and enabling technologies

2. Data analytics

3. AI / ML to drive decision-making

4. AI- / ML-enabled automation

Note that #1 and #2 are not mutually exclusive, while labeled and annotated training data are prerequisites for #3 and #4. Below we’ve shared a bit more about each category, some representative vendors that fit into each, and, after the list, an illustrative sketch of the labeled-data prerequisite underpinning #3 and #4.

1. Infrastructure and enabling technologies – Help connect, normalize, curate, and manage data across disparate sources and formats; examples include 1upHealth, Datavant, Health Gorilla, HiLabs, Innovaccer, Lifebit, Mendel, Ribbon, TetraScience, Tripleblind, Science.io, and Veda Data Solutions

2. Data analytics – Packaged, self-service analyses via an application layer and / or curated data delivered via an API; examples include Kipu*, Komodo Health, H1 Insights, OM1, and Truveta

3. AI / ML to drive decision-making – Use labeled / annotated data to train AI and ML models that help end users make better informed, more efficient decisions; examples include Aidoc*, BenchSci*, Deep 6 AI, Diagnostic Robotics, Iterative Scopes, Paige.AI, and Unlearn

4. AI- / ML-enabled automation – Use labeled / annotated data to train AI and ML models that automate business processes and workflows; examples include Syllable*, Abridge, AKASA, DeepScribe, Memora Health, Notable Health, and Robin Healthcare

* TCV portfolio companies
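As a simple illustration of the labeled-data prerequisite behind categories #3 and #4, here is a minimal sketch of training a classifier on annotated clinical notes. The notes, labels, and use case are entirely invented for illustration; production clinical NLP additionally requires de-identification, clinical vocabularies, and rigorous validation:

```python
# A toy sketch of supervised learning on labeled clinical text with
# scikit-learn. All notes and labels below are fabricated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated training data: free-text notes plus a binary label
# (1 = flag for cardiology review) supplied by clinical annotators.
notes = [
    "patient reports chest pain radiating to left arm",
    "routine follow-up, no acute complaints",
    "shortness of breath with elevated troponin",
    "annual physical, labs within normal limits",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, labels)

print(model.predict(["new onset chest pain on exertion"]))
```

The point of the sketch is the dependency, not the model: without annotators first turning unstructured notes into labeled examples, neither decision-support (#3) nor automation (#4) platforms have anything to learn from.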

We further believe that technology companies across all four categories have an opportunity to differentiate and establish competitive moats along the four dimensions outlined below. To be clear, compelling technology platforms need not check all four boxes – some may only check one of them.

1. Unique access to healthcare data – This can stem from the business model itself (e.g., an open, network-based system), barter or give-to-get relationships, long-term data-sharing partnerships, and / or customers contributing data, among other levers

2. IP that integrates, curates, and prepares the data for downstream use cases – This may take the form of technology tooling and / or organizational know-how (e.g., the process for cleansing the data)

3. Functionality that applies healthcare-specific contextualization – This often involves both platform functionality and clinically / scientifically trained personnel to ensure effective platform utilization by the end user

4. Software applications that deliver value in the context of specific business use cases and workflows

In closing, the U.S. healthcare industry is perhaps the last major industry to undergo digitization; it is also one of the largest. Against a rapidly growing volume and diversity of healthcare data, coupled with the challenges and complexities associated with its use, we believe there is an extraordinary opportunity for technology to play a leading role in unlocking and delivering value across multiple sub-segments, functions, and applications in healthcare. Accordingly, we at TCV are excited to continue partnering with companies boldly seeking to utilize healthcare data to fundamentally transform both the development of novel medicines and the provision of patient care and, ultimately, to improve patient outcomes.


1 Source: RBC.

2 Source: Gartner. Size of industry measured in terms of contribution to U.S. GDP.

3 Source: HealthIT.gov.

4 Source: NHGRI.

5 Source: NCBI.

*** 

The views and opinions expressed are those of the speakers and do not necessarily reflect those of TCMI, Inc. or its affiliates (“TCV”). TCV has not verified the accuracy of any statements by the speakers and disclaims any responsibility therefor. This blog post is not an offer to sell or the solicitation of an offer to purchase an interest in any private fund managed or sponsored by TCV or any of the securities of any company discussed. The TCV portfolio companies identified, if any, are not necessarily representative of all TCV investments, and no assumption should be made that the investments identified were or will be profitable. For a complete list of TCV investments, please visit www.tcv.com/all-companies/. For additional important disclaimers regarding this blog post, please see “Informational Purposes Only” in the Terms of Use for TCV’s website, available at http://www.tcv.com/terms-of-use/


Machine learning observability built for practitioners: Our investment in Arize

We’re still in the early innings of AI adoption, yet we’ve already seen it transform industries. Companies like Netflix, Spotify, and Uber have scaled internal teams from a handful of data scientists and machine learning (ML) engineers to hundreds. These teams, and the models they built to inform business and product decision-making, have shaped how consumers watch television, listen to music, and hail rides.

As AI use cases abound, ML teams face growing challenges around how to build, deploy, and maintain their models. Practitioners demand the flexibility to optimize each component of a model and, as in software development, to use purpose-built tools for the various phases of the model development lifecycle. In an attempt to bridge this tooling gap, a new discipline called MLOps has emerged. MLOps is derived from core DevOps principles and represents one of the fastest-growing markets in technology today.

To date, MLOps has largely centered on data preparation, model training, and model deployment; however, building and deploying models is only the beginning of the journey. What happens once a model is running in production? The reality is that most companies still lack the tools to scalably monitor and understand their live models. Moreover, as these models become more complex, troubleshooting issues gets harder, and both upstream and downstream problems compound.

In order for AI to achieve long-term sustainability, companies must improve model transparency, understanding, and performance. Enter Arize, an ML observability platform built by practitioners to help unpack the proverbial AI black box and optimize the performance of models in production.

Shaping the future of AI infrastructure

As the various components of the model lifecycle standardize, many companies have started to move away from expensive in-house builds and less flexible integrated platforms. Instead, practitioners are opting for best-of-breed MLOps solutions from focused vendors like Arize to gain additional control over their modeling workflow.

Arize sets itself apart as one of the few platforms that sits at the center of production ML. Within MLOps, there has been a flood of investment in tools addressing everything from data preparation through the model deployment stage, but few tackle the reality that all live models face over time: degradation. Without real-time monitoring and observability, ML teams spend countless hours poring over anomalies and trying to pinpoint problems in the data, the software, and / or the model itself. Arize’s observability platform plugs seamlessly into any MLOps stack and provides a scalable solution to monitor the performance of models, explain what the models are trying to do, and diagnose data and drift issues without going back to square one.
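To ground what “drift” means in practice, here is a minimal sketch of one common drift statistic, the population stability index (PSI), computed between a feature’s training distribution and its production traffic. This illustrates the general technique only – it is not a description of Arize’s implementation, and all data and parameters below are hypothetical:

```python
# A toy sketch of drift detection via the population stability index (PSI),
# comparing a feature's training distribution against production traffic.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin on the training distribution's quantiles, then compare the share
    # of observations landing in each bin.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the shares to avoid division by zero / log(0) on empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
training = rng.normal(0.0, 1.0, 10_000)    # feature values at training time
production = rng.normal(0.4, 1.0, 10_000)  # same feature in production, shifted
print(f"PSI = {psi(training, production):.3f}")
```

Binning on the training distribution’s quantiles keeps the statistic scale-robust, and a common rule of thumb treats a PSI above ~0.2 as meaningful drift worth investigating – the kind of signal an observability platform surfaces automatically across thousands of features and models.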

In speaking to Arize’s customers, which include many of the world’s most sophisticated ML teams, it’s clear observability is seen as a core pillar of AI infrastructure and represents a natural progression in how they think about model lifecycle management. There’s a reason adjacent observability solutions like Datadog and Monte Carlo exist in other areas of IT, and we believe ML will be no different. 

Built by practitioners

Arize’s founders, Jason Lopatecki and Aparna Dhinakaran, first met at TubeMogul, where Jason was a founder who helped build out the company’s ML team and Aparna was a data scientist. Jason would eventually guide TubeMogul through a successful IPO and sale to Adobe, while Aparna went to work for Uber as part of its famed Michelangelo team.

Jason and Aparna stand out in an MLOps space where many founders hail from academia. Both draw from deep practitioner roots and have experienced firsthand the heartache of spending months building and training models, deploying them to production, and having no insight into how the models actually performed once deployed. Independently, they came to the conclusion that something was fundamentally missing in the MLOps toolchain. Together, they are now focused on bringing transparency, understanding, and performance to production ML through Arize’s dedicated observability platform.

Our partnership with Arize

We’re thrilled to announce that TCV has led Arize’s $38M Series B alongside our friends at Battery Ventures, Foundation Capital, and Swift Ventures.

At TCV, we gravitate towards founders who are culture- and product-obsessed. Jason and Aparna blend multi-stage company-building experience with firsthand knowledge of a real customer pain point. We’re incredibly excited to partner with the Arize team on their mission to make AI work and work for the people. If you are interested in joining Arize for the journey ahead, please visit their website to learn more about current career opportunities.