The U.S. healthcare industry is undergoing a rapid technological transformation underpinned by the recent digitization of healthcare data. Historically, the healthcare industry operated largely offline (e.g., paper records, phone calls, and faxes), and the use of digitized healthcare data was largely limited to insurance claims. Today, healthcare generates ~30% of the world’s data, and its data volume is projected to grow faster than that of any other industry.[1] At the same time, technology adoption within healthcare remains low: the financial services industry spends ~3x as much on software per year as the healthcare industry, despite being only one-third of its size.[2]

In our view, the proliferation of healthcare data is principally the result of three factors:

1. Ubiquitous adoption of electronic medical record (EMR) platforms

2. Growing availability of genomic data

3. Increased use of wearable healthcare devices 

The first followed the 2009 HITECH Act, which provided incentives for healthcare providers to purchase EMR technology platforms; accordingly, as of 2019, ~96% of healthcare providers had adopted an EMR, up from only ~8% in 2008.[3] 

The second stems from the dramatic reduction in genomic sequencing costs over the past two decades – from ~$95M per genome in the early 2000s to only ~$500 today.[4] The resulting proliferation of genomic data has had a transformative impact on the medical field and led to myriad advances in treatment and MedTech, particularly in understanding how an individual’s DNA contributes to varying health and disease outcomes.

Finally, the growing use of wearable healthcare devices (e.g., smart watches and blood glucose meters) has produced troves of real-time, patient-generated data that are increasingly used for patient monitoring and intervention, as well as in clinical trials to both expand patient accessibility and improve data capture.

Until recently, all three of the aforementioned types of healthcare data existed either in an offline format (e.g., paper records) or essentially not at all (e.g., genomic and wearables data). Note that there are several other types of healthcare data consistent with this trend, including, but not limited to, lab, medical imaging, and social determinants of health (SDOH) – all of which are important and have their own idiosyncrasies. In our view, however, drivers #1 – 3 outlined above are the three most notable and encompass the broadest array of healthcare data; accordingly, this piece focuses principally on those three.

Unsustainable growth in U.S. healthcare expenditures, coupled with a growing need to improve health outcomes, has brought the healthcare industry to a profound inflection point. Against this backdrop, we have strong conviction that numerous category-defining, franchise technology companies will be built that use healthcare data to address the industry’s most ambitious problem statements and pain points, including increasing drug discovery and development productivity, improving diagnostic quality and care coordination, driving operational efficiencies, and improving the overall patient experience – all of which also improve patient outcomes. In our view, these technology platforms have the opportunity to drive an enormously compelling ROI for industry stakeholders across myriad use cases and applications.

Having said all of that, there are several foundational considerations that render healthcare data uniquely difficult to utilize. While the complete list is rather long, here are some of the more notable roadblocks:

1. Healthcare data exists in silos generally organized by data type (e.g., clinical records, insurance claims, genomic, lab, imaging, pharmacy, etc.)

2. Custodians of one type of data are unlikely to be willing to share it with other industry stakeholders (e.g., a provider with clinical data vs. an insurer with claims)

3. There is no ubiquitously utilized enterprise master patient index (EMPI) that can be used to pair datasets at the patient level; single data sources by themselves present an incomplete picture

4. HIPAA compliance and other regulatory considerations heavily restrict data access, sharing, and utilization rights

5. Different data formats and connectivity standards introduce added complexity and friction in terms of data sharing (though some recent industry initiatives are helping)

6. ~80% of healthcare data is unstructured (e.g., free-text notes and images), rendering it difficult, if not impossible, to use in its current form[5]
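To make roadblock #3 more concrete, the sketch below shows one simple way two datasets might be paired at the patient level when no shared EMPI exists: each side derives a keyed hash from normalized demographics and records are joined on that token. This is an illustrative assumption rather than a description of how any vendor named in this piece works; the field names, the `SECRET_KEY` value, and the helper functions are all hypothetical, and a real system would also need probabilistic matching, consistent date formats, key management, and a proper HIPAA de-identification review.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret, for illustration only.
SECRET_KEY = b"demo-only-secret"


def normalize(value: str) -> str:
    """Lowercase the value and drop accents, punctuation, and whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def patient_token(record: dict) -> str:
    """Derive a keyed hash from normalized demographic fields."""
    fields = (record["first"], record["last"], record["dob"], record["zip"])
    message = "|".join(normalize(f) for f in fields)
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(clinical: list, claims: list) -> list:
    """Return (clinical_row, claims_row) pairs that share a patient token."""
    claims_by_token = {}
    for row in claims:
        claims_by_token.setdefault(patient_token(row), []).append(row)
    linked = []
    for row in clinical:
        for match in claims_by_token.get(patient_token(row), []):
            linked.append((row, match))
    return linked


# Made-up sample records: same patient, formatted differently in each silo.
clinical = [
    {"first": "José", "last": "O'Neil", "dob": "1980-01-02", "zip": "94105", "dx": "E11.9"},
]
claims = [
    {"first": "JOSE", "last": "ONeil", "dob": "1980/01/02", "zip": "94105", "claim_id": "C1"},
    {"first": "Ann", "last": "Lee", "dob": "1975-05-05", "zip": "10001", "claim_id": "C2"},
]
pairs = link(clinical, claims)
```

Exact-match tokens of this kind are brittle (a typo, a nickname, or a differently ordered date yields a different token), which is part of why a single data source so often remains an incomplete picture.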

In our view, these challenges, coupled with the growing volume and diversity of healthcare data sources, present a unique opportunity for technology companies to deliver significant value to the healthcare industry. We sub-segment the technology companies that benefit from this theme into four buckets:

1. Infrastructure and enabling technologies

2. Data analytics

3. AI / ML to drive decision-making

4. AI- / ML-enabled automation

Note that #1 and #2 are not mutually exclusive, while labeled and annotated training data are prerequisites for #3 and #4. Below we’ve shared a bit more about each category, as well as some representative vendors that fit into each.

1. Infrastructure and enabling technologies – Help connect, normalize, curate, and manage data across disparate sources and formats; examples include 1upHealth, Datavant, Health Gorilla, HiLabs, Innovaccer, Lifebit, Mendel, Ribbon, TetraScience, Tripleblind, and Veda Data Solutions

2. Data analytics – Packaged, self-service analyses via an application layer and / or curated data delivered via an API; examples include Kipu*, Komodo Health, H1 Insights, OM1, and Truveta

3. AI / ML to drive decision-making – Use labeled / annotated data to train AI and ML models that help end users make better informed, more efficient decisions; examples include Aidoc*, BenchSci*, Deep 6 AI, Diagnostic Robotics, Iterative Scopes, Paige.AI, and Unlearn

4. AI- / ML-enabled automation – Use labeled / annotated data to train AI and ML models that automate business processes and workflows; examples include Syllable*, Abridge, AKASA, DeepScribe, Memora Health, Notable Health, and Robin Healthcare

* TCV portfolio companies

We further believe that technology companies across all four categories have an opportunity to differentiate and establish competitive moats along the four dimensions outlined below. To be clear, compelling technology platforms need not check all four boxes – some may only check one of them.

1. Unique access to healthcare data – This can stem from the business model (e.g., an open / network-based system), barter or give-to-get relationships, long-term data sharing partnerships, and / or customers contributing data, among other levers

2. IP that integrates, curates, and prepares the data for downstream use cases – This may take the form of technology tooling and / or organizational know-how (e.g., the process for cleansing the data)

3. Functionality that applies healthcare-specific contextualization – This often involves both platform functionality and clinically / scientifically trained personnel to ensure effective platform utilization by the end user

4. Software applications that deliver value in the context of specific business use cases and workflows

In closing, the U.S. healthcare industry is perhaps the last major industry to undergo digitization; it is also one of the largest. Against a rapidly growing volume and diversity of healthcare data, coupled with challenges and complexities associated with its use, we believe there is an extraordinary opportunity for technology to play a leading role in audaciously unlocking and delivering value across multiple sub-segments, functions, and applications in healthcare. Accordingly, we at TCV are incredibly excited to continue to partner with companies boldly seeking to utilize healthcare data in order to fundamentally transform both the development of novel medicines and provision of patient care, and, ultimately, to improve patient outcomes.

1 Source: RBC.

2 Source: Gartner. Size of industry measured in terms of contribution to U.S. GDP.

3 Source:

4 Source: NHGRI.

5 Source: NCBI.


The views and opinions expressed are those of the speakers and do not necessarily reflect those of TCMI, Inc. or its affiliates (“TCV”). TCV has not verified the accuracy of any statements by the speakers and disclaims any responsibility therefor. This blog post is not an offer to sell or the solicitation of an offer to purchase an interest in any private fund managed or sponsored by TCV or any of the securities of any company discussed. The TCV portfolio companies identified, if any, are not necessarily representative of all TCV investments, and no assumption should be made that the investments identified were or will be profitable. For a complete list of TCV investments, please visit For additional important disclaimers regarding this blog post, please see “Informational Purposes Only” in the Terms of Use for TCV’s website, available at