Illumina's Billion Cell Atlas Ignites AI Drug Discovery Race
- 1 billion cells to be mapped in the initial phase of the Billion Cell Atlas, with a planned expansion to 5 billion cells over three years.
- 20 petabytes of single-cell transcriptomic data projected within the first year.
- 20,000 human genes to be perturbed across more than 200 disease-relevant cell lines.
Experts view Illumina's Billion Cell Atlas as a transformative resource for AI-driven drug discovery, enabling unprecedented insights into disease mechanisms and accelerating the development of precision medicines.
SAN DIEGO, CA – January 13, 2026 – Genomics leader Illumina today launched a monumental effort to map human disease at the cellular level, introducing the Billion Cell Atlas—a dataset of unprecedented scale designed to fuel artificial intelligence and accelerate the discovery of new medicines. The initiative, announced in partnership with pharmaceutical giants AstraZeneca, Merck, and Eli Lilly and Company, marks a significant strategic pivot for Illumina and intensifies the global race to build foundational biological data for the AI era.
The Atlas is the first product from Illumina's new BioInsight business and represents the initial phase of a three-year plan to build a 5 billion cell resource. It is being positioned as the world's largest genome-wide genetic perturbation dataset, providing a comprehensive view of how individual human cells respond when specific genes are turned on or off. This massive biological library is expected to unlock new insights into complex illnesses and dramatically shorten drug development timelines.
"We believe the cell atlas is a key development that will enable us to significantly scale AI for drug discovery," said Jacob Thaysen, chief executive officer of Illumina. "We are building an unparalleled resource for training the next generation of AI models for precision medicine and drug target identification, ultimately helping map the biological pathways behind some of the world's most devastating diseases."
Charting the Cell at Unprecedented Scale
At its core, the Billion Cell Atlas is a systematic exploration of genetic function. Using CRISPR gene-editing technology, researchers will perturb each of the roughly 20,000 genes in the human genome across more than 200 disease-relevant cell lines. These cell lines were specifically chosen for their connection to historically difficult-to-treat conditions, including immune disorders, cancer, and cardiometabolic, neurological, and rare genetic diseases.
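The design described above implies a simple piece of arithmetic worth making explicit: how many cells each gene-by-cell-line condition receives on average. The sketch below assumes (an illustrative assumption, not Illumina's stated design) that cells are split evenly across all conditions and ignores control cells. Because the project uses more than 200 cell lines, the figures it produces are rough upper bounds on the average.

```python
# Back-of-envelope coverage for a genome-wide perturbation screen.
# Assumption (illustrative only): cells are divided evenly across every
# (gene, cell line) condition, and non-targeting control cells are ignored.

GENES = 20_000                 # genes perturbed, per the announcement
CELL_LINES = 200               # "more than 200" cell lines; 200 used as a floor
INITIAL_CELLS = 1_000_000_000  # initial-phase target
PLANNED_CELLS = 5_000_000_000  # three-year target


def cells_per_condition(total_cells: int, genes: int, cell_lines: int) -> float:
    """Average number of cells per (gene, cell line) perturbation condition."""
    conditions = genes * cell_lines
    return total_cells / conditions


if __name__ == "__main__":
    print(f"{GENES * CELL_LINES:,} conditions")
    print(f"initial phase: ~{cells_per_condition(INITIAL_CELLS, GENES, CELL_LINES):.0f} cells each")
    print(f"planned 5B:    ~{cells_per_condition(PLANNED_CELLS, GENES, CELL_LINES):.0f} cells each")
```

This works out to 4,000,000 conditions and about 250 cells per condition in the initial phase, rising to about 1,250 at 5 billion cells. Real screens allocate cells unevenly and reserve some for controls, so actual per-condition counts would vary around these figures.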
The project's sheer scale is enabled by Illumina's proprietary technology stack. The company's Single Cell 3' RNA prep platform allows for the capture of millions of individual cells in a single experiment, a prerequisite for a billion-cell-scale project. The resulting torrent of information, projected to reach 20 petabytes of single-cell transcriptomic data within a year, is processed using the hardware-accelerated DRAGEN pipeline and hosted on the Illumina Connected Analytics cloud platform for analysis.
This end-to-end workflow transforms raw genetic sequences into a structured, queryable map of cellular cause-and-effect. For the first time at this scale, researchers can systematically ask what happens to a specific cell type when any given gene is silenced or activated, providing a powerful tool for validating potential drug targets and understanding disease mechanisms from the ground up.
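Asking "what happens when a gene is silenced" can be sketched computationally as a pseudobulk comparison: average the expression profiles of cells carrying a guide against one target gene, average the profiles of control cells, and take the log2 fold change. This is a minimal illustration that assumes a cell-by-gene matrix of normalized values, one guide per cell, and control cells labeled "non-targeting". The function name, labels, and pseudocount are hypothetical and do not describe the DRAGEN pipeline or the Illumina Connected Analytics interface.

```python
import numpy as np


def perturbation_effect(expr, guide_labels, target, control="non-targeting", pseudocount=1.0):
    """Pseudobulk log2 fold change of every readout gene: `target`-perturbed cells vs controls.

    expr:         (n_cells, n_genes) array of normalized expression values
    guide_labels: length-n_cells sequence naming the gene each cell's guide targets
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(guide_labels)
    perturbed = expr[labels == target]
    controls = expr[labels == control]
    if len(perturbed) == 0 or len(controls) == 0:
        raise ValueError("need at least one perturbed cell and one control cell")
    # Average within each group first, then compare on a log2 scale;
    # the pseudocount keeps unexpressed genes from producing log2(0).
    return np.log2(perturbed.mean(axis=0) + pseudocount) - np.log2(controls.mean(axis=0) + pseudocount)


if __name__ == "__main__":
    # Toy data: 4 cells x 2 readout genes; "GENE_A" is a hypothetical target.
    expr = [[10, 5], [12, 5], [1, 5], [1, 5]]
    labels = ["non-targeting", "non-targeting", "GENE_A", "GENE_A"]
    print(perturbation_effect(expr, labels, "GENE_A"))
```

In the toy data, the first readout gene drops from a control mean of 11 to 1, giving a log2 fold change of about -2.6, while the second readout gene is unchanged. Production analyses would add per-cell normalization, guide-assignment quality control, and statistical testing on top of a comparison like this.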
A New Alliance Forging AI-Powered Medicine
The project's ambition is matched by the strategic weight of its founding partners. AstraZeneca, Merck, and Eli Lilly are not just passive recipients of the data; they are active collaborators in building the Atlas, contributing to the selection of curated cell lines to drive their respective research programs.
For these pharmaceutical leaders, the Atlas represents a critical tool to enhance their AI-driven discovery engines. Merck, for instance, plans to use the data to train its proprietary AI foundation models and construct "virtual cell models." These in silico simulations aim to predict how cells will respond to new drug candidates, potentially reducing reliance on costly and time-consuming wet-lab experiments.
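The announcement does not describe how Merck's virtual cell models work. As a hedged illustration of the prediction task itself, the sketch below implements a deliberately naive baseline. It predicts a perturbation's effect in a cell line where that perturbation was not measured by averaging the effect observed for the same perturbed gene in other cell lines, and it predicts no change for perturbations it has never seen. The data layout and function names are hypothetical. Learned foundation models aim to do better than a simple average like this.

```python
import numpy as np

# Hypothetical layout: keys are (cell_line, perturbed_gene); values are
# vectors of expression shifts (e.g. log2 fold changes) over readout genes.
Observed = dict[tuple[str, str], list[float]]


def fit_mean_shift_baseline(observed: Observed) -> dict[str, np.ndarray]:
    """Average each perturbed gene's shift vector across the cell lines it was measured in."""
    totals: dict[str, np.ndarray] = {}
    counts: dict[str, int] = {}
    for (_cell_line, gene), shift in observed.items():
        vec = np.asarray(shift, dtype=float)
        totals[gene] = totals.get(gene, np.zeros_like(vec)) + vec
        counts[gene] = counts.get(gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


def predict_shift(model: dict[str, np.ndarray], gene: str, n_readouts: int) -> np.ndarray:
    """Predicted shift for `gene` in an unmeasured cell line; a zero vector if never seen.

    The baseline ignores cell-line identity entirely, which is exactly what a
    more sophisticated virtual cell model would be expected to improve on.
    """
    return model.get(gene, np.zeros(n_readouts))


if __name__ == "__main__":
    observed = {
        ("line_1", "GENE_A"): [-2.0, 0.0],
        ("line_2", "GENE_A"): [-1.0, 0.5],
        ("line_1", "GENE_B"): [0.0, 1.0],
    }
    model = fit_mean_shift_baseline(observed)
    print(predict_shift(model, "GENE_A", 2))  # average of the two GENE_A rows
    print(predict_shift(model, "GENE_C", 2))  # unseen perturbation: no change
```

A baseline like this is useful mainly as a reference point. If a trained model cannot beat "average what happened elsewhere", its in silico predictions add little over the raw data.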
"By harnessing advanced genomic patient datasets, Merck scientists are building and leveraging AI models grounded in real biological variation—not just literature text—and translating those insights into novel targets and precision biomarkers that matter for patients," said Iya Khalil, vice president and head of Data, A.I. & Genome Sciences at Merck. "Through our close collaboration with Illumina, we're establishing a scalable bridge from genomic insight to therapeutic impact."
This sentiment is echoed by the other partners, who see the Atlas as a way to overcome a fundamental bottleneck in modern medicine. "Translating genetic information into a clear understanding of disease mechanisms—and then ultimately into medicines—remains a core challenge in R&D," noted Slavé Petrovski, vice president of the Centre for Genomics Research at AstraZeneca. He added that the dataset will help "turn genetic signals into mechanistic biology we can directly study."
Ruth Gimeno, group vice president of Cardiometabolic Research at Eli Lilly and Company, emphasized the foundational nature of the project. "The next generation of AI-driven drug discovery will depend on biological data at a scale never before achieved," she stated. "Comprehensive datasets spanning diverse cell types offer the critical foundation needed to generate meaningful insights into human disease."
The Race to Build Biology's Foundational Datasets
While Illumina's claim to the "world's largest" genetic perturbation dataset appears solid, its launch does not occur in a vacuum. It represents a major move in a burgeoning, high-stakes field where tech and biotech entities are racing to create the foundational datasets that will power the future of medicine. This landscape includes several other massive cell-mapping initiatives.
The Human Cell Atlas (HCA), a global consortium founded in 2016, has a complementary goal of creating a comprehensive reference map of all human cell types. Though less focused on genetic perturbation, its efforts to catalog cellular diversity are crucial for interpreting datasets like Illumina's.
More direct competition comes from collaborations like the Arc Virtual Cell Atlas, from the Arc Institute and Vevo Therapeutics, which is also generating massive single-cell perturbation data to build predictive models of cell behavior. Similarly, the Chan Zuckerberg Initiative, in partnership with 10x Genomics and Ultima Genomics, recently announced its own "Billion Cells Project" to build advanced AI models for understanding gene function. These parallel efforts underscore a broad industry consensus: the future of drug discovery lies in training AI on vast, high-quality biological data.
BioInsight: A Strategic Pivot from Sequencers to Data
The Billion Cell Atlas is also the flagship product of BioInsight, a newly formed business unit that signals a major strategic evolution for Illumina. For decades, Illumina has dominated the market by selling the "picks and shovels" of the genomics revolution—its DNA sequencing machines and consumables. With BioInsight, the company is moving up the value chain to sell the "maps" derived from that technology.
This data-as-a-service model positions Illumina not just as a hardware provider but as a central data broker for the entire pharmaceutical ecosystem. By leveraging its technological dominance in sequencing to generate proprietary, high-value datasets, Illumina is creating a powerful new revenue stream and embedding itself more deeply into the drug discovery process.
This pivot allows the company to capitalize directly on the AI boom in biopharma, providing the essential fuel for the industry's computational models. By building a comprehensive, disease-specific atlas and pairing it with advanced algorithms, Illumina is betting that the future of unlocking the genome lies not only in the ability to read it but also in the power to interpret it at massive scale.