AI Agents Are Now Building Data Infrastructure, and It’s Happening Fast

📊 Key Data
  • 81,000 new data pipelines shipped in January 2026
  • 91% of pipelines authored by AI agents
  • 34-fold increase in pipeline volume in one year
🎯 Expert Consensus

Experts would likely conclude that AI agents are rapidly transforming data engineering, automating complex workflows, and unlocking previously inaccessible 'dark data' at scale.

SAN FRANCISCO, CA – June 03, 2026 – At Snowflake's massive annual user conference this week, the data cloud giant named a small Berlin-based company, dltHub, its Startup Program Product Partner of the Year. While partner awards are a staple of tech conferences, this one signals a shift that extends far beyond corporate accolades. It marks a quiet inflection point in how our digital world is being built, where AI agents are increasingly taking over one of the most complex and critical jobs in technology: data engineering.

The award recognizes dltHub for helping over a thousand organizations, including giants like Stellantis and Flatiron Health, move data into Snowflake's AI Data Cloud. But the real story isn't just moving data; it's how that data is being moved. dltHub is at the vanguard of a movement where AI agents, not just human developers, are writing the code that creates and manages the data pipelines essential for modern business and AI. This is the industrialization of a once-artisanal craft, and the scale is staggering.

The Rise of the Agentic Engineer

For decades, data engineering has been a bottleneck. It’s the painstaking, often thankless work of building digital plumbing to connect disparate data sources to a central repository where they can be analyzed. It's a field dominated by complex tools and a shortage of senior talent. dltHub’s thesis, now endorsed by Snowflake's award, is that this paradigm is over.

"This award reflects a thesis we've been investing in for years: that the future of data engineering is code-first, Python-native, and increasingly written by AI agents working alongside humans," said Matthaus Krzykowski, CEO and Co-Founder of dltHub, in a statement.

The company's platform, dltHub Pro, is built on its popular open-source Python library, dlt. It acts as a force multiplier, allowing developers to work with AI coding assistants like Claude Code or Codex to build production-grade data pipelines in a fraction of the time. An engineer describes the data source they need, and the AI agent generates the Python code. dltHub Pro then takes that code and productionizes it, handling deployment, validation, and monitoring.
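To make the describe, generate, productionize workflow concrete, here is a minimal sketch of the kind of extract, load, and validate code an agent might produce for a simple source. It deliberately uses only the Python standard library and does not use dlt's or dltHub Pro's actual API. The paginated source (FAKE_API_PAGES), the function names, and the in-memory SQLite destination are all hypothetical stand-ins chosen for illustration.

```python
import sqlite3
from typing import Iterable, Iterator

# Hypothetical stand-in for a paginated API response.
# A real pipeline would fetch these pages from the source over HTTP.
FAKE_API_PAGES = [
    [{"id": 1, "name": "Ada", "plan": "pro"}, {"id": 2, "name": "Grace", "plan": "free"}],
    [{"id": 3, "name": "Linus", "plan": "pro"}],
]


def fetch_customers() -> Iterator[dict]:
    """Extract: yield records one page at a time."""
    for page in FAKE_API_PAGES:
        yield from page


def load(records: Iterable[dict], conn: sqlite3.Connection, table: str) -> int:
    """Load: create the table from the first record's keys, then insert all rows."""
    rows = list(records)
    if not rows:
        return 0
    columns = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


def validate(conn: sqlite3.Connection, table: str, expected: int) -> bool:
    """Validate: confirm the destination row count matches what was loaded."""
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count == expected


# In-memory SQLite stands in for the warehouse destination (an assumption).
conn = sqlite3.connect(":memory:")
loaded = load(fetch_customers(), conn, "customers")
ok = validate(conn, "customers", loaded)
print(f"loaded={loaded} valid={ok}")
```

The split into separate extract, load, and validate steps mirrors the stages named above. Deployment and monitoring, which the article attributes to dltHub Pro, are outside the scope of this sketch.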

The metrics shared by the company are striking. In January 2026 alone, the dltHub community shipped approximately 81,000 new data pipelines, and 91% of them were authored by AI agents. That is roughly ten times the number written by human developers on the platform, and the total represents a 34-fold increase in volume from one year prior. The era of the "agentic engineer" isn't a future-looking concept; it's happening now, at scale.
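The arithmetic behind these figures can be checked in a few lines. This is a rough sketch only: the 81,000 total is itself described as approximate, and reading "34-fold increase" as "January 2026 volume is 34 times the January 2025 volume" is an assumption. The variable names are illustrative.

```python
# Figures quoted in the article (the total is approximate).
total_pipelines = 81_000
agent_percent = 91
growth_factor = 34  # assumed to mean: this year's volume = 34 x last year's

# Integer arithmetic avoids floating-point rounding in the split.
agent_built = total_pipelines * agent_percent // 100
human_built = total_pipelines - agent_built

# Ratio of agent-authored to human-authored pipelines.
agent_to_human = agent_built / human_built

# Implied monthly volume one year earlier.
prior_year_volume = total_pipelines / growth_factor

print(f"agent-built:       {agent_built:,}")
print(f"human-built:       {human_built:,}")
print(f"agent/human ratio: {agent_to_human:.1f}x")
print(f"implied Jan 2025:  {prior_year_volume:,.0f}")
```

Under these assumptions, about 73,710 pipelines were agent-authored against about 7,290 human-authored, a ratio of roughly 10 to 1, and the implied volume a year earlier is on the order of 2,400 pipelines.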

For consultancies like Tasman Analytics, this translates to a radical change in business velocity. Building a connector to a new API, a process that previously took a senior engineer a week, can now be done by a mid-level engineer in an afternoon. According to one case study, the team went from API documentation to a running pipeline in just 20 minutes, a task that would previously have been scoped for two weeks.

Illuminating Corporate 'Dark Data'

Perhaps the most significant impact of this new approach is its ability to unlock data that was previously inaccessible. Large enterprises, particularly in regulated industries like finance, healthcare, and manufacturing, are sitting on decades of valuable information trapped in legacy systems. These are the undocumented databases, archaic Enterprise Resource Planning (ERP) systems, and internal APIs that modern, graphical user interface (GUI)-based data tools simply can't reach.

This is the corporate "dark data" that holds immense potential for AI and analytics, but it has been too costly and complex to extract. dltHub’s code-first approach, supercharged by AI, is changing that calculation. A striking example comes from the non-profit Pro Juventute, where a data lead used an AI agent with dltHub Pro to tackle a legacy ERP system with 1,231 tables and zero documentation. The task, which would otherwise have been a months-long, multi-person archeological dig, was completed in hours.

This capability is why dltHub has earned Snowflake Industry Competencies in Financial Services, Technology, Manufacturing & Industrial, and Healthcare & Life Sciences. It’s also why global automaker Stellantis now orchestrates 60,000 Snowflake pipelines a month on a dlt-based platform, and why Flatiron Health, a leader in oncology data, was able to cut its pipeline costs by 50% after migrating. The tool isn't just for new, cloud-native companies; it's a bridge to modernization for the backbone of the global economy.

The Ecosystem as the Engine

dltHub's success is also a testament to Snowflake's strategy. In the hyper-competitive cloud platform wars, the strength of a partner ecosystem is a critical differentiator. Snowflake's Startup Program is a key pillar of this strategy, designed to nurture early-stage companies building on its platform by providing credits, technical mentorship, and go-to-market support. The program's prestige gives startups like dltHub enterprise credibility, smoothing the path through procurement and security reviews.

This award, one of the "highest honors" for partners, is based on "technical innovation and verifiable business results," according to Snowflake. It demonstrates a symbiotic relationship: dltHub brings thousands of Python-native developers and a revolutionary AI-driven workflow into the Snowflake ecosystem. In return, Snowflake provides the scalable, secure foundation and market access for dltHub to thrive.

"dltHub leveraged our Snowflake for Startups program to ship real products on the AI Data Cloud, including a Snowflake Native App that lets customers replicate operational databases into Snowflake without data ever leaving their account," noted Amy Kodl, Snowflake's SVP of Worldwide Alliances & Channels.

That Native App is a crucial piece of the puzzle. It allows customers to run the entire replication pipeline for databases like Oracle or MSSQL directly inside their own Snowflake account. This means sensitive data never leaves their secure environment, a non-negotiable requirement for many in finance and healthcare. By fostering partners who solve these critical, specific problems, Snowflake makes its own platform more powerful and indispensable. The recognition of dltHub is an acknowledgment that the future of the AI Data Cloud will be built not just by Snowflake, but by a vibrant ecosystem of innovators who are pushing the boundaries of what's possible.

The award is more than a plaque; it's a clear signal from the market leader in cloud data that the very foundation of data management is being rebuilt by AI, one automated pipeline at a time. This transformation is enabling organizations to finally tap into their entire data estate, surfacing insights and efficiencies that were previously buried in the complex plumbing of their legacy systems.
