The Self-Learning SRE: AI That Learns to Fix Your Code
Cleric's new AI SRE agent promises to end engineering firefighting by learning from incidents. With $9.8M in funding, is this the future of system reliability?
SAN FRANCISCO, CA – December 09, 2025 – The persistent hum of alerts and the frantic scramble of "all hands on deck" to resolve production incidents have become a costly, accepted ritual in modern software engineering. This cycle of reactive firefighting consumes valuable engineering hours that could be spent on innovation. Now, a new entrant backed by significant funding and industry accolades aims to break that cycle with an autonomous agent designed not just to respond to problems, but to learn from them.
San Francisco-based startup Cleric has officially launched what it calls the first self-learning AI Site Reliability Engineer (SRE), an agentic system that promises to autonomously investigate issues, learn from every incident, and evolve its understanding of a company's unique infrastructure. The launch is amplified by a substantial $9.8 million seed round and a coveted spot on Gartner's 2025 "Cool Vendor in AI for SRE and Observability" report, signaling that both investors and analysts see a paradigm shift on the horizon for production operations.
Beyond Alerts: How AI is Learning to 'Reason' About Production
For years, the AIOps market has focused on reducing "alert noise"—using machine learning to correlate signals and group alerts to lessen the cognitive load on engineers. Cleric proposes a more profound intervention: an AI that acts as a junior SRE teammate. When an incident occurs, the agent doesn't just flag it; it begins an autonomous investigation, delivering initial findings and supporting evidence directly into team collaboration channels like Slack.
This approach moves beyond simple pattern matching. "We designed Cleric to reason about systems the way experienced engineers do: by correlating context across logs, metrics, and configurations, not just reacting to individual alerts," explained Willem Pienaar, Cleric’s co-founder and CTO. This ability to synthesize disparate data sources is critical. For complex cases, human engineers can collaborate with the agent, guiding its reasoning through conversation or diving into a detailed web interface, effectively mentoring the AI. The system provides confidence scores for its findings and, crucially, learns from feedback, improving its signal-to-noise ratio over time.
The impact on engineering capacity is the most tangible metric of success. Early adopters report freeing up 20–30% of engineering time previously lost to repetitive, manual troubleshooting. BlaBlaCar, the community-based travel app, has been a key test case, running the system in production since early 2025. According to Maxime Fouilleul, Head of Infrastructure & Operations at BlaBlaCar, the benefits extend beyond faster fixes. “Our goal isn’t complete alert coverage,” he stated. “It’s intelligent coverage, using Cleric’s insights to proactively eliminate systemic issues.” This points to a strategic shift from incident response to systemic resilience, where the AI helps uncover latent patterns that inform long-term architectural improvements.
Carving a Niche in a Crowded AIOps Landscape
Cleric’s claim to be the "first self-learning AI SRE" is bold in a market populated by mature AIOps platforms and observability giants. Competitors like Rootly, Observe's AI SRE, and the embedded AI capabilities within Datadog (Bits AI) and Dynatrace (Davis AI) all leverage machine learning for incident management and root cause analysis. Many of these tools already incorporate learning from past incidents to improve future performance.
However, Cleric's differentiation lies in its emphasis on creating a truly agentic, evolving "teammate." The company’s founders, former platform engineers from the Southeast Asian super-app Gojek, built the system based on their firsthand experience with the overwhelming complexity of hyper-scale microservices environments. Co-founder and CEO Shahram Anver, who previously led MLOps and DevOps platforms at Gojek, and CTO Willem Pienaar, creator of the popular open-source feature store Feast, understand the limitations of static automation. Their vision is for an AI that learns not just from data, but from the nuanced decisions and expertise of its human counterparts.
This focus on continuous, interactive learning is what sets it apart from tools that rely on pre-defined runbooks or more rigid ML models. By integrating with a company's existing observability stack—including Datadog, Grafana, and Prometheus—it avoids the need for a costly "rip and replace" overhaul. Instead, it layers on top, learning the unique failure modes and signal patterns of that specific environment.
Market Validation: Following the Money and the Analysts
An innovative idea is one thing; market validation is another. Cleric's simultaneous announcement of a $9.8 million seed round and its Gartner recognition provides powerful external confirmation of its approach. The funding round, led by Vertex Ventures US with follow-on participation from initial investor Zetta Venture Partners, underscores significant confidence in the company's trajectory.
Zetta Venture Partners, the first VC firm to focus exclusively on AI-native B2B startups, is a particularly noteworthy backer. Its investment thesis is built on identifying companies where AI is not a feature but the core foundation. Zetta's support suggests the firm sees Cleric not as another DevOps tool, but as a fundamental reshaping of how engineering work is done. This aligns with the firm's belief that language model agents are poised to transform complex enterprise tasks like incident response.
Cleric's inclusion in Gartner's "Cool Vendor in AI for SRE and Observability 2025" report further cements its position as an innovator to watch. Such reports aim to highlight vendors that are pioneering new approaches to persistent industry problems. Gartner's decision to single out Cleric's methodology lends weight to the idea that reducing engineer burnout and improving reliability requires more than just better dashboards; it requires intelligent automation that learns and adapts.
The Future of Operations: From Firefighting to Foresight
The convergence of these trends—increasing system complexity, engineer burnout, and advancing AI capabilities—has created a critical inflection point for IT operations. The promise of tools like Cleric is not merely to automate the SRE role out of existence, but to elevate it. By offloading the toil of initial investigation and repetitive diagnostics, these AI agents free human experts to focus on higher-order problems: architectural design, long-term strategy, and building more resilient systems from the ground up.
As CEO Shahram Anver noted, “Production isn’t static. It's a living environment. Cleric learns from every incident, alert, and human decision to evolve how it supports operations.” This philosophy captures the essence of the emerging paradigm. The goal is no longer to achieve a mythical "zero-alert" state but to build operational systems that learn as quickly as the production environments they manage.
With its new funding, Cleric plans to expand R&D, accelerate customer deployments, and deepen its partnerships with observability platforms. Its journey will be a crucial test case for the future of human-AI collaboration in high-stakes technical domains. If successful, the era of constant firefighting may finally give way to a future of operational foresight, where systems don't just fail and get fixed—they fail, learn, and become stronger.