Dremio Deepens Open-Source Roots with Iceberg V3, Polaris Leadership
- Apache Iceberg V3: Full read and write support now available in Dremio Cloud, enhancing performance with features like deletion vectors and VARIANT data type.
- Polaris Graduation: Apache Polaris, co-created by Dremio and Snowflake, has graduated to a Top-Level Project (TLP) at the ASF, signaling maturity and broad adoption.
- ASF Board Seat: Dremio engineer JB Onofre elected to the Apache Software Foundation board, strengthening Dremio's influence in open-source data standards.
Experts view Dremio's advancements in Apache Iceberg V3 and Polaris as a significant step toward establishing open, interoperable data standards, reducing vendor lock-in and enhancing performance for modern data engineering.
SAN FRANCISCO, CA – April 06, 2026 – Dremio, the self-proclaimed "Agentic Lakehouse" company, has intensified its campaign for open data standards with a series of significant announcements, including full support for Apache Iceberg V3, the graduation of its co-created Apache Polaris project, and a new seat on the Apache Software Foundation (ASF) board. These moves position the company not just as a user of open-source technologies, but as a key architect of the data lakehouse's future, directly challenging the proprietary ecosystems of larger competitors.
The announcement comes as the data industry increasingly coalesces around Apache Iceberg as the de facto open standard for massive analytic datasets. Dremio’s assertion of leadership is underscored by the general availability of full read and write support for Iceberg V3 in its Dremio Cloud platform, a milestone that puts it at the forefront of implementation for the format's latest specification.
The Power of V3: More Than Just an Update
Apache Iceberg V3 represents a pivotal evolution for the table format, designed to address the complex demands of modern data engineering, from high-frequency updates to the proliferation of semi-structured data. Dremio's integration brings several game-changing features to its users.
A key enhancement is the introduction of deletion vectors, which dramatically accelerate row-level operations. This new architecture provides a more performant and scalable way to handle changes, a critical capability for Change Data Capture (CDC) pipelines, streaming workloads, and compliance requirements like GDPR. Industry benchmarks have shown that this can lead to significantly faster write performance and more efficient row-level deletes.
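To make the mechanism concrete, a deletion vector can be thought of as a per-file bitmap of deleted row positions that readers consult at scan time, so a delete never has to rewrite the data file itself. The following is a minimal Python sketch of that idea only; Iceberg V3's actual implementation serializes roaring bitmaps into Puffin files, and the class and method names here are invented for illustration.

```python
class DeletionVector:
    """Toy positional-delete bitmap for a single data file.

    Illustrative only: real Iceberg V3 deletion vectors are
    roaring bitmaps stored in Puffin files, not Python ints.
    """

    def __init__(self):
        self.bits = 0  # one bit per row position in the file

    def delete(self, pos: int) -> None:
        # Marking a row deleted flips one bit; the data file is untouched.
        self.bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return bool((self.bits >> pos) & 1)

    def live_rows(self, rows):
        # Readers merge the vector at scan time, skipping deleted positions.
        return [r for i, r in enumerate(rows) if not self.is_deleted(i)]


rows = ["a", "b", "c", "d"]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)
print(dv.live_rows(rows))  # ['a', 'c']
```

Because the delete is recorded out-of-line, a CDC pipeline can apply thousands of row-level changes without rewriting large Parquet files, which is where the write-performance gains come from.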
Furthermore, V3 introduces the VARIANT data type, a native way to store JSON and other semi-structured data in an optimized encoding, avoiding the performance penalty of re-parsing raw strings at query time. It also removes the pressure to flatten every payload into a rigid schema at write time, allowing enterprises to ingest data from diverse sources like APIs, IoT feeds, and NoSQL stores directly into optimized columnar formats. For regulated industries, the formalization of row-level lineage provides a built-in audit trail, tracking when each row was created or modified without requiring additional tooling.
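The difference is essentially a question of when parsing happens: once at ingest, rather than on every query. A toy Python sketch of that idea (the class and method names are hypothetical; Iceberg's real VARIANT uses a compact binary encoding, optionally shredded into columnar form):

```python
import json


class VariantColumn:
    """Toy VARIANT-style column: parse JSON once at ingest, then serve
    field access at query time without re-parsing.

    Illustrative only; not Iceberg's actual binary VARIANT encoding.
    """

    def __init__(self):
        self._values = []

    def ingest(self, raw_json: str) -> None:
        # Parsing happens exactly once, at write time.
        self._values.append(json.loads(raw_json))

    def get(self, row: int, path: str):
        # Dotted-path field access against the pre-parsed value.
        value = self._values[row]
        for key in path.split("."):
            value = value[key]
        return value


col = VariantColumn()
col.ingest('{"device": {"id": 42, "temp": 21.5}}')
print(col.get(0, "device.temp"))  # 21.5
```

A query touching `device.temp` across millions of rows never pays the JSON-parsing cost again, which is the bottleneck VARIANT is designed to remove.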
"The Iceberg lakehouse has become the default architecture for AI and analytics," said Rahim Bhojani, CTO of Dremio, in the company's official release. "Most platforms added Iceberg as a feature, but Dremio was built on it from the ground up." This native approach, he argues, allows features to compound, creating a platform that is both faster and easier to manage.
Polaris Rising: An Open Challenge to Walled Gardens
Perhaps the most strategic element of Dremio's announcement is the continued momentum behind Apache Polaris, which recently graduated from incubation to a Top-Level Project (TLP) at the ASF. Co-created by Dremio and Snowflake, Polaris provides a vendor-neutral, open-source REST catalog for Iceberg tables.
This TLP status is a powerful signal of maturity and community adoption, validating Polaris as a production-ready standard for the entire ecosystem. It establishes a viable, open alternative to proprietary catalogs like AWS Glue or Databricks' Unity Catalog, which can lock users into a specific vendor's stack. By implementing a standard API, Polaris ensures that multiple query engines—including Spark, Flink, Trino, and DuckDB—can all access and modify the same Iceberg tables with a single source of truth for metadata.
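The value of a shared catalog is easiest to see in the commit path: every engine reads and writes table metadata through the same service, which arbitrates concurrent changes. The sketch below is a hypothetical, in-memory stand-in for that role (the class name, method signatures, and storage paths are invented for illustration and are not Polaris's actual API), using optimistic concurrency in the spirit of the Iceberg REST commit protocol:

```python
class CatalogSketch:
    """Toy stand-in for a shared Iceberg catalog such as Polaris.

    All engines commit through the catalog, which rejects stale writes
    so table metadata has a single source of truth. Illustrative only.
    """

    def __init__(self):
        self._tables = {}  # table name -> (version, metadata_location)

    def create_table(self, name, metadata_location):
        self._tables[name] = (0, metadata_location)

    def load_table(self, name):
        return self._tables[name]

    def commit(self, name, expected_version, new_metadata_location):
        version, _ = self._tables[name]
        if version != expected_version:
            # Another engine committed first; the caller must refresh and retry.
            raise RuntimeError("conflict: metadata changed since read")
        self._tables[name] = (version + 1, new_metadata_location)


catalog = CatalogSketch()
catalog.create_table("sales", "s3://bucket/sales/v0.json")  # hypothetical path

# Two engines (say, Spark and Trino) read the same table state...
version, _ = catalog.load_table("sales")
catalog.commit("sales", version, "s3://bucket/sales/v1.json")   # first commit wins
try:
    catalog.commit("sales", version, "s3://bucket/sales/v1b.json")  # stale commit
except RuntimeError as err:
    print(err)  # conflict: metadata changed since read
```

Because every engine goes through the same arbitration, none of them can diverge on what the current table state is, which is exactly the lock-in-free interoperability the REST catalog standard is meant to guarantee.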
Dremio leverages this standard as the foundation for its own Open Catalog. This allows the company to offer a crucial layer of enterprise-grade governance—including role-based access control (RBAC), row-level filters, and column masking—that is enforced consistently, regardless of which engine is querying the data. For organizations committed to a multi-tool, best-of-breed data strategy, this interoperability is a critical enabler, preventing the very vendor lock-in Dremio positions itself against.
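As a rough illustration of why catalog-level enforcement matters, consider a column-masking rule applied before results reach any engine. The function and policy shape below are hypothetical (not Polaris's or Dremio's actual interface); the point is that the same rule applies whether Spark, Trino, or DuckDB issues the query:

```python
def mask_column(rows, column, allowed_roles, role, mask="***"):
    # Hypothetical catalog-side policy: return real values only to roles
    # the policy allows; every query engine sees the same masked result.
    if role in allowed_roles:
        return rows
    return [{**row, column: mask} for row in rows]


customers = [{"name": "Ada", "ssn": "123-45-6789"}]
print(mask_column(customers, "ssn", {"auditor"}, role="analyst"))
# an analyst sees the ssn column masked; an auditor sees it in the clear
```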
"The graduation of Polaris is a defining moment for open data architectures," noted one prominent open-source contributor not affiliated with Dremio. "It provides the stability and long-term governance guarantees needed for production infrastructure, removing one of the biggest risks for enterprises considering a truly open lakehouse."
An 'Agentic' Architecture Built on Open Foundations
These open-source advancements are the bedrock of Dremio's "Agentic Lakehouse" vision, a platform designed not just for human analysts but also for AI agents that can query, analyze, and act on data autonomously. The company's claim to being "Iceberg-native" is central to this strategy.
Unlike platforms that bolt on Iceberg compatibility, Dremio's query engine was co-created with and built natively on Apache Arrow, the open-source columnar memory format. This allows it to process Iceberg and Parquet data in vectorized batches without costly and proprietary format conversions, leading to significant performance gains.
Building on this foundation are features designed to automate the complexities of managing petabyte-scale data. Autonomous Reflections automatically observe query patterns and create optimized materializations that cut query times from seconds to sub-second, all without manual tuning. Meanwhile, Iceberg Clustering uses Z-order indexing to continuously co-locate related data across multiple columns, minimizing I/O and avoiding costly full-table rewrites. Together with automated compaction and snapshot cleanup, these capabilities aim to free data engineers from maintenance tasks so they can focus on building data products.
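The core trick behind Z-ordering can be shown in a few lines: interleaving the bits of two column values produces a single sort key that keeps rows close in both dimensions near each other on disk, so a filter on either column touches fewer files. A minimal sketch (illustrative only; the real clustering service operates on Iceberg data files, not Python tuples):

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: interleave the bits of two column values.

    Rows that are close in both x and y get nearby sort keys, which is
    what lets a scan on either column skip unrelated data.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x contributes even bit slots
        z |= ((y >> i) & 1) << (2 * i + 1)   # y contributes odd bit slots
    return z


# Sorting by the interleaved key clusters rows across both columns at once,
# unlike a plain sort on (x, y), which only helps filters on x.
rows = [(3, 5), (3, 6), (9, 1), (2, 7)]
rows.sort(key=lambda r: z_order_key(*r))
print(rows)
```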
A Strategic Play for Industry Influence
Beyond the technical merits, Dremio's recent moves represent a calculated play for influence over the future of the data ecosystem. The company's deep contributions are not limited to code; the election of Dremio engineer JB Onofre to the Apache Software Foundation's board of directors is a testament to this strategy. Onofre, who was instrumental in shepherding Polaris through its incubation period, now has a voice in the highest governing body of the world's largest open-source foundation.
This deep-seated involvement in projects like Arrow, Iceberg, and Polaris gives Dremio a powerful role in shaping the standards that the entire industry will build upon. By championing and contributing to these open foundations, the company is making a long-term bet that the future of data is interoperable, and that the winners will be those who build the most open and efficient platforms on that common ground. As enterprises increasingly seek to avoid vendor lock-in while scaling their AI and analytics initiatives, this commitment to a truly open ecosystem may prove to be Dremio's most compelling differentiator.