Apache Spark in Rust: Sail Architecture & Benchmarks

If you run Spark platforms in production, you spend real time tuning executors, chasing shuffle spills, handling OOM kills, and second-guessing your JVM GC config. A new project deserves your attention. Sail (github.com/lakehq/sail) is an open-source, drop-in Apache Spark replacement written entirely in Rust. No JVM, no Scala, no GC pauses. It speaks the Spark Connect protocol, so your existing PySpark SQL and DataFrame code runs without changes.

This is not another Spark accelerator that wraps native code around the JVM. Sail eliminates the JVM. Built by LakeSail (lakesail.com), the project shipped 24 releases in under two years, gathered 3,000+ GitHub stars and 41 contributors, and now tops every major Spark configuration on the public ClickBench benchmark, including Spark with native accelerators.

I spent several hours researching Sail for this article: the architecture docs, the source, the benchmarks, the blog posts, and third-party reviews. Here is the full picture: the architecture, the performance data, the maturity assessment, and how Sail compares to the other Spark accelerators on the market.

Architecture: A Clean Break From the JVM

The core insight behind Sail is that Spark’s JVM architecture is not something you fix by adding native accelerators. Databricks’ Photon, Apache DataFusion Comet (originally contributed by Apple), and Meta’s Velox (via Gluten, the successor to Gazelle) all integrate with Spark’s JVM executors. They replace parts of the physical execution layer with native code, but still cross the JVM boundary for control flow, UDF execution, and data serialization. That boundary is expensive.

Sail takes a different path. Instead of plugging into the JVM, it implements the Spark Connect protocol directly in Rust. The server is a native gRPC endpoint. PySpark connects to it via SparkSession.builder.remote("sc://host:50051").getOrCreate(). From that point on, all execution happens in Rust. The JVM is never involved.
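A minimal sketch of that connection from the client side, assuming a Sail server is already listening on localhost:50051 and that PySpark is installed with its Spark Connect extras. The helper names are illustrative and are not part of Sail or PySpark; SparkSession.builder.remote is the standard Spark Connect entry point.

```python
def spark_connect_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL for the given endpoint (hypothetical helper)."""
    return f"sc://{host}:{port}"


def connect(url: str):
    """Open a Spark Connect session against whatever server listens at url."""
    # Imported lazily so the URL helper works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


def smoke_test(spark) -> int:
    """Run a trivial query end to end and return the number of rows."""
    return spark.range(10).count()


# Usage (needs a running Spark Connect server, Sail or Spark):
#   spark = connect(spark_connect_url())
#   print(smoke_test(spark))
```

Because the client only sees a URL, the same code can be pointed at a regular Spark Connect server for comparison.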

The Stack, Bottom to Top

Apache Arrow: columnar in-memory format throughout. No row-to-columnar conversion anywhere in the pipeline. Data stays in Arrow from read through shuffle to result delivery.
Apache DataFusion: the query engine. Logical plan, physical plan, optimizer rules, vectorized execution with SIMD. Sail extends DataFusion with Spark-compatible nodes (custom logical operators, physical operators, catalog extensions).
Custom Rust SQL parser: written in one week with the chumsky parser-combinator library and Rust procedural macros. It replaced the community sqlparser-rs library, which had gaps in Spark SQL syntax. SQL test success rose from 79.6% to 94.5% on 1,373 parsing tests, and overall coverage exceeds 80% of ~3,800 Spark SQL parsing tests.
Sail Spec: an unresolved intermediate representation that expresses both SQL strings and Spark Connect relations. Both inputs, SQL and the DataFrame API, convert to this spec before analysis.
PyO3 Python bindings: Python runs in-process via PyO3 rather than in separate Python worker processes fed by the JVM (the classic PySpark model, where Py4J bridges the driver to the JVM). Python UDFs receive Arrow arrays with zero-copy access and no serialization overhead.
Spark Connect gRPC: bidirectional gRPC protocol between the PySpark client and the Sail server. It supports Spark 3.5.x and 4.x.
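To make the Python UDF layer concrete, here is a small sketch of an ordinary PySpark scalar UDF registered through a session. Nothing in it is Sail-specific: the function name, the SQL name c_to_f, and the query are made up for illustration, and whether a given UDF type actually benefits from the zero-copy path is something to measure rather than assume.

```python
def celsius_to_fahrenheit(celsius):
    """Plain Python function; None is passed through as SQL NULL."""
    if celsius is None:
        return None
    return celsius * 9.0 / 5.0 + 32.0


def register_and_query(spark):
    """Register the function as a SQL UDF and call it from a query.

    spark is any SparkSession, for example one created with
    SparkSession.builder.remote(...).getOrCreate().
    """
    # "double" is the declared return type, given as a DDL type string.
    spark.udf.register("c_to_f", celsius_to_fahrenheit, "double")
    return spark.sql("SELECT c_to_f(CAST(100 AS DOUBLE)) AS fahrenheit").collect()
```

The UDF body can be unit-tested without any engine, which is also a cheap way to separate logic bugs from engine differences during a migration.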

Control Plane and Data Plane

Sail separates control flow and data flow. The control plane uses an actor model for safe concurrency without locks. The data plane uses Arrow Flight gRPC for zero-copy columnar data exchange between workers during shuffle. In cluster mode (Kubernetes), workers are stateless Rust processes. They start in seconds, idle at single-digit MB of memory, and scale to zero when idle.

This stateless worker design is a real operational difference from Spark. Spark executors are heavyweight JVM processes with fixed heap allocations, GC tuning requirements, and slow startup. Sail workers have none of that. Elastic scaling becomes practical: spin up 100 workers for a heavy shuffle stage, then drop them when it finishes.
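The actor model used by the control plane can be illustrated with a toy example. This is a generic sketch in Python with asyncio, not Sail’s Rust implementation: the WorkerRegistry class and its message names are invented here. The point it shows is that state owned by a single actor, and changed only through messages in its mailbox, needs no locks.

```python
import asyncio


class WorkerRegistry:
    """Toy actor: owns its state and handles one message at a time."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._workers = set()  # only ever touched inside run()

    async def run(self):
        """Process messages sequentially until a stop message arrives."""
        while True:
            kind, payload, reply = await self._mailbox.get()
            if kind == "register":
                self._workers.add(payload)
            elif kind == "remove":
                self._workers.discard(payload)
            elif kind == "count":
                reply.set_result(len(self._workers))
            elif kind == "stop":
                return

    async def tell(self, kind, payload=None):
        """Fire-and-forget message."""
        await self._mailbox.put((kind, payload, None))

    async def ask(self, kind):
        """Send a message and wait for the actor's reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((kind, None, reply))
        return await reply


async def demo() -> int:
    registry = WorkerRegistry()
    task = asyncio.create_task(registry.run())
    # Many concurrent senders, one owner of the state.
    await asyncio.gather(
        *(registry.tell("register", f"worker-{i}") for i in range(100))
    )
    await registry.tell("remove", "worker-0")
    count = await registry.ask("count")
    await registry.tell("stop")
    await task
    return count
```

Calling asyncio.run(demo()) registers 100 workers, removes one, and returns the remaining count as seen by the actor.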

Deployment Modes

Mode	Description	Use Case
Local	Single process, multi-threaded, starts in seconds	Development, CI, moderate workloads
Local-Cluster	Multi-process on one node, all CPU cores	Large local datasets
Kubernetes	Distributed driver + stateless workers	Production ETL and analytics

Performance: The ClickBench Numbers

The Sail team’s derived TPC-H numbers (100 GB, Parquet, r8g.4xlarge) show 102.75s vs Spark’s 387.36s (3.8x), peak memory 22 GB vs 54 GB, and zero shuffle spill against Spark’s 110+ GB. The ClickBench results published in May 2026 go further.

ClickBench is a public, reproducible benchmark maintained by the ClickHouse team. It runs 43 analytical queries against a 14 GB, 100M-row dataset (hits.parquet). Every system runs best-of-3 on identical AWS hardware. The results, and the scripts that produce them, are published in the open on GitHub, where anyone can review and rerun them.
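If you want to apply the same best-of-3 idea to your own queries, a timing helper can be as small as the sketch below. It is not ClickBench’s harness; best_of and the example workload are illustrative names, and the callable you pass in would normally wrap a query such as spark.sql(...).collect().

```python
import time


def best_of(fn, runs: int = 3):
    """Call fn() `runs` times and return (best_seconds, all_timings)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), timings


# Example with a stand-in workload instead of a real query.
best, timings = best_of(lambda: sum(range(100_000)))
print(f"best of {len(timings)} runs: {best:.6f} s")
```

Keeping all timings, not just the minimum, makes it easier to spot cold-start effects in the first run.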

Here is how Sail compares to every major Spark configuration on ClickBench (source: lakesail.com/performance/, data from 2026-05-11):

System	Instance	Relative Runtime (Sail=1.0)
Sail (Parquet)	c6a.4xlarge (16 vCPU, 32 GB)	×1.00
Spark + Gluten/Velox	c6a.4xlarge	×4.03
Spark + Auron	c6a.4xlarge	×4.71
Spark + Comet	c6a.4xlarge	×5.49
Apache Spark (plain)	c6a.4xlarge	×5.90

On the same hardware, Sail runs roughly 4.0x to 5.5x faster than Spark with native accelerators and 5.9x faster than plain Spark. The median per-query speedup against plain Spark is 8.4x; it is a different statistic from the aggregate ratio in the table, so the two figures need not agree. The best single query (Q7, a MIN/MAX on EventDate) reaches 216.7x because Sail’s instant startup avoids Spark’s multi-second planning overhead. The worst query (Q35, a heavy GROUP BY with ordering) is still 2.6x faster.
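The relative runtimes in the table can be turned into pairwise speedups with a few lines of arithmetic. The sketch below uses only the numbers from the table above; the function name is illustrative.

```python
# Relative runtimes from the ClickBench table (Sail = 1.0).
relative_runtime = {
    "Sail (Parquet)": 1.00,
    "Spark + Gluten/Velox": 4.03,
    "Spark + Auron": 4.71,
    "Spark + Comet": 5.49,
    "Apache Spark (plain)": 5.90,
}


def speedup_over(baseline: str, system: str) -> float:
    """How many times faster `system` is than `baseline`."""
    return relative_runtime[baseline] / relative_runtime[system]


plain = "Apache Spark (plain)"
for name in relative_runtime:
    print(f"{name:24s} {speedup_over(plain, name):.2f}x vs plain Spark")
```

Read this way, the accelerators sit between about 1.07x and 1.46x over plain Spark on this benchmark, while Sail sits at 5.90x.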

These results are published on the official ClickBench page (benchmark.clickhouse.com) and are fully reproducible: clone github.com/ClickHouse/ClickBench, then run cd sail && ./install && ./benchmark.sh.

Beyond ClickBench: TPC-H Resource Utilization

On the derived TPC-H benchmark (100 GB, AWS r8g.4xlarge), the resource efficiency gap is just as clear:

Metric	Spark	Sail
Total query time (22 queries)	387.36 s	102.75 s
Per-query speedup	Baseline	43% to 727%
Peak memory	~54 GB (sustained)	22 GB (1-second peak)
Disk write (shuffle spill)	>110 GB	0 GB
Memory release after queries	Inefficient	Proactive

LakeSail combines the roughly 4x speedup (3.8x measured above) with running on a quarter of the instance size to claim a 94% cost reduction: a quarter of the time on a quarter of the hardware is one sixteenth of Spark’s infrastructure cost. That figure assumes proportional cloud pricing by instance size and that both gains hold at once, so treat it as a best case rather than a guarantee. Test it against your own workload.
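The cost claim is simple enough to check by hand. The sketch below reproduces it from the table figures, under the same assumption the claim makes: cost scales linearly with both runtime and instance size. The function name is illustrative.

```python
spark_seconds, sail_seconds = 387.36, 102.75
speedup = spark_seconds / sail_seconds   # about 3.77
memory_ratio = 54 / 22                   # peak memory, Spark vs Sail


def relative_cost(speedup: float, instance_fraction: float) -> float:
    """Cost relative to Spark, assuming price is linear in time and size."""
    return instance_fraction / speedup


claimed = relative_cost(4.0, 0.25)       # rounded 4x on a quarter-size instance
measured = relative_cost(speedup, 0.25)  # same, with the measured speedup

print(f"speedup:            {speedup:.2f}x")
print(f"claimed reduction:  {1 - claimed:.2%}")
print(f"measured reduction: {1 - measured:.2%}")
```

With the measured 3.77x instead of the rounded 4x, the same assumptions give a reduction slightly below the headline figure, which is one more reason to treat it as an upper bound.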

A Note on Benchmark Choice

One caveat on all of this. Query benchmarks like ClickBench and TPC-H are reproducible and widely cited, but they lean on CPU-bound analytical queries. They tell you little about how an engine handles a full read, sort, shuffle, and write under I/O pressure. For that, my own preference is TPCx-HS, the TPC big-data sort benchmark that almost nobody runs anymore. I updated it a few years ago to partially support Spark 3 (github.com/julienlau/tpcx-hs). It stresses the parts of a data platform that usually break first in production: disk throughput, network shuffle, and end-to-end data movement. No single benchmark captures how an engine behaves on your data. Run more than one, and run them on your own workload.

Spark Accelerators Compared

Sail is not the only project trying to make Spark faster. Here is how the landscape looks as of mid-2026, based on my research and the official Apache DataFusion Comet vs Gluten comparison page:

Project	Language	Approach	Engine	Open Source	ClickBench vs Sail
Sail	Rust	Full replacement (Spark Connect)	DataFusion	Yes (Apache 2.0)	×1.00 (baseline)
Gluten+Velox	C++	Spark plugin (Substrait)	Velox (Meta)	Yes (Apache 2.0)	×4.03
Comet	Rust	Spark plugin (protobuf)	DataFusion	Yes (Apache 2.0)	×5.49
Auron	Rust	Spark plugin	DataFusion	Yes (Apache 2.0)	×4.71
Blaze (former name of Auron)	Rust	Spark plugin	DataFusion	Yes	See Auron
Photon	C++	Spark plugin (proprietary)	Proprietary	No	N/A
RAPIDS	C++/CUDA	GPU Spark plugin	NVIDIA GPU	Yes	N/A

Why Sail Leads on This Benchmark

All the other accelerators (Comet, Gluten, Auron, Blaze) operate as Spark plugins. They intercept physical plans inside Spark’s JVM executors and offload computation to a native engine. But they still inherit:

JVM startup latency: every Spark executor starts a JVM, loads classes, and runs GC.
Python bridge: Python UDFs still cross a serialization boundary between the JVM and separate Python worker processes.
Row-to-columnar conversion: Spark’s internal row format must convert to Arrow for the native engine.
JVM shuffle: shuffle data is still written to disk by the JVM shuffle service.

Sail removes all of these because it never touches the JVM. The Spark Connect protocol means the PySpark client is a thin gRPC stub, and all execution is Rust-native from plan to result.

Ready for Production Today

Core SQL + DataFrame API (Spark 3.5.x and 4.x): the SQL parser passes 94.5% of 1,373 Spark SQL parsing tests and more than 80% of the ~3,800 total.
Parquet, Delta Lake, and Iceberg read and write with native format support.
Single-node local mode: starts in seconds; reliable for dev, CI, and moderate ETL.
Python UDFs via PyO3: zero-copy and in-process, faster than the JVM-to-Python path in Spark.
Most analytical SQL functions: aggregations, windows, joins, subqueries, CTEs.
Multiple catalog providers: Iceberg REST, Glue, Unity, Hive, OneLake.
Multiple storage backends: all major cloud providers plus HDFS and local.

Still Maturing

Distributed cluster mode: the control plane redesign landed in v0.5 (Feb 2026). It runs on Kubernetes but is newer and less tested than local mode.
Structured Streaming: listed as a goal and partially supported, not yet the primary focus.
PySpark compatibility checker: explicitly experimental. It checks whether functions are implemented, not whether behavior matches.
RDD API: not supported, and not planned (it depends on JVM internals outside the Spark Connect protocol).
Pandas-on-Spark: roadmap item, not yet available.
Enterprise observability: available on the managed platform but gated.

Risk Profile for Migration

For teams using PySpark with SQL and DataFrames (the vast majority of Spark usage), the migration risk is low. The switch is a one-line change: create the session with SparkSession.builder.remote("sc://host:50051").getOrCreate(). If something does not work, point the session back at the Spark cluster. The Sail team provides a compatibility check script (python -m pysail.examples.spark.compatibility_check) for a first pass, though the docs warn it does not verify behavioral parity.

Recommended migration path: test on a small dataset first, compare output with Spark, write to a new output location (never overwrite existing data), then expand scope gradually.
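A minimal sketch of the "compare output with Spark" step, assuming the result sets are small enough to collect to the client. The helper names and the tolerance are my own choices, not something Sail or Spark provides; PySpark Row objects are tuples, so the output of df.collect() can be passed in directly.

```python
import math


def values_match(a, b, rel_tol: float = 1e-9) -> bool:
    """Compare two cell values, tolerating tiny floating-point differences."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def rows_match(expected, actual, key, rel_tol: float = 1e-9) -> bool:
    """Order-insensitive comparison of two row collections.

    key extracts a unique, sortable key from each row (for example the
    grouping columns), because engines may return rows in different orders.
    """
    left = sorted(expected, key=key)
    right = sorted(actual, key=key)
    if len(left) != len(right):
        return False
    return all(
        len(r1) == len(r2)
        and all(values_match(x, y, rel_tol) for x, y in zip(r1, r2))
        for r1, r2 in zip(left, right)
    )


def write_to_new_location(df, path: str) -> None:
    """Write Parquet and fail if the path already exists (never overwrite)."""
    df.write.mode("errorifexists").parquet(path)


# Usage with two sessions (one per engine) running the same query:
#   ok = rows_match(spark_df.collect(), sail_df.collect(), key=lambda r: r[0])
```

For large outputs, comparing row counts and per-column aggregates on each engine is a cheaper first check than collecting full results.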

Who Should Use Sail Today

Based on the current maturity, here is where Sail delivers the most value today:

Teams running Spark analytics on moderate datasets (100 GB to 10 TB): the 4-6x performance gain translates to real cost savings and faster iteration.
Teams frustrated by JVM tuning: no GC, no heap sizing, no shuffle-spill management. Sail runs on sensible defaults.
Interactive analytics workloads: where Spark’s multi-second startup dominates query time, Sail’s instant startup makes interactive BI workable.
Spark-on-Kubernetes deployments: stateless workers that scale to zero fit the Kubernetes model better than heavyweight executors.
Python-heavy PySpark pipelines: zero-copy UDFs via PyO3 remove the JVM-to-Python serialization bottleneck.
CI/CD test suites: Sail starts in seconds and uses little memory, which suits automated testing of Spark code.

Less suitable today: RDD-based workloads, deep JVM library integrations, and very large distributed jobs that need the proven fault tolerance of Spark’s DAG scheduler at petabyte scale.

Bottom Line

Sail is the most credible attempt I have seen at a Spark-compatible engine that genuinely breaks free of the JVM. The architecture is clean, the performance data is reproducible and independently verifiable via ClickBench, and the project’s velocity (24 releases in under two years) signals a well-funded, focused team.

The 0.x version number is honest about the remaining gaps. Distributed mode is newer, streaming is partial, and RDD is not supported. But the core proposition works today for the workloads that make up the bulk of Spark usage: analytical SQL and DataFrame operations on batch data.

What stands out most is the ClickBench comparison against other Spark accelerators. Comet, Gluten+Velox, and Auron all claim speedups over plain Spark, typically quoted at 2-3x. On ClickBench they land between roughly 1.1x and 1.5x faster than plain Spark, while Sail runs 4.0x to 5.5x faster than those accelerators on the same hardware. That gap is not incremental. It reflects a different architectural choice: replace the JVM instead of working around it.

If the team delivers on streaming and AI workload unification (the stated roadmap), Sail could become the default compute engine for the open-source lakehouse stack. For now, it is a strong alternative for any team spending real money on Spark infrastructure or hitting JVM limits on moderate-scale analytics. The migration risk is low: a one-line change, tested on a small dataset, reverted if needed. The upside is large: 4-6x faster, far lower cost, zero shuffle spills. That is enough to justify a serious evaluation.

If you are weighing a migration like this, or fighting Spark cost and performance on your own platform, an independent review can de-risk the decision before you commit. Book a free 30-minute scoping call.

References

github.com/lakehq/sail: source code, README, benchmark data.
lakesail.com: official site, blog, documentation.
Architecture docs: detailed architecture description.
TPC-H benchmark results: official Sail benchmark page.
ClickBench performance page: Sail vs all Spark configurations on ClickBench.
Supercharge Spark blog post: detailed benchmark methodology.
Sail 0.6: Arrow, End to End: release blog post.
How Sail Compares to Photon: the team’s own comparison.
Writing a Rust SQL Parser in One Week.
The Composable Data Stack.
Comet vs Gluten comparison: official Apache DataFusion docs.
ClickBench: public analytical DBMS benchmark.
Spark Native Accelerators landscape: community overview of the ecosystem.
Medium: Switching Spark’s Engine to LakeSail: third-party migration experience.

Research conducted June 2026. All benchmark data sourced from published, reproducible benchmarks (ClickBench and the Sail project’s derived TPC-H). Performance varies by workload. Test with your own data before making infrastructure decisions.

Julien Laurenceau, Data Infrastructure and Performance Engineering at PepiteData.

Sail: When Apache Spark Meets Rust (A Practitioner’s Deep Dive)

Published by jlu on 2026-06-14

