LanceDB is hiring an Open Source Engineer to advance high-performance multimodal databases. You will leverage Java/Scala and Rust to expand the reach of Lance and LanceDB within the broader data infrastructure ecosystem.
What You'll Do
- Drive OSS community efforts to integrate the Lance format into Spark, Hive Metastore, Presto, Trino, Ray, and other data infrastructure systems.
- Promote the Lance format at big data conferences and meetups.
- Design and maintain efficient distributed Lance dataset operations.
- Design efficient indices to power predicate pushdown in Spark, Ray, or Trino.
- Work on the table format, data encodings, and other aspects of the Lance format in Rust.
- Operate in-house data processing infrastructure.
What We're Looking For
- At least five years of experience building high-performance databases, big data systems, or web-scale data services.
- Experience with the internals of open source big data or AI training systems, such as Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX.
- Hands-on experience with high-performance computing in Java or Scala.
- You thrive in a small, high-caliber team with autonomy, drive, and the ability to iterate fast.
Nice to Have
- You are an open-source veteran, committer, or PMC member of large open source systems in the Apache community.
- You fearlessly challenge the status quo and dismiss mediocre engineering as unacceptable.
- You have a proven record of driving large features in Apache projects.
- You are familiar with Java, Rust, C++, Apache Arrow, Apache DataFusion, Apache Parquet, Apache Iceberg, and Delta Lake.
Technical Stack
- Languages: Java, Scala, Rust, C++
- Big Data Systems: Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto
- AI Frameworks: PyTorch, JAX
- Apache Ecosystem: Apache Arrow, Apache DataFusion, Apache Parquet
Team & Environment
You'll work with a small, high-caliber team where autonomy, drive, and fast iteration are the standard.



