Responsibilities
- Design, build, and operate scalable, fault-tolerant infrastructure for LLM Research: distributed compute, data orchestration, and storage across modalities.
- Develop high-throughput systems for data ingestion, processing, and transformation — including training data catalogs, deduplication, quality checks, and search.
- Build systems for traceability, reproducibility, and robust quality control at every stage of the data lifecycle.
- Implement and maintain monitoring and alerting to support platform reliability and performance.
- Collaborate with research teams to unlock new features, improve data quality, and accelerate training cycles.
Requirements
- Bachelor’s degree or equivalent experience in computer science, engineering, or a similar field.
- Proficiency in at least one backend language (we use Python and Rust).
- Fluency in distributed compute frameworks such as Apache Spark or Ray.
- Deep familiarity with cloud infrastructure, data lake architectures, and batch and streaming pipelines.
- Comfort operating across the stack and owning projects end-to-end.
- Ability to thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.
- A bias for action: you take the initiative to work across stacks and teams wherever you see an opportunity to make sure something ships.
Nice to Have
- Have hands-on experience with Kafka, dbt, Terraform, and Airflow.
- Have experience building a web crawler.
- Have extensive experience building and scaling deduplication, data mining, and search systems.
- Have strong knowledge of file formats and storage systems (e.g., Parquet, Delta Lake) and how they impact performance and scalability.
- Are proactive about documentation, testing, and empowering your teammates with good tooling.
Benefits
- Generous health, dental, and vision benefits
- Unlimited PTO
- Paid parental leave
- Relocation support as needed
Work Arrangement
On-site — San Francisco, California
Additional Information
- This is an 'evergreen role' that we keep open on an ongoing basis so that candidates can express interest.
- We continuously review applications and reach out to applicants as new opportunities open.
- You may reapply if you gain more experience, but please avoid applying more than once every 6 months.
- You are welcome to apply to project- or team-specific roles in addition to this evergreen role.
Visa sponsorship available