About the Role
Develop and optimize the data pipelines, storage systems, and ML operations tools that power autonomous vehicle training and simulation workflows.
Responsibilities
- Develop and maintain distributed data storage and processing systems
- Design scalable pipelines for ingesting and transforming sensor and simulation data
- Improve reliability and efficiency of machine learning training data workflows
- Collaborate with research and product teams to understand data needs
- Implement monitoring and observability for data pipelines
- Optimize query performance across petabyte-scale datasets
- Build tools to automate data validation and quality checks
- Support versioning and lineage tracking for ML datasets
- Enhance infrastructure for model training and evaluation pipelines
- Contribute to the design of low-latency data retrieval systems
- Build and maintain fault-tolerant systems for large-scale data processing
- Integrate new data sources into existing data architectures
- Develop APIs for programmatic access to data and models
- Improve developer experience for data and ML workflows
- Troubleshoot production issues in data infrastructure
- Write clean, maintainable code with comprehensive testing
- Participate in system design and architecture reviews
- Ensure data consistency and integrity across pipelines
- Support deployment and scaling of ML models in production
- Collaborate on security and access control for sensitive data
- Optimize cloud resource usage for cost efficiency
- Document system designs and operational procedures
- Contribute to on-call rotations for critical systems
- Stay current with advancements in data engineering and MLOps
- Mentor team members on best practices in data infrastructure
Nice to Have
- Master’s degree in computer science or a related field
- Experience with real-time data processing systems
- Contributions to open-source data or ML projects
- Knowledge of autonomous vehicle technology
- Experience with data lake architectures
- Familiarity with workflow orchestration tools
- Background in high-performance computing
- Experience with data governance frameworks
- Understanding of MLOps best practices
- Prior work with simulation data systems
Compensation
Competitive salary and equity package
Work Arrangement
Hybrid work model with office and remote flexibility
Team
Part of a high-performing engineering team focused on data systems and ML infrastructure
About the Axion Platform
- Axion is a high-performance data engine designed for autonomous vehicle companies to manage, search, and utilize massive datasets from real-world and simulated environments.
- It enables teams to accelerate development by providing fast access to relevant scenarios and training data.
Our Approach to MLOps
- We focus on building robust pipelines that connect data collection, model training, and deployment in a repeatable, auditable way.
- The goal is to reduce iteration time for ML teams while ensuring data quality and system reliability.