About the Role
The role involves building robust data pipelines, ensuring data quality, and enabling efficient access to large-scale datasets used in AI research and product development.
Responsibilities
- Develop and manage data infrastructure for high-volume systems
- Design reliable data pipelines to support training and evaluation workflows
- Ensure data consistency, accuracy, and accessibility across platforms
- Collaborate with machine learning teams to understand data needs
- Optimize data storage and retrieval for performance and cost
- Implement monitoring and alerting for data pipeline health
- Support the integration of new data sources into existing systems
- Maintain documentation for data models and workflows
- Work with cross-functional teams to align data practices with business goals
- Contribute to data governance and security standards
Nice to Have
- Background in machine learning operations or AI infrastructure
- Experience with large-scale data warehousing solutions
- Exposure to real-time data processing systems
- Contributions to open-source data projects
- Prior work in fast-paced research-driven environments
Benefits
- Comprehensive health insurance coverage
- Retirement savings plan with employer contributions
- Paid time off and flexible vacation policy
- Parental leave for all caregivers
- Professional development stipend
- Remote work equipment allowance
- Mental health and wellness resources
- Inclusive workplace culture with employee resource groups
Compensation
Competitive salary and equity package
Work Arrangement
Hybrid work model with flexible remote options
Team
Collaborative engineering team focused on machine learning infrastructure
Our Mission
We aim to advance natural language understanding through responsible AI development and broad access to language models.
Impact of Your Role
Your work will directly influence the scalability and reliability of data systems powering cutting-edge AI applications.