Remote (Country)

Torc is hiring a Senior ML Engineer - ML Ops Framework

About the Role

The role involves building and improving the infrastructure that supports machine learning workflows, ensuring models are efficiently trained, deployed, and monitored in production.

Responsibilities

  • Design and implement scalable ML pipelines for training and inference
  • Develop tools to automate model deployment and rollback processes
  • Monitor system performance and model behavior in production
  • Collaborate with data scientists to integrate models into production systems
  • Improve reliability and efficiency of ML infrastructure
  • Create versioning systems for models and datasets
  • Optimize resource usage for training and serving workloads
  • Ensure reproducibility across ML workflows
  • Support continuous integration and delivery for ML systems
  • Troubleshoot issues in model serving environments
  • Maintain documentation for ML operations processes
  • Enforce security and compliance standards in ML systems
  • Work with infrastructure teams to manage cloud resources
  • Implement monitoring and alerting for model performance
  • Contribute to internal frameworks for experiment tracking
  • Support model validation and testing procedures
  • Assist in scaling systems for high-throughput inference
  • Evaluate new tools and technologies for ML Ops
  • Promote best practices in machine learning engineering
  • Ensure seamless collaboration between research and engineering teams

Nice to Have

  • Master’s degree in computer science or a related field
  • Experience with large-scale distributed systems
  • Contributions to open-source ML projects
  • Knowledge of real-time data processing frameworks
  • Background in automated testing for ML systems
  • Experience with feature store implementations
  • Familiarity with regulatory requirements for ML systems
  • Prior work in safety-critical or high-assurance domains

Compensation

Competitive salary and benefits package

Work Arrangement

Remote position with flexible hours

Team

Collaborative team focused on scalable machine learning systems

About the Team

This team builds foundational systems that enable reliable and scalable machine learning in production. Members work closely with researchers and engineers to bridge the gap between experimentation and deployment.

What We Value

We prioritize technical excellence, clear communication, and a collaborative mindset. Candidates should demonstrate a strong sense of ownership and a drive to solve complex infrastructure challenges.

Available for qualified candidates

Required Skills

  • Python
  • PyTorch
  • AWS
  • Terraform
  • EKS
  • Machine Learning
  • Cloud Infrastructure

About company
Torc
A leader in autonomous driving since 2007, Torc is now part of the Daimler family and is focused solely on developing software for automated trucks to transform how the world moves freight.
Job Details
Category: Data
Posted 7 months ago