Autodesk Inc. is hiring a Principal Machine Learning Operations (MLOps) Developer for AI Research (Remote, US or Canada)

About the Role

Autodesk Inc. is looking for a Principal Machine Learning Operations (MLOps) Developer for AI Research to join a team building and scaling foundation models trained on vast amounts of design data. In this role, you will work directly with AI researchers to create the infrastructure that will influence how designers, architects, and engineers interact with AI tools.

What You'll Do

  • Build scalable ML training pipelines and infrastructure to support foundation model development.
  • Design efficient data processing workflows for large-scale design datasets and industry-specific file formats.
  • Optimize distributed training systems and develop solutions for model parallelism, checkpointing, and resource management.
  • Analyze performance bottlenecks and provide solutions to scaling problems.
  • Implement and maintain robust, testable, and well-documented code.
  • Collaborate on projects at the intersection of research and product with a diverse, global team.
  • Present results to collaborators and leadership.

What We're Looking For

  • BSc or MSc in Computer Science or related field, or equivalent industry experience.
  • Experience with distributed systems for machine learning and deep learning at scale.
  • Strong knowledge of ML infrastructure and model parallelism techniques, including frameworks like PyTorch, Lightning, Megatron, DeepSpeed, and FSDP.
  • Proficiency in Python and strong software engineering practices.
  • Experience with cloud services and architectures (AWS, Azure, etc.).
  • Familiarity with version control, CI/CD, and deployment pipelines.
  • Excellent written documentation skills for code, architectures, and experiments.

Nice to Have

  • Experience with AEC data formats (e.g., BIM models, IFC files, CAD files, Drawing Sets).
  • Knowledge of the AEC industry and its specific data processing challenges.
  • Experience scaling ML training and data pipelines for large datasets.
  • Experience with distributed data processing and ML infrastructure (e.g., Apache Spark, Ray, Docker, Kubernetes).
  • Experience with performance optimization, monitoring, and efficiency in large-scale ML systems.
  • Experience with Autodesk or similar products (Revit, SketchUp, Forma).

Technical Stack

  • Frameworks: PyTorch, Lightning, Megatron, DeepSpeed, FSDP
  • Language: Python
  • Cloud: AWS, Azure
  • Infrastructure/Tools: Apache Spark, Ray, Docker, Kubernetes

Team & Environment

You will join a rapidly growing team within Autodesk Research that is distributed globally.

Work Mode

This is a remote position open to candidates in the US and Canada.

Autodesk is proud to be an equal opportunity employer and considers all qualified applicants for employment without regard to race, color, religion, age, sex, sexual orientation, gender, gender identity, national origin, disability, veteran status or any other legally protected characteristic.

Required Skills

PyTorch, Lightning, Megatron, DeepSpeed, FSDP, Python, AWS, Azure, Apache Spark, Ray, Machine Learning, MLOps, Distributed Training, Cloud Infrastructure

About company
Autodesk Inc.

Autodesk helps innovators turn their ideas into reality, transforming not only how things are made, but what can be made. Amazing things are created every day with their software – from the greenest buildings and cleanest cars to the smartest factories and biggest hit movies.

Job Details
Category: Infrastructure
Posted: 8 months ago