Abundant is hiring a Senior Machine Learning Engineer to build ML systems and keep them reliable. The role focuses on system behavior rather than model training alone: you will work across training, evaluation, and infrastructure to make sure systems are robust and correct in real-world use.
What You'll Do
- Design, debug, and maintain ML systems in realistic, tool-enabled environments.
- Work across training, evaluation, and infrastructure to ensure ML systems behave correctly and robustly.
- Diagnose ML system failures, including issues caused by data, evaluation artifacts, or underspecified requirements.
- Improve ML evaluation pipelines, focusing on metrics design, test coverage, and error analysis.
- Make pragmatic engineering tradeoffs under ambiguous requirements and incomplete specifications.
What We're Looking For
- 4+ years of professional experience in Machine Learning Engineering, Applied ML, ML-focused Software Engineering, or related roles.
- Strong proficiency in Python, with experience writing production-quality code and working with ML libraries.
- Experience training, evaluating, and iterating on ML models, with an emphasis on diagnosing failure modes.
- Strong understanding of ML evaluation: metrics design, test coverage, error analysis, and tradeoffs.
- Ability to debug complex ML system failures, including issues caused by data, evaluation artifacts, or underspecified requirements.
- Comfort working with incomplete specifications and multiple valid solutions, especially in open-ended or real-world tasks.
- Experience working with ML pipelines or systems, including training workflows, evaluation harnesses, or model-in-the-loop systems.
Nice to Have
- Experience building or maintaining ML training and evaluation pipelines.
- Familiarity with ML infrastructure concepts like reproducibility, experiment tracking, and model versioning.
- Experience working in tool-enabled environments, such as programmatic evaluation, scripting, notebooks, or terminal-driven workflows.
- Exposure to LLM systems, including model evaluation, benchmarking, or prompt/agent behavior analysis.
- Experience reasoning about multiple valid implementations and tradeoffs in engineering solutions.
- Strong written communication skills for explaining system behavior, failures, and engineering decisions.
Technical Stack
- Python
- PyTorch
- TensorFlow
- scikit-learn
Team & Environment
You will contribute to applied AI research and collaborate with industry research labs.
Work Mode
This position is open to candidates working globally.




