Design and deploy robust machine learning systems that support end-to-end model development and production deployment. You will create evaluation frameworks tailored to Large Language Models (LLMs), with a focus on Retrieval-Augmented Generation (RAG) architectures, ensuring accuracy and performance across diverse use cases.
What You'll Do
- Build scalable infrastructure for training, deploying, and monitoring ML models
- Develop evaluation pipelines for LLMs using tools like RAGAS or DeepEval
- Support the design and implementation of generative AI and agent-based systems
- Enhance observability of models in production with monitoring and logging solutions
- Collaborate with data scientists, engineers, and product teams to align technical solutions with business goals
- Optimize cloud-based workflows for efficiency, speed, and cost-effectiveness
- Apply rigorous testing, version control, and CI/CD practices to ML pipelines
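As an illustration of what the LLM evaluation pipelines above measure, here is a minimal, framework-free sketch computing two toy metrics: retrieval hit rate and answer keyword coverage. All names here (`EvalExample`, `hit_rate`, `keyword_coverage`) are hypothetical; a production pipeline would typically rely on the richer metrics (e.g. faithfulness, answer relevancy) that frameworks like RAGAS or DeepEval provide.

```python
from dataclasses import dataclass

@dataclass
class EvalExample:
    """One RAG evaluation record (hypothetical schema for illustration)."""
    question: str
    retrieved_docs: list[str]     # documents returned by the retriever
    relevant_doc: str             # gold document expected in retrieval
    answer: str                   # model-generated answer
    expected_keywords: list[str]  # terms a correct answer should contain

def hit_rate(examples: list[EvalExample]) -> float:
    """Fraction of examples whose gold document appears in retrieval."""
    hits = sum(ex.relevant_doc in ex.retrieved_docs for ex in examples)
    return hits / len(examples)

def keyword_coverage(ex: EvalExample) -> float:
    """Fraction of expected keywords present in the generated answer."""
    answer = ex.answer.lower()
    found = sum(kw.lower() in answer for kw in ex.expected_keywords)
    return found / len(ex.expected_keywords)

examples = [
    EvalExample(
        question="What is RAG?",
        retrieved_docs=["doc_rag_overview", "doc_vector_stores"],
        relevant_doc="doc_rag_overview",
        answer="RAG augments generation with retrieved context.",
        expected_keywords=["retrieved", "generation"],
    ),
]

print(f"retrieval hit rate: {hit_rate(examples):.2f}")
print(f"keyword coverage: {keyword_coverage(examples[0]):.2f}")
```

The same shape scales to real pipelines: swap the toy metrics for framework-provided ones and run the harness over a versioned evaluation dataset in CI.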
Requirements
- 5+ years of hands-on experience in Python, with a focus on machine learning and backend systems
- Proven background in ML infrastructure, deployment workflows, and model lifecycle management
- Direct experience evaluating LLMs using frameworks such as RAGAS, DeepEval, or comparable tools
- Strong grasp of generative AI concepts, including LLMs and agentic AI patterns
- Familiarity with RAG architectures and their practical implementation
- Proficiency with cloud platforms like AWS, GCP, or Azure
- Experience using monitoring and debugging tools for ML systems in production
- Ability to work autonomously in a distributed, remote environment
- Advanced English communication skills
Benefits
- Competitive compensation paid in USD
- Full remote flexibility within LATAM
- Access to ongoing learning and professional development resources
- Inclusive, multicultural team culture focused on innovation
- Exposure to global projects and emerging technologies in AI

