Responsibilities
- Design and maintain scalable infrastructure supporting the full machine learning lifecycle, including training, fine-tuning, and deployment of traditional models and large language models.
- Implement CI/CD pipelines, container orchestration, and automated workflows to enable seamless deployment of ML and generative AI workloads.
- Establish standardized practices for model reproducibility, versioning, packaging, and deployment across hybrid (on-premises and cloud) environments.
- Manage GPU clusters, inference servers, and vector databases, and optimize throughput, latency, and token usage for generative AI applications.
- Build monitoring for ML and generative AI systems that tracks model drift, token consumption, hallucinations, bias, and GPU utilization to ensure reliability, security, and compliance.
- Collaborate with data scientists, AI engineers, and IT teams to develop user-friendly, robust, and scalable platform solutions.
- Evaluate and integrate emerging MLOps and generative AI tools and frameworks, such as MLflow, Ray, vLLM, Hugging Face TGI, NVIDIA Triton Inference Server, and agent orchestration frameworks.