Newsela is seeking a contract AI Operations Specialist to support its ML/AI systems and workflows. This role focuses on productionizing machine learning models, building and maintaining CI/CD pipelines, managing containerized services on AWS, and collaborating with ML engineers and data teams to scale AI services.
What You'll Do
- Design and maintain CI/CD pipelines for ML model training, packaging, and deployment across microservices
- Manage containerized services on AWS ECS, optimizing for cost, latency, and availability
- Automate infrastructure provisioning and service configuration with Terraform
- Maintain and scale services that rely on third-party LLM providers
- Build and improve data pipelines that feed models from BigQuery, S3, and DynamoDB into training and inference workflows
- Instrument services with observability tooling (Datadog, OpenTelemetry, Langfuse) and establish SLOs for model-serving endpoints
- Collaborate with ML engineers to productionize new models using BentoML, FastAPI, and container-based serving
What We're Looking For
- 2-3 years of MLOps experience supporting ML/AI features, systems, and workflows
- 3-4 years of prior experience in DevOps, CloudOps, or SRE
- Strong proficiency in Python
- Hands-on experience with Docker containerization and container orchestration
- Solid understanding of CI/CD for ML workflows in an enterprise production environment
- Experience with Infrastructure as Code, preferably Terraform
- Familiarity with cloud platforms, specifically AWS (ECS, ECR, S3, DynamoDB, CloudWatch) and GCP (BigQuery, Vertex AI)
- Experience with LLM integration and observability (OpenAI API, Google GenAI, Langfuse tracing)
- Experience building and maintaining data pipelines for ML training and feature engineering
- Familiarity with ML modeling workflows: training, evaluation, experiment tracking (e.g., MLflow, Weights & Biases), and model versioning
- Experience monitoring and flagging model drift over time
- Exposure to NLP/NLU models and frameworks such as Hugging Face Transformers, spaCy, or sentence-transformers
- Knowledge of vector databases (LanceDB, FAISS) and embedding-based retrieval systems
- Experience scaling and maintaining deep learning frameworks (TensorFlow, PyTorch) in production settings
- Familiarity with classical ML libraries (scikit-learn, XGBoost, LightGBM) and model explainability tools (SHAP)
- Working knowledge of ML serving frameworks such as BentoML or similar
- Comfort working with FastAPI or similar async Python web frameworks
Technical Stack
Python, Docker, Terraform, AWS, ECS, ECR, S3, DynamoDB, CloudWatch, GCP, BigQuery, Vertex AI, OpenAI API, Google GenAI, Langfuse, Datadog, OpenTelemetry, BentoML, FastAPI, MLflow, Weights & Biases, Hugging Face Transformers, spaCy, sentence-transformers, LanceDB, FAISS, TensorFlow, PyTorch, scikit-learn, XGBoost, LightGBM, SHAP
Team & Environment
The ML/AI team works on classical machine learning and generative AI pipelines, collaborating with data and site reliability engineers.
Benefits & Compensation
- This role will not be eligible to participate in company-sponsored benefits
Work Mode
Remote
Newsela is an equal opportunity employer, committed to diversity and inclusion in the workplace.
