Requirements
- 7+ years of relevant hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles.
- 5+ years of experience with Docker and Kubernetes in production environments.
- 5+ years of experience supporting enterprise cloud infrastructure or applications in AWS, Azure, or similar environments.
- Strong experience provisioning, operating, and troubleshooting Kubernetes clusters in production.
- Experience building and maintaining machine learning platforms, infrastructure, or pipelines used by engineering or data science teams.
- Practical experience deploying machine learning workloads on Kubernetes.
- Experience managing clusters or workloads that use GPUs.
- Strong understanding of Helm and Kubernetes deployment patterns.
- Strong scripting or programming skills, preferably in Python.
- Experience with modern software engineering practices, including Git, CI/CD, DevOps, and Agile/Scrum workflows.
- Strong troubleshooting, systems thinking, and communication skills.
- Ability to work independently and collaboratively in a fast-moving environment.
- Ability to obtain and maintain a Top Secret clearance.
- Ability to obtain Security+ certification within the first 90 days of employment.
Nice to Have
- Experience with ML model serving and inference platforms such as Triton Inference Server, KServe, Ray Serve, vLLM, or similar technologies.
- Experience with secure and compliant deployment practices in regulated or government environments.
- Experience with Kubernetes-based ML platforms such as Kubeflow.
- Familiarity with service mesh technologies such as Istio.
- Experience provisioning and debugging complex CI/CD systems.
- Experience with infrastructure as code tools such as Terraform.
- Familiarity with software supply chain security, container hardening, vulnerability management, and runtime scanning.
- Experience supporting ML systems across multiple deployment environments, including cloud, on-prem, and edge.
- Background working with machine learning engineers on model training, evaluation, packaging, and release workflows.
- Familiarity with storage and artifact systems used in ML platforms, such as S3-compatible object stores, registries, and metadata/catalog systems.
Benefits
- Highly competitive salary.
- Fully covered health, dental, and vision insurance.
- 401(k) and company match.
- Take-as-you-need PTO plus 11 paid holidays.
- Education & training benefits.
- Annual budget for your tech and gadget needs.
- Monthly box of yummy snacks to eat while doing meaningful work.
- Remote, hybrid, and flexible work options.
- Team off-sites in fun places!
- Generous referral bonuses.
Work Arrangement
Hybrid
Additional Information
- U.S. citizenship is required for employment, because all of the programs we support require it. All work must be conducted within the continental U.S.
- May require up to 40% travel.


