Design and manage scalable cloud infrastructure on GCP with Kubernetes to support high-performance machine learning workloads.
Develop automated workflows for training, evaluating, and releasing ML models using tools such as Jenkins, GitHub Actions, or Airflow.
Set up observability systems to detect model drift, accuracy changes, latency issues, and performance degradation in live environments.
Coordinate work across data, machine learning, backend, and frontend engineering teams so that handoffs and releases run smoothly.
Establish monitoring solutions covering both system health metrics and ML-specific signals like feature drift and data distribution changes.
Enable individual engineering teams to monitor their own services through self-service tooling and platforms.
Participate in the on-call rotation and help maintain compliance with security standards such as SOC.

Point Wild is hiring a Principal Platform Engineer