United States, Hybrid, Full-time

EX Squared is hiring a Senior Site Reliability Engineer (Remote - US)

About the Role

We are hiring a Senior Site Reliability Engineer to design, build, and maintain highly available, secure, and scalable systems for our production and machine learning environments. You will collaborate closely with software engineers, data scientists, and platform architects to ensure system reliability and performance.

What You'll Do

  • Design, implement, and maintain cloud-native infrastructure on Kubernetes (EKS/GKE/AKS) for production systems.
  • Architect and manage microservice deployments, ensuring reliable CI/CD pipelines and service performance.
  • Collaborate with ML and Data teams to design, optimize, and monitor ML/AI workflows using tools like Databricks, Spark, Flyte, or Airflow.
  • Establish and enforce SLOs/SLIs, conduct incident postmortems, and enhance system reliability and developer velocity.
  • Lead improvements in architecture focusing on scalability, fault tolerance, performance, and cost optimization.
  • Support secure infrastructure practices, including IAM, secret management, policy-as-code, and compliance controls.
  • Mentor junior engineers and contribute to best practices across observability, infrastructure-as-code, and production readiness.

What We're Looking For

  • Bachelor’s degree in a related field or equivalent work experience.
  • 8+ years of experience in software, systems, or DevOps engineering.
  • Strong expertise in Kubernetes deployment, scaling, networking, monitoring, and debugging.
  • Proficiency in Golang and Python.
  • Solid understanding of distributed systems, cloud architecture, and container orchestration.
  • Experience building and maintaining microservice-based architectures in production.
  • Familiarity with CI/CD pipelines (GitLab CI, ArgoCD, Flux, or similar).
  • Deep experience with monitoring/observability tools (Datadog, Prometheus, Grafana, OpenTelemetry).

Nice to Have

  • Experience designing or operating ML workflows and data pipelines.
  • Background in system design or infrastructure architecture.
  • Exposure to multi-cloud environments (AWS, GCP, Azure).
  • Knowledge of security, compliance, and automation in production-grade systems.
  • Contributions to open-source projects or internal platform tooling, or experience leading SRE transformations.
  • Familiarity with service meshes (Istio, Linkerd) and API gateways (Kong, Envoy).

Technical Stack

  • Kubernetes (EKS/GKE/AKS), Golang, Python
  • Databricks, Spark, Flyte, Airflow
  • GitLab CI, ArgoCD, Flux
  • Datadog, Prometheus, Grafana, OpenTelemetry
  • AWS, GCP, Azure
  • Istio, Linkerd, Kong, Envoy

Team & Environment

You will collaborate closely with software engineers, data scientists, and platform architects.

Benefits & Compensation

  • Competitive salary range: $138,000–$213,000.
  • Performance-based bonuses and stock options.
  • Unlimited paid time off.
  • Health, dental, and vision coverage.
  • Remote or hybrid work flexibility.
  • Opportunities for professional growth and impact in a collaborative environment.

Work Mode

This is a remote position open to candidates in the United States, with hybrid work flexibility.

Required Skills
Kubernetes, Golang, Python, Databricks, Spark, Flyte, Airflow, GitLab CI, ArgoCD, Flux, AWS, GCP, Azure, Terraform, CI/CD
About company
EX Squared

Technology company focused on IT and software solutions

Job Details
Category: infrastructure
Posted: 4 months ago