United States, Hybrid, Full-time

EX Squared is hiring a Senior Site Reliability Engineer (Remote - US)

About the Role

EX Squared is hiring a Senior Site Reliability Engineer to design, build, and maintain highly available, secure, and scalable systems for our production and machine learning environments. You will collaborate closely with software engineers, data scientists, and platform architects to ensure system reliability and performance.

What You'll Do

  • Design, implement, and maintain cloud-native infrastructure on Kubernetes (EKS/GKE/AKS) for production systems.
  • Architect and manage microservice deployments, ensuring reliable CI/CD pipelines and service performance.
  • Collaborate with ML and Data teams to design, optimize, and monitor ML/AI workflows using tools like Databricks, Spark, Flyte, or Airflow.
  • Establish and enforce SLOs/SLIs, conduct incident postmortems, and enhance system reliability and developer velocity.
  • Lead architectural improvements focused on scalability, fault tolerance, performance, and cost optimization.
  • Support secure infrastructure practices, including IAM, secret management, policy-as-code, and compliance controls.
  • Mentor junior engineers and contribute to best practices across observability, infrastructure-as-code, and production readiness.

What We're Looking For

  • Bachelor’s degree in a related field or equivalent work experience.
  • 8+ years of experience in software, systems, or DevOps engineering.
  • Strong expertise in Kubernetes deployment, scaling, networking, monitoring, and debugging.
  • Proficiency in Golang and Python.
  • Solid understanding of distributed systems, cloud architecture, and container orchestration.
  • Experience building and maintaining microservice-based architectures in production.
  • Familiarity with CI/CD pipelines (GitLab CI, ArgoCD, Flux, or similar).
  • Deep experience with monitoring/observability tools (Datadog, Prometheus, Grafana, OpenTelemetry).

Nice to Have

  • Experience designing or operating ML workflows and data pipelines.
  • Background in system design or infrastructure architecture.
  • Exposure to multi-cloud environments (AWS, GCP, Azure).
  • Knowledge of security, compliance, and automation in production-grade systems.
  • Contributions to open-source projects or internal platform tooling, or experience leading SRE transformations.
  • Familiarity with service meshes (Istio, Linkerd) and API gateways (Kong, Envoy).

Technical Stack

  • Kubernetes (EKS/GKE/AKS), Golang, Python
  • Databricks, Spark, Flyte, Airflow
  • GitLab CI, ArgoCD, Flux
  • Datadog, Prometheus, Grafana, OpenTelemetry
  • AWS, GCP, Azure
  • Istio, Linkerd, Kong, Envoy

Team & Environment

You will collaborate closely with software engineers, data scientists, and platform architects.

Benefits & Compensation

  • Competitive salary range: $138,000–$213,000.
  • Performance-based bonuses and stock options.
  • Unlimited paid time off.
  • Health, dental, and vision coverage.
  • Remote or hybrid work flexibility.
  • Opportunities for professional growth and impact in a collaborative environment.

Work Mode

This is a remote position open to candidates in the United States, with hybrid work flexibility.

Required Skills
Kubernetes, Golang, Python, Databricks, Spark, Flyte, Airflow, GitLab CI, ArgoCD, Flux, AWS, GCP, Azure, Terraform, CI/CD
About company
EX Squared

Technology company focused on IT and software solutions

Job Details
Category infrastructure
Posted 4 months ago