Texas, United States | Hybrid | Full-time

EX Squared is hiring a Senior Site Reliability Engineer (Remote)

About the Role

At Jobgether, we are hiring a Senior Site Reliability Engineer to play a key part in ensuring the performance, scalability, and resilience of our large-scale, mission-critical systems. You will work closely with cross-functional engineering teams to implement observability, monitoring, and profiling solutions, define reliability standards, and improve incident response processes.

What You'll Do

  • Build, maintain, and enhance monitoring, tracing, and profiling systems to ensure visibility into system health and performance.
  • Define, implement, and optimize SLIs, SLOs, and SLAs that accurately reflect user experience.
  • Partner with engineering teams to identify and resolve performance bottlenecks and reduce operational toil.
  • Lead incident response efforts, conduct post-incident reviews, and implement learnings to improve system resilience.
  • Collaborate on platform improvements that enhance developer productivity and system reliability.
  • Continuously evaluate and recommend new tools and process improvements to optimize operational efficiency.

What We're Looking For

  • Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
  • 6+ years of experience in site reliability, DevOps, or software engineering roles.
  • Expertise with observability, monitoring, and alerting platforms (e.g., Prometheus, Grafana, Loki, OpenTelemetry).
  • Experience implementing tracing, logging, and profiling for distributed systems.
  • Strong knowledge of incident management, postmortem processes, and reliability metrics.
  • Familiarity with Linux, Kubernetes, Terraform, and cloud platforms (GCP preferred).
  • Proficiency in at least one scripting or backend language (e.g., Python, Go, Bash).
  • Excellent problem-solving, communication, and collaboration skills, and a passion for continuous improvement.

Technical Stack

  • Monitoring & Observability: Prometheus, Grafana, Loki, OpenTelemetry
  • Infrastructure & Cloud: Linux, Kubernetes, Terraform, GCP
  • Languages: Python, Go, Bash

Team & Environment

You will work closely with cross-functional engineering teams.

Benefits & Compensation

  • Competitive salary and equity opportunities with significant upside.
  • Flexible schedules and generous time-off policies, including a sabbatical after five years of service.
  • Comprehensive health, dental, and vision coverage.
  • Supportive work environment promoting work-life balance and personal growth.
  • Access to modern office amenities, collaborative spaces, and team-building activities.

Work Mode

This role follows a hybrid remote model and is open to candidates located in Texas (USA).

Jobgether is an equal opportunity employer.

Required Skills
Prometheus, Grafana, Loki, OpenTelemetry, Kubernetes, Terraform, GCP, Linux, Python, Go, Site Reliability Engineering, Monitoring, Infrastructure as Code, Cloud Platforms
About company
EX Squared

Technology company focused on IT and software solutions

Job Details
Category: Infrastructure
Posted 4 months ago