United States Remote (Country) Full-time

EX Squared is hiring a Senior Engineer, Production Operations (Remote - US)

About the Role

Jobgether is hiring a Senior Engineer, Production Operations to shape and maintain highly reliable, scalable, and performant cloud systems supporting mission-critical services in our fast-growing fintech environment. This remote role focuses on driving operational excellence through automation, Infrastructure as Code, and robust monitoring while collaborating closely with development and security teams.

What You'll Do

  • Design, implement, and maintain core cloud infrastructure and Site Reliability Engineering practices to ensure high availability and performance.
  • Develop and optimize cloud infrastructure using Infrastructure as Code tools, primarily Terraform, and automation platforms.
  • Collaborate with development and security teams to integrate SRE principles into the software development lifecycle.
  • Design and manage monitoring, logging, and alerting solutions to provide clear visibility into system health.
  • Participate in incident response, conduct root cause analyses, and contribute to blameless postmortems.
  • Identify and implement architectural improvements to enhance system reliability, resilience, and efficiency.
  • Automate operational tasks and processes to reduce toil and improve productivity.
  • Research, evaluate, and advocate for new tools or technologies to improve operational posture.
  • Enhance engineering tooling, processes, and standards for consistent and repeatable application delivery.

What We're Looking For

  • 5+ years of experience in Site Reliability Engineering, Production Operations, or similar roles focused on cloud infrastructure and distributed systems.
  • Proven experience architecting and maintaining highly available, secure, and scalable systems in a public cloud environment (AWS preferred).
  • Strong proficiency with Infrastructure as Code tools, particularly Terraform.
  • Experience automating operational tasks using scripting languages (Python, Go, Bash) and automation platforms.
  • Expertise in monitoring, logging, and alerting solutions (Datadog, Prometheus, Grafana, ELK stack).
  • Solid understanding of incident response best practices and troubleshooting complex production issues.
  • Knowledge of distributed systems, microservices architectures, and containerization technologies (Docker, Kubernetes/EKS).
  • Exceptional analytical, problem-solving, and collaboration skills, with the ability to communicate technical concepts effectively to technical and non-technical stakeholders.
  • Passion for improving system reliability, performance, and operational efficiency.

Nice to Have

  • Experience with payments infrastructure or high-volume transactional systems.
  • Familiarity with database technologies (PostgreSQL, Cassandra, DynamoDB).
  • Experience with CI/CD pipelines and automation of software delivery.

Technical Stack

  • Infrastructure as Code: Terraform
  • Cloud: AWS
  • Languages/Scripting: Python, Go, Bash
  • Monitoring/Observability: Datadog, Prometheus, Grafana, ELK stack
  • Containers/Orchestration: Docker, Kubernetes/EKS
  • Databases: PostgreSQL, Cassandra, DynamoDB

Benefits & Compensation

  • Competitive salary with market-based adjustments depending on location and experience.
  • Discretionary performance bonus and equity rewards.
  • Medical, dental, and vision coverage, plus an HSA match.
  • Paid life insurance, AD&D, and disability benefits.
  • Traditional 401(k) plan with company match.
  • Unlimited PTO and paid company holidays, including pop-up bonus holidays.
  • Professional development stipends and mental health resources.
  • Fertility healthcare support and 100% paid parental and caregiving leave with additional home support services.
  • Flexible work arrangements, with remote or in-office options.
  • Fully stocked office kitchen, catered lunches, and occasional in-office events.
  • Employee resource groups promoting inclusion and collaboration.

Work Mode

This is a remote position open to candidates located within the United States.

Jobgether is an equal opportunity employer.

Required Skills
Terraform, AWS, Python, Go, Bash, Datadog, Prometheus, Grafana, ELK stack, Docker, CI/CD, Infrastructure as Code, Monitoring, Scripting, Cloud Infrastructure
About company
EX Squared

Technology company focused on IT and software solutions

Job Details
Category: infrastructure
Posted 7 months ago