Cincinnati or New York

Luma Financial Technologies is hiring a Site Reliability Engineer

The Site Reliability Engineer plays a critical role in ensuring the reliability, scalability, and performance of our production systems. This position bridges the gap between development and operations by applying engineering principles to infrastructure and operations problems. You will be responsible for building robust monitoring systems, automating operational tasks, managing incident response, and driving improvements in system design to prevent future outages. The ideal candidate thrives in a fast-paced environment, values automation over manual intervention, and is passionate about creating systems that are observable, resilient, and self-healing. You will work cross-functionally with engineering teams to embed reliability practices into the development lifecycle and ensure services meet strict uptime and performance standards.

Responsibilities

  • Work closely with engineering teams to develop and support the infrastructure powering their services.
  • Maintain high availability, security, and scalability of Kubernetes clusters running on AWS EKS.
  • Build and implement resilience strategies including multi-region deployment, backup systems, and disaster recovery processes.
  • Automate infrastructure provisioning and management using Terraform and Infrastructure-as-Code practices.
  • Enhance CI/CD workflows to enable faster, safer, and more reliable software deployments.
  • Use observability tools to monitor system performance, detect issues, and ensure service reliability.
  • Participate in on-call rotations and lead incident response efforts, prioritizing root-cause resolution and prevention.

Requirements

  • Minimum of five years of professional experience in site reliability engineering or software development.
  • Proficient in programming languages including Java, Python, Bash, and Go for automation and problem-solving.
  • Extensive hands-on experience with AWS services such as RDS, CloudFront, IAM, and VPCs, along with Terraform and Kubernetes.
  • Demonstrated ability to design and operate resilient systems that perform reliably under failure conditions.
  • Experience optimizing and managing CI/CD pipelines using tools like CircleCI or GitHub Actions.
  • Proven incident management skills with the ability to analyze root causes and remain effective under pressure.
  • Strong collaborator with clear communication skills and a commitment to continuous improvement.

Nice to Have

  • Bachelor’s degree in Computer Science, Software Engineering, or a related field.

Tech Stack

AWS, EKS, Kubernetes, Terraform, CI/CD, CircleCI, GitHub Actions, Java, Python, Bash, Go

Team

SRE team collaborating with product engineering teams

  • Driven by solving complex technical challenges
  • Prioritizes automation across systems and workflows
  • Committed to building resilient and dependable infrastructure
  • Values collaboration and ongoing improvement

Additional Information

  • Incident response and root cause analysis experience is required.
  • Strong focus on automation to minimize manual intervention and prevent errors.
  • Emphasis on implementing sustainable, long-term fixes during incident resolution.

Required Skills

AWS, EKS, Kubernetes, Terraform, CI/CD, GitHub Actions, Java, Python, Bash, Go

About the Company

Luma Financial Technologies builds financial technology platforms focused on reliability, security, and speed.

Job Details

  • Department: Engineering
  • Category: Infrastructure
  • Posted: 3 months ago