Remote (Global)
Time zones: UTC-5 (EST) to UTC+2, with a preference for UTC-3

Axiom is hiring a Site Reliability Engineer

About the Role

In this role, you will maintain the high availability and performance of production systems, combining software engineering and operational expertise to build resilient infrastructure.

Responsibilities

  • Monitor system performance and respond to incidents
  • Design and implement automated deployment pipelines
  • Troubleshoot and resolve infrastructure issues
  • Collaborate with development teams to improve code deployability
  • Maintain system security and compliance standards
  • Optimize resource utilization and system efficiency
  • Develop tools for monitoring and alerting
  • Participate in on-call rotations
  • Ensure disaster recovery procedures are tested and effective
  • Improve incident response workflows
  • Manage configuration and version control for infrastructure
  • Support capacity planning initiatives
  • Enforce observability best practices across services
  • Contribute to post-mortem analyses after outages
  • Drive adoption of reliability best practices
  • Scale systems to meet growing demand
  • Reduce technical debt in operational systems
  • Implement self-healing mechanisms in production environments
  • Work with distributed systems and cloud platforms
  • Ensure service level objectives are met

Nice to Have

  • Master's degree in a technical field
  • Experience with large-scale distributed systems
  • Contributions to open-source projects
  • Certifications in cloud or systems engineering
  • Prior work in high-availability environments

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility for remote work

Team

Collaborative engineering team focused on system reliability and performance

Technology Stack

  • Uses modern cloud-native technologies including Kubernetes, Prometheus, and Terraform
  • Leverages managed services on Google Cloud Platform

Growth Opportunities

  • Engineers are encouraged to lead initiatives and mentor others
  • Opportunities for advancement in technical and leadership tracks

Available for qualified candidates

Required Skills

  • AWS
  • Kubernetes
  • Terraform
  • Docker
  • Linux
  • GitHub Actions
  • GitLab
  • Infrastructure as Code

About the Company

Axiom is a remote-first, globally distributed team building a cloud-native, serverless data analytics platform that gives developers fast insights into their data, with unlimited, cost-effective storage and lightning-fast querying.

Job Details

Category: Infrastructure
Posted: 7 months ago