Responsibilities
- Design and implement cloud-based infrastructure on AWS or Azure using Infrastructure as Code tools such as Terraform or Pulumi.
- Enhance system reliability, performance, and scalability to maintain high availability and low latency for essential IT services.
- Develop and manage CI/CD pipelines using platforms like GitHub Actions, supporting both hosted and self-hosted runners for specialized build needs.
- Ensure all new internal applications are built with security, logging, monitoring, and alerting capabilities enabled from the start.
- Develop internal AI-driven tools and automation scripts to improve developer productivity and operational efficiency.
- Support incident management by analyzing data, refining response workflows, and building dashboards to track service health.
- Take part in on-call rotations and lead the rapid resolution of production outages and other technical issues.
- Lead post-incident reviews to determine root causes and implement long-term engineering fixes.
- Work closely with Security, Engineering, and Support teams to deliver measurable business impact.
Work Arrangement
Remote (Worldwide)
Compliance
If job responsibilities require access to export-controlled technology or source code, the employer may choose whether to apply for a U.S. government license. The employer may decline to proceed with a candidate based solely on this factor.