What You'll Do
Design and maintain self-service infrastructure platforms that enable engineering teams to deploy quickly and securely. Use Terraform to build reusable, scalable modules that promote autonomy while enforcing security and compliance standards.
Collaborate with development teams to implement SRE principles, defining meaningful service-level indicators, objectives, and error budgets that align reliability with business goals. Focus on balancing rapid iteration with system stability.
Improve developer experience by building streamlined, GitOps-based workflows using GitHub Actions and ArgoCD. Enable safe deployment patterns such as canary and blue/green releases through automated pipelines.
Develop and maintain a unified observability platform using Elastic Cloud, Prometheus, and Grafana. Correlate metrics, logs, and traces to reduce noise and accelerate incident diagnosis.
Manage large-scale Kubernetes environments with Helm, optimizing for availability, efficiency, and security. Lead responses to critical incidents, conducting post-mortems and driving automation to prevent recurrence.
Requirements
- Proven experience operating production systems on AWS or comparable cloud providers
- Strong background in Kubernetes and Helm for container orchestration
- Track record of writing clean, maintainable Terraform code for infrastructure automation
- Experience building GitOps-based CI/CD pipelines with GitHub Actions and ArgoCD
- Familiarity with observability tools including Prometheus, Grafana, and Elastic Cloud
- Programming experience in Python, Go, or Java for platform tooling development
- Proficiency in Linux, shell scripting, and distributed systems operations
- Operational knowledge of SQL and NoSQL databases in cloud environments
Benefits
- Work in an inclusive culture that values diversity, transparency, and empowerment
- Contribute to meaningful projects with real community impact
- Be part of a leading iGaming brand in Latin America, operating under Brazil’s regulated sports betting framework (Portaria 2.093/2024)
- Support innovation in a secure, compliant, and fast-growing environment


