Alternative Payments is hiring a Senior Site Reliability Engineer (SRE) to drive operational excellence, enhance infrastructure reliability, and establish modern observability across critical systems. This hands-on role involves shaping our reliability strategy, implementing robust SRE practices, and tackling significant platform challenges.
What You'll Do
- Lead and execute key reliability initiatives from planning to delivery, focusing on monitoring, alerting, and incident response for the Firefighters team.
- Configure comprehensive alerting systems, including queue monitoring and service health checks.
- Build performance dashboards, implement load testing, and create capacity metrics for presentations.
- Implement end-to-end traceability with distributed tracing and service profiling.
- Work on pipeline improvements, including moving to trunk-based pipelines.
- Continue the migration back to Datadog and optimize our monitoring stack.
- Collaborate with cross-functional teams to deliver scalable solutions, optimize processes, and implement highly reliable systems.
- Take ownership of complex SRE tasks, including configuring monitoring systems and defining and enforcing SLIs, SLOs, and SLAs.
- Propose improvements and help establish best practices, workflows, and standards for incident response, blameless post-mortems, and continuous improvement.
What We're Looking For
- 7-10+ years of experience in Site Reliability Engineering, DevOps, or a similar role focused on large-scale distributed systems.
- Strong skills in Kubernetes for container orchestration and cluster management.
- Extensive experience with AWS as a core cloud platform for infrastructure management.
- Deep proficiency with Datadog for monitoring, logging, tracing, and alerting; this is a critical requirement for the role.
- Proven experience designing, implementing, and optimizing CI/CD pipelines, ideally with GitHub Actions.
- Strong understanding and practical application of SRE principles: SLI/SLO/SLA definition, error budget management, incident response, post-mortem analysis, and toil reduction.
- A proactive mindset with the ability to solve complex problems, drive projects independently, and continuously innovate reliability practices.
- Strong communication skills, especially in English, to collaborate effectively across technical teams and stakeholders.
Nice to Have
- Experience in FinTech, payments, startup, or scale-up environments.
- AWS certification, demonstrating commitment to learning and mastering cloud technologies.
- Experience with SOC2 compliance, CI security validations, and other infrastructure security aspects.
- Demonstrated knowledge or experience with Infrastructure as Code tools, particularly Terraform.
- Familiarity with other monitoring tools like Grafana.
- Comfort working in fast-paced, dynamic, and high-impact environments.
Technical Stack
- Kubernetes
- AWS
- Datadog
- GitHub Actions
- Terraform
- Grafana
Team & Environment
You will be part of a cross-team DevOps structure within the EPD Team.
Benefits & Compensation
- Competitive salary tailored to experience, skills, and expertise: $72,000 - $90,000 USD/year.
- Equity opportunities.
- Unlimited PTO and flexibility.
- Referral bonus.
- Yearly learning & development stipend.
Work Mode
This is a local-country position located in Brazil.
Alternative Payments is an equal opportunity employer.




