Responsibilities
- Design and deploy systems that ensure high availability, fault tolerance, and resilience across infrastructure and applications.
- Lead responses during outages, conduct root cause analysis, and establish safeguards to prevent recurrence.
- Analyze system performance, detect bottlenecks, and implement optimizations to improve efficiency.
- Advance automation by building and maintaining tools, scripts, and frameworks for deployment, monitoring, and diagnostics.
- Produce recurring reports on system uptime, reliability, and performance to inform stakeholders.
- Work with cross-functional teams to define KPIs and build reporting systems that track operational health.
- Create detailed summaries of incidents, resolutions, and improvement suggestions for executive review.
- Communicate trends, insights, and recommendations to leadership to support strategic decisions.
Work Arrangement
Remote (anywhere in Brazil)
Other
- Advanced English proficiency
- Flexible working hours
- Home office setup supported
- Company operates with a remote-first mindset
- Eligible to work from any city in Brazil
- Apple equipment provided (MacBook Pro, iPhone), with an option to purchase it under internal guidelines
- Employee referral program offering rewards