Responsibilities
- Serve as the primary contact responsible for the overall application health, performance, and capacity.
- Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
- Partner with the development and product teams of a new application to establish the right monitoring and alerting strategy and create the framework to achieve zero downtime during deployment.
- Serve as the primary contact responsible for ensuring application scalability, performance, and resilience.
- Practice sustainable incident response and blameless post-mortems while taking a holistic approach to problem solving and optimizing time to recover.
- Automate data-driven alerts to proactively escalate issues. Work with development teams to establish SLOs and improve reliability.
- Tackle complex development, automation, and business process problems. Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
- Increase automation and tooling to reduce toil and manual intervention.
- Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
Work Arrangement
Remote (Worldwide)