Responsibilities
- Monitor applications continuously to maintain high availability and peak performance.
- Take an active role in managing incidents, including leading Situation Rooms for high-severity events, conducting root cause analyses, and resolving problems efficiently.
- Ensure all production activities adhere to the ITIL framework and meet defined service level agreements.
- Carry out change implementations and software deployments following ITIL and DevOps methodologies.
- Identify technical issues proactively and apply corrective measures to support uninterrupted business functions.
- Participate in rotating on-call schedules to provide round-the-clock support for essential systems.
- Serve as the main point of contact for development teams, diagnosing issues and coordinating timely resolutions.
- Work closely with Scrum teams to support system design, deployment, and ongoing improvements.
- Apply system upgrades, security patches, and new features while minimizing disruption to end users.
- Keep technical documentation current, including operational procedures, configuration details, and troubleshooting instructions.
- Exchange knowledge and best practices with global support units to enhance overall team effectiveness.
- Deploy and fine-tune monitoring solutions such as Dynatrace within live environments.
- Partner with development groups and Centers of Expertise to establish effective observability and monitoring strategies.
- Advocate for observability practices to enable early identification and resolution of system issues.