Responsibilities
- Own the end-to-end problem management lifecycle, from detection to final resolution.
- Coordinate efforts between application, infrastructure, and external vendor teams as needed.
- Follow established ITIL Problem Management practices and procedures.
- Conduct in-depth investigations into repeated or high-severity incidents, including software bugs, system outages, and interface failures.
- Apply systematic techniques such as 5 Whys, Fishbone diagrams, or Fault Tree Analysis to identify root causes.
- Develop detailed root cause analysis reports with clear corrective action plans.
- Detect underlying system flaws and promote long-term solutions over temporary fixes.
- Analyze incident data to uncover trends and anticipate potential systemic risks.
- Initiate problem tickets proactively based on analytical findings.
- Deliver precise and timely updates to IT executives, business units, and operations staff.
- Lead post-incident review meetings and verify implementation of corrective measures.
- Translate complex technical outcomes into clear, non-technical terms for business audiences.
- Ensure all activities align with ITIL guidelines.
- Comply with internal service-level agreements (SLAs) and operational-level agreements (OLAs).
- Maintain knowledge base entries for future reference and team use.
- Assist in developing standardized procedures for incident response and resolution.
- Encourage ownership and high standards across teams through constructive feedback.
- Guide less experienced analysts and contribute to overall team capability development.