Responsibilities

Assess current monitoring setups and introduce enhancements for full system observability across all platforms and environments.
Build and manage dashboards and reporting tools that deliver real-time insights into system performance, capacity, and resource usage.
Maintain stable system operations by tracking key performance metrics and ensuring optimal functionality.
Deliver clear visibility into system conditions to support consistent and high-quality user experiences.
Optimize alerting configurations to reduce false alarms and ensure precise, timely notifications for urgent issues.
Create structured escalation paths and incident response procedures to improve resolution efficiency.
Examine monitoring outputs to detect patterns, irregularities, and opportunities for system improvements.
Use data analysis, including machine learning techniques that distinguish normal from abnormal behavior, to generate practical recommendations for teams.
Collaborate with software developers, DevOps personnel, and other stakeholders to align monitoring practices with technical and business objectives.
Design and support automation scripts and utilities that simplify monitoring workflows and reduce manual intervention.
Maintain thorough documentation of monitoring frameworks, alerting protocols, and recommended practices.
Offer training and support to help teams effectively use monitoring tools and understand performance data.
Regularly evaluate and update monitoring strategies to keep pace with evolving technologies and organizational needs.
Keep current with advancements and emerging solutions in the field of system observability.

Work Arrangement

On-site

Other

Work takes place in a standard office setting with regular use of computers and phones; no significant physical requirements are involved.
Occasional travel is required, which may include commercial flights and rental vehicles for business purposes.

DMSi is hiring a Site Reliability Engineer
