Omaha, Nebraska, United States | On-site Employment

DMSi is hiring a Site Reliability Engineer

About the Role

DMSi is hiring a Site Reliability Engineer to review, optimize, and complete the monitoring and alerting systems for our applications. Your work will transform raw data into actionable intelligence, improve system observability, and enhance the overall user experience.

What You'll Do

  • Evaluate existing monitoring systems and implement improvements to ensure comprehensive observability across all systems and environments.
  • Develop and maintain dashboards and reports that provide real-time visibility into system health, capacity/utilization trends, and performance.
  • Ensure the overall system environment operates nominally by monitoring critical performance indicators to maintain a smooth user experience.
  • Review and refine alerting mechanisms to minimize false positives and ensure timely and accurate notifications for critical issues.
  • Develop escalation processes and response playbooks to streamline incident management.
  • Analyze monitoring data to identify trends, anomalies, and potential areas of improvement.
  • Provide actionable insights to relevant teams and drive data-driven decision-making, using machine learning to distinguish normal from abnormal system behavior.
  • Work closely with software engineers, DevOps teams, and other stakeholders to ensure monitoring and alerting systems are aligned with business goals.
  • Develop and maintain automation scripts and tools to streamline monitoring and alerting processes.
  • Document monitoring and alerting systems, processes, and best practices. Provide training and guidance to teams.
  • Continuously assess and improve monitoring and alerting strategies to adapt to changing technologies and business needs.

What We're Looking For

  • Bachelor's degree in Computer Science, Engineering, a related field, or equivalent experience.
  • Minimum of 3 years of experience in a Site Reliability Engineering or similar role, with a focus on monitoring and alerting in a SaaS environment.
  • Strong experience with monitoring and observability tools (e.g., Nagios, Prometheus, Grafana, ELK Stack, Datadog, New Relic).
  • Proficiency in scripting languages (e.g., Python, Bash, PowerShell) for automation.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and hybrid cloud environments.
  • Understanding of infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Knowledge of CI/CD pipelines and version control systems (e.g., Git, Jenkins).
  • Basic understanding of networking, security, and system administration.

Technical Stack

  • Monitoring/Observability: Nagios, Prometheus, Grafana, ELK Stack, Datadog, New Relic
  • Scripting/Automation: Python, Bash, PowerShell
  • Cloud Platforms: AWS, Azure, GCP
  • Infrastructure-as-Code: Terraform, Ansible
  • CI/CD & Version Control: Git, Jenkins

Team & Environment

You will work closely with development, operations, and product teams.

Work Mode

This position is on-site.

Required Skills
Nagios, Prometheus, Grafana, ELK Stack, Datadog, New Relic, Python, Bash, PowerShell, AWS, Azure, GCP, SaaS, Monitoring, Automation
Job Details
Department: Engineering
Category: Infrastructure
Posted: 14 days ago