The Site Reliability Engineer Intern will play a key role in enhancing system reliability and observability across distributed cloud environments. This position involves working closely with engineering teams to implement monitoring solutions, analyze system performance, and automate responses to incidents. The intern will gain exposure to cutting-edge observability platforms like Dynatrace and leverage AI-driven insights to proactively detect and resolve issues. Emphasis is placed on building scalable, resilient systems through automation, data analysis, and continuous improvement. This role is ideal for someone passionate about cloud infrastructure, operational excellence, and learning in a dynamic, collaborative environment.

Responsibilities

Assist in deploying and configuring Dynatrace monitoring components using automated methods
Define and implement user-focused metrics and service-level objectives in the observability platform
Use AI-powered tools to detect anomalies, identify root causes, and reduce incident resolution time
Develop real-time dashboards that provide actionable insights into application and cloud system health
Write scripts in Python or Bash to automate responses and enable self-healing from monitoring alerts

Requirements

Currently enrolled as a rising junior or senior, or in a master's program
Demonstrates strong communication and collaboration skills in fast-paced operational settings
Applies systems thinking to understand interactions between applications, networks, and databases
Shows commitment to reliability, scalability, and managing error budgets
Has experience with scripting languages such as Python, Go, or PowerShell
Understands cloud fundamentals including containerization and microservices architecture
Can interpret telemetry data (metrics, logs, traces) to assess and communicate system health

Tech Stack

Dynatrace, OneAgent, ActiveGates, Davis® AI, Copilot, Claude, Python, Bash, Docker, Kubernetes, microservices

Benefits

Engage in meaningful projects with measurable impact on system reliability
Collaborate across engineering and operations teams
Attend learning sessions, workshops, and networking events for interns
Participate in executive-led career development programs

Compensation

$30-34/hour based on location

Work Arrangement

Hybrid. Primarily project-based; can be remote depending on location

Team

Collaborates with Platform, Application Development, and Security teams in a cross-functional structure

Conducts blameless post-incident reviews
Prioritizes automation to reduce repetitive manual work
Emphasizes comprehensive measurement and data-driven decisions
Supports high-impact learning and professional growth
Fosters collaboration across technical teams

Additional Information

Full-time, 10-week temporary internship position
Not eligible for company benefits
Employer encourages applications from women, minorities, veterans, and individuals with disabilities
Application process does not include relocation assistance or visa sponsorship

AWP Safety is hiring a Site Reliability Engineer Intern
