The Site Reliability Engineer Intern will play a key role in enhancing system reliability and observability across distributed cloud environments. This position involves working closely with engineering teams to implement monitoring solutions, analyze system performance, and automate responses to incidents. The intern will gain exposure to cutting-edge observability platforms like Dynatrace and leverage AI-driven insights to proactively detect and resolve issues. Emphasis is placed on building scalable, resilient systems through automation, data analysis, and continuous improvement. This role is ideal for someone passionate about cloud infrastructure, operational excellence, and learning in a dynamic, collaborative environment.
Responsibilities
- Assist in deploying and configuring Dynatrace monitoring components using automated methods
- Define and implement user-focused metrics and service-level objectives in the observability platform
- Use AI-powered tools to detect anomalies, identify root causes, and reduce incident resolution time
- Develop real-time dashboards that provide actionable insights into application and cloud system health
- Write scripts in Python or Bash to automate responses and enable self-healing from monitoring alerts
Requirements
- Currently enrolled as a rising junior or senior, or in a master's program
- Demonstrates strong communication and collaboration skills in fast-paced operational settings
- Applies systems thinking to understand interactions between applications, networks, and databases
- Shows commitment to reliability, scalability, and managing error budgets
- Has experience with scripting languages such as Python, Go, or PowerShell
- Understands cloud fundamentals including containerization and microservices architecture
- Can interpret telemetry data (metrics, logs, traces) to assess and communicate system health
Tech Stack
Dynatrace, OneAgent, ActiveGates, Davis® AI, Copilot, Claude, Python, Bash, Docker, Kubernetes, microservices
Benefits
- Engage in meaningful projects with measurable impact on system reliability
- Collaborate across engineering and operations teams
- Attend learning sessions, workshops, and networking events for interns
- Participate in executive-led career development programs
Compensation
$30-34/hour based on location
Work Arrangement
Hybrid: primarily project-based, and can be remote depending on location
Team
Collaborates with Platform, Application Development, and Security teams in a cross-functional structure
- Conducts blameless post-incident reviews
- Prioritizes automation to reduce repetitive manual work
- Emphasizes comprehensive measurement and data-driven decisions
- Supports high-impact learning and professional growth
- Fosters collaboration across technical teams
Additional Information
- Full-time, 10-week temporary internship position
- Not eligible for company benefits
- Employer encourages applications from women, minorities, veterans, and individuals with disabilities
- Application process does not include relocation assistance or visa sponsorship