About the Role

The role involves ensuring system stability, automating operations, and enhancing observability across distributed services that process large-scale data.

Responsibilities

Design and maintain scalable infrastructure for data processing systems
Implement automated solutions to reduce manual operational tasks
Monitor system performance and respond to incidents efficiently
Collaborate with engineering teams to improve service resilience
Develop and enforce best practices for deployment and configuration management
Troubleshoot complex production issues across distributed environments
Optimize system reliability and reduce error rates through proactive measures
Support incident response and lead post-mortem analyses
Maintain comprehensive documentation for systems and procedures
Evaluate and integrate new tools for monitoring and observability
Ensure infrastructure meets security and compliance standards
Participate in on-call rotations to support rapid incident resolution
Improve system uptime and reduce mean time to recovery (MTTR)
Work closely with developers to refine service-level objectives
Contribute to capacity planning and performance testing
Implement infrastructure as code using modern tooling
Enhance alerting systems to reduce noise and speed up incident response
Manage and scale containerized workloads and orchestration platforms
Support continuous integration and delivery pipelines
Promote a culture of reliability across engineering teams

Nice to Have

Master’s degree in a technical field
Experience with high-throughput data pipelines
Contributions to open-source infrastructure projects
Certifications in cloud or systems engineering
Exposure to observability platforms like Datadog or New Relic
Background in database administration or tuning
Familiarity with service mesh technologies
Experience in fast-growing startup environments

Compensation

Competitive salary and equity package

Work Arrangement

Remote-friendly with flexible hours

Team

Collaborative engineering team focused on data infrastructure and reliability

Why Join Us

Opportunity to shape the reliability culture of a growing data platform
Work with cutting-edge technologies at scale
Impactful role with direct influence on product stability and performance

Benefits

Health, dental, and vision insurance
401(k) plan with company match
Generous paid time off and parental leave
Professional development stipend
Home office setup allowance

Available for qualified candidates

People Data Labs is hiring a Senior Site Reliability Engineer (SRE)