United States | Remote (Global) | USD 155,070 – 172,300 / year

Sephora is hiring a Senior Engineer, SRE

Responsibilities

Ensure Platform Stability. Operate and support the Dotcom and OMNI platform, including BOPIS (buy online, pick up in store) and Same-Day Delivery, ensuring high availability, resilience, and hyper-stable customer experiences during normal operations and peak traffic events.
Lead Incident Response. Triage, diagnose, and resolve L2/L3 production incidents; lead post-incident reviews and partner with engineering teams on permanent corrective actions to eliminate root causes.
Drive Intelligent Automation. Build automation solutions, reduce operational toil, and create AI-driven reliability tools and agentic workflows to improve mean time to resolution, productivity, and overall stability.
Enhance Observability. Develop and optimize observability through logs, metrics, traces, dashboards, and anomaly detection; refine alerting and telemetry pipelines to proactively identify and resolve issues.
Validate Release Readiness. Ensure world-class readiness for releases, seasonal events, feature launches, and traffic spikes through resiliency checks, performance validation, and comprehensive change reviews.
Maintain Reliability Standards. Maintain and optimize SLO/SLI frameworks; monitor error budgets and partner with application teams on continuous reliability improvements.

Requirements

Deep SRE Expertise. 6+ years of hands-on SRE, DevOps, or Production Engineering experience in high-scale digital applications, with a strong understanding of reliability principles and operational excellence.
Cloud-Native Technical Skills. Strong exposure to Azure AKS, Kubernetes, Docker, Service Mesh, and API-driven architectures, with operational support experience for React front-end and Spring Boot microservices in production environments.
Observability and Automation Mastery. Hands-on experience with observability tools (Dynatrace, Splunk, Grafana, Prometheus) and strong scripting abilities (Python, Bash, PowerShell, YAML) to build automation that reduces toil and improves incident response.
Incident Management Excellence. Proven experience in incident management, root cause analysis, and implementing permanent corrective actions that drive long-term reliability improvements.
CI/CD and Platform Knowledge. Experience with SRE principles, CI/CD pipelines (Jenkins, GitHub Actions), and cloud platforms (Azure required; AWS/GCP/OCI a plus).
Analytical Problem-Solver. Strong analytical and problem-solving abilities with clear communication skills under pressure, a collaborative mindset, and passion for reducing toil while improving developer and operator experiences.

Benefits

Caring Community. Thrive in a supportive, mentorship-driven environment fostered by your leaders, and create that same environment for your teams.
Fulfilling Path. We invest in you, not just your role, with opportunities to learn, innovate and lead.
Meaningful Work. Your work creates real impact. With every decision leading to thousands of shipments, you bring beauty to life for our clients.

Required Skills

Incident Management
Root Cause Analysis

About company

Sephora is a beauty retailer connecting deeply with others, celebrating diversity and inclusivity, and making a difference every day.

Job Details

Department: Information Technology

Category: Infrastructure

Posted 4 months ago
