Mexico Employment

Photon Group is hiring a Site Reliability Engineer

About the Role

Photon Group is hiring a Site Reliability Engineer to ensure the availability, reliability, scalability, and performance of our most critical, customer-facing eCommerce microservices. You will apply Google-inspired SRE principles to balance feature velocity and system reliability using Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.

What You'll Do

  • Define, implement, and own SLIs, SLOs, and error budgets for critical microservices in collaboration with product and engineering teams.
  • Use error budgets to influence release decisions, prioritize reliability work, and manage operational risk.
  • Design and maintain observability platforms including metrics, logs, traces, and real-time telemetry.
  • Track, manage, and reduce operational toil by converting repetitive work into actionable Jira stories and epics.
  • Design, implement, and validate resiliency mechanisms such as graceful degradation, redundancy, automated failover, and disaster recovery.
  • Lead incident response, act as an escalation point for high-severity incidents, and drive blameless postmortems.
  • Capture incident action items and reliability improvements in Jira, ensuring closure, accountability, and continuous improvement.
  • Partner with scrum teams to improve reliability through release readiness reviews, production change validation, and testing strategies.
  • Perform deep root cause analysis, debugging, and performance tuning across distributed systems.
  • Promote shift-left reliability by embedding operability, monitoring, and failure testing early in the software development lifecycle.
  • Drive continuous improvement through automation, self-healing systems, chaos engineering, and capacity planning.
  • Maintain runbooks, playbooks, and knowledge repositories, linking documentation to Jira tasks to reduce Mean Time to Resolution.
  • Provide technical leadership and mentoring to junior SREs and engineers.
  • Collaborate with global, distributed teams, leveraging Jira for transparent planning, dependency tracking, and execution.

What We're Looking For

  • 4+ years of experience in SRE, software engineering, or production operations supporting large-scale eCommerce platforms.
  • Hands-on experience with Java/J2EE-based distributed systems.
  • Proven ability to design and operate systems using SLO-driven reliability models.
  • Experience defining and measuring SLIs (availability, latency, error rates, throughput, saturation).
  • Good understanding of NoSQL technologies and RDBMS, including the ability to write queries.
  • Experience deploying and operating services on cloud platforms (AWS, Azure, or Google Cloud).
  • Expertise with observability, APM, and caching tools (Dynatrace, Splunk, ELK, Akamai, QuantumMetric/Tealeaf, etc.).
  • Strong experience using Jira for backlog management, incident follow-ups, toil reduction tracking, and cross-team coordination.
  • Ability to independently own services and drive reliability initiatives end-to-end.
  • Strong communication skills and ability to influence engineering and product teams.
  • Experience participating in an on-call rotation and handling critical or high-severity incidents.

Nice to Have

  • Experience with React.
  • Experience building and operating microservices architectures using Spring Boot, Groovy, React, or similar.
  • Strong understanding of CI/CD pipelines, release automation, and progressive delivery.
  • Experience with eCommerce domains such as Catalog, Customer Data, and Order Management.
  • Familiarity with search platforms (Endeca, Solr, Lucene, Elasticsearch).
  • Proficiency in scripting and automation (Python, Bash, Ruby, Perl, PowerShell).
  • Experience with ITSM tools integrated with Jira workflows.
  • Exposure to capacity planning, load testing, and chaos engineering.

Technical Stack

  • Languages & Frameworks: Java/J2EE, React, Spring Boot, Groovy
  • Databases & Search: NoSQL, RDBMS, Endeca, Solr, Lucene, Elasticsearch
  • Cloud Platforms: AWS, Azure, Google Cloud
  • Observability & Tools: Dynatrace, Splunk, ELK, Akamai, QuantumMetric/Tealeaf, Jira
  • Scripting & Automation: Python, Bash, Ruby, Perl, PowerShell

Photon Group is an equal opportunity employer.

Required Skills

Java, J2EE, React, NoSQL, RDBMS, AWS, Azure, Google Cloud, Dynatrace, Splunk, ELK, SLO, SLI, Distributed Systems

Job Details

  • Department: Information Technology
  • Category: Infrastructure
  • Posted: 14 days ago