Czechia (Hybrid)

Bloomreach is hiring a Senior Site Reliability Engineer for the Datacraft team

About the Role

In this role, you will collaborate with engineering teams to improve platform stability, automate operational tasks, and drive best practices in monitoring, incident response, and system design.

Responsibilities

  • Collaborate with development teams to ensure service reliability and scalability
  • Design and implement automated solutions for operational workflows
  • Monitor system performance and respond to incidents in production environments
  • Develop and maintain tools for deployment, monitoring, and diagnostics
  • Contribute to capacity planning and system architecture improvements
  • Troubleshoot complex technical issues across distributed systems
  • Enforce observability standards through logging, metrics, and tracing
  • Participate in on-call rotations for critical system support
  • Optimize system uptime and reduce mean time to recovery
  • Drive post-incident reviews and implement corrective actions
  • Support continuous integration and delivery pipelines
  • Ensure configurations adhere to security and compliance standards
  • Improve system resilience through proactive failure testing
  • Collaborate on disaster recovery planning and execution
  • Document system architecture and operational procedures
  • Mentor junior engineers in reliability best practices
  • Evaluate new technologies for operational efficiency
  • Work closely with product teams to align reliability goals
  • Maintain infrastructure as code for consistency and repeatability
  • Analyze system dependencies to reduce single points of failure
  • Implement scalable solutions for data processing systems
  • Promote a culture of shared ownership for system health
  • Contribute to service level objective definitions and tracking
  • Support cloud infrastructure management and optimization
  • Ensure efficient resource utilization across environments

Nice to Have

  • Master’s degree in computer science or related field
  • Experience with big data platforms such as Apache Spark or Flink
  • Contributions to open-source infrastructure projects
  • Certifications in cloud or DevOps technologies
  • Prior work in high-throughput data processing environments
  • Exposure to service mesh technologies like Istio
  • Knowledge of gRPC and protocol-level observability
  • Experience with large-scale event streaming systems
  • Background in machine learning infrastructure operations
  • Leadership in reliability initiatives across engineering teams

Compensation

Competitive salary with performance-based incentives

Work Arrangement

Hybrid work model with flexibility for remote or office-based work

Team

Part of a distributed engineering team focused on data infrastructure and reliability

About the Datacraft team

This team is responsible for building and maintaining scalable data infrastructure that powers core product functionality. The focus is on reliability, automation, and performance at scale.

What We Value

Collaboration, transparency, technical excellence, and a proactive approach to system health and incident prevention.

Sponsorship is available for qualified candidates who require it.

About the company
Bloomreach

Loomi AI, Bloomreach's agentic platform, understands each customer to personalize their experience in real time — across email, web, mobile, and search. The platform connects first-party customer and product data with business metrics to deliver intelligent personalization at scale.

Bloomreach powers AI-driven marketing automation, ecommerce search, and conversational shopping experiences, helping brands increase revenue, loyalty, and conversion rates across 13+ channels.

Job Details
Department: Datacraft team
Category: Infrastructure
Posted: 16 hours ago