Germany | Remote (Global) | Full-time

Albatross is hiring a Site Reliability Engineer

About the Role

Join Albatross as a Site Reliability Engineer to take ownership of the reliability and observability of our platform. This is a hands-on leadership role where you will design, build, and maintain our observability stack, lead incident response, oversee releases, and establish processes and standards.

What You'll Do

  • Own and evolve our observability stack, including Prometheus, Grafana, Loki, and Jaeger, along with dashboards, alerts, and SLOs.
  • Instrument services for meaningful metrics and tracing, reducing noise and improving signal.
  • Lead incident response and establish blameless postmortems, runbooks, and automated remediation.
  • Define, track, and improve SLIs and SLOs to proactively reduce reliability risk.
  • Own the release process end-to-end, improving deployment speed, safety, and recovery.
  • Implement progressive rollouts, feature flags, and rollback strategies.
  • Embed observability into the development lifecycle in close collaboration with engineering teams.
  • Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value.

What We're Looking For

  • 5–7+ years in SRE, platform engineering, DevOps, or a similar hands-on role.
  • Strong production experience with Kubernetes and modern observability stacks like Prometheus, Grafana, Loki, and Jaeger/OpenTelemetry.
  • Proven track record of leading incident response and building monitoring systems that teams actually use.
  • Deep distributed systems knowledge and production debugging experience.
  • A pragmatic approach to tooling and alerting that teams trust.
  • Clear communicator across engineering, product, and leadership.
  • A STEM degree (Computer Science, Engineering, Mathematics, or similar).

Nice to Have

  • Contributions to open-source observability projects.
  • A background in high-scale or high-availability environments.

Technical Stack

  • Prometheus
  • Grafana
  • Loki
  • Jaeger
  • OpenTelemetry
  • Kubernetes

Benefits & Compensation

  • Remote-first, async-friendly culture.
  • Ownership and autonomy; you'll shape how we do reliability.
  • A team that cares about building things right.

Work Mode

This is a global, remote position open to candidates based in Europe.

Required Skills

  • Prometheus
  • Grafana
  • Loki
  • Jaeger
  • OpenTelemetry
  • Kubernetes
  • Site Reliability Engineering
  • Monitoring
  • Observability
  • Distributed Tracing
  • Infrastructure as Code
  • Cloud Platforms
  • Incident Management
  • Automation
About company
Albatross

We’re building the second pillar of AI: a perception layer that understands how users actually experience content in real time. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers real-time, in-session discovery by adapting to evolving user interests.

Job Details
Category infrastructure
Posted 2 months ago