Platform Engineer - Product Reliability (Mid Level) at Kraken (Expired)

Role Overview

We’re seeking a Platform Engineer to join our Product Reliability team, focused on building robust, scalable systems within our energy management platform. In this role, you’ll partner with product teams to enhance system availability, performance, and fault tolerance, ensuring services remain resilient under real-world demands.

Key Responsibilities

Advise engineering teams on reliability best practices, including infrastructure design and failure mitigation strategies
Collaborate directly on code and configuration to strengthen system resilience and operational performance
Identify opportunities for improvement in core platform infrastructure based on hands-on experience and incident analysis
Support the development of proof-of-concept solutions to evolve deployment architecture in line with scaling needs
Guide teams in implementing observability frameworks using tools like Datadog, Prometheus, and Grafana
Contribute to post-incident reviews, helping teams implement corrective actions and prevent recurrence
Use metrics and monitoring data to detect patterns, recommend changes, and improve service reliability
Work across distributed systems to solve complex technical challenges in high-availability environments

Required Qualifications

Proven experience with AWS, Terraform, and Kubernetes in production environments
Familiarity with observability platforms such as Datadog, Prometheus, or similar tooling
Programming experience in Python or related languages to analyze application behavior in production
Strong written communication skills, particularly in asynchronous formats like Slack, Notion, or technical documentation
Ability to thrive in autonomous settings, define structure in ambiguous situations, and drive initiatives independently
Experience collaborating with developers and product stakeholders to deliver measurable improvements
Demonstrated commitment to continuous learning and iterative problem solving

Preferred Background

Prior work as a Site Reliability Engineer or similar role
Experience supporting SaaS platforms at scale, including knowledge transfer across teams
Background in incident response, outage management, and technical post-mortem facilitation
Exposure to large relational databases and performance tuning
Experience defining and tracking service level objectives to guide reliability improvements

Technology Environment

Our platform runs on AWS with infrastructure managed through Terraform, orchestrated via Kubernetes, and monitored using Datadog, Grafana, Prometheus, and Rootly. Development and operations workflows are supported in Python, TypeScript, Go, and C#.

Work Environment

This role is open to candidates based in Australia, with full remote flexibility within the country. We value autonomy, clear documentation, and inclusive collaboration across distributed teams.

Culture and Values

We foster a culture rooted in empathy, sustainability, and technical excellence. Our teams operate with independence while maintaining strong accountability. We prioritize diversity, proactive learning, and transparent communication, especially in written form, to support long-term growth and innovation.
