What You'll Do

Design and maintain Upbound Spaces, the foundation of our control plane management platform, ensuring it scales efficiently across thousands of instances. You'll play a central role in operating and evolving a system used in both cloud and on-prem deployments, focusing on reliability, performance, and operational clarity.

Develop new features based on customer needs and deliver enhancements that improve system behavior and user experience. Investigate and resolve intricate issues in multi-control-plane environments, including reconciliation failures, resource inconsistencies, and performance degradation.

Write production-grade Go code that interacts with the Kubernetes API, building controllers, operators, and extensions with observability and maintainability as core priorities. Contribute to the full lifecycle of service development—from design and implementation to deployment and ongoing support—while ensuring systems remain production-ready.

Use metrics, logs, and distributed tracing to monitor, debug, and optimize live services. Create internal tools that streamline incident diagnosis, assess control plane health, and automate responses to common operational issues.

Document your work thoroughly, including design proposals, post-incident analyses, runbooks, and technical content to guide users and teammates. Support the release process for self-hosted versions of Spaces, helping diagnose problems in customer-run environments.

Participate in on-call rotations to respond to platform incidents, lead resolution efforts, and implement follow-up improvements to prevent recurrence.

Requirements

Proven experience running large-scale cloud services with a focus on monitoring, alerting, incident management, and post-mortem analysis
Strong debugging skills in distributed systems, with hands-on use of distributed tracing and observability tools such as Prometheus, Grafana, and OpenTelemetry
Direct experience building and managing Kubernetes controllers and operators, including tuning reconciliation logic and handling API rate limits
Ability to collaborate with customers to understand, replicate, and fix complex technical problems in their environments
A mindset of ownership—stepping in to resolve issues even when they fall outside your immediate domain, especially during critical outages
Commitment to operational excellence, with a focus on reliability, debuggability, and long-term system health
Customer empathy, ensuring solutions are built with real-world use and supportability in mind
Clear, thoughtful communication in both technical documentation and team collaboration
Active support for a learning culture—helping teammates grow, sharing on-call knowledge, and fostering psychological safety

Technical Stack

Go, Kubernetes, Crossplane, Prometheus, Grafana, OpenTelemetry, distributed tracing, controllers, operators, add-ons, Kubernetes API

Work Mode

Remote - global

Our Culture

Rooted in operational rigor and continuous learning, we value ownership, clear communication, and teamwork. We prioritize customer needs, encourage open collaboration, and maintain a supportive environment where engineers can grow and thrive—even during high-pressure situations.

Phiture is hiring a Senior Software Engineer [REMOTE]
