Lead the design and operation of scalable, secure cloud infrastructure on Google Cloud Platform, ensuring high availability and reliability for a growing SaaS platform.
Responsibilities
- Architect and maintain resilient, scalable, and secure cloud systems on Google Cloud Platform
- Manage containerized workloads using Docker and Kubernetes for consistent application deployment
- Automate infrastructure provisioning and configuration using Terraform and Infrastructure-as-Code practices
- Collaborate with development teams to ensure platform performance, scalability, and operational reliability
- Implement robust monitoring, alerting, and observability solutions to maintain system health
- Lead incident response and contribute to high availability and service continuity
- Develop tools and automation to improve operational efficiency and infrastructure management
- Support long-term platform readiness, including integration planning and capacity scaling
Requirements
- Proven experience supporting large-scale SaaS platforms that serve global users and operate continuously
- Background in Site Reliability Engineering, Cloud Operations, or Infrastructure teams
- Hands-on expertise with Google Cloud Platform, Kubernetes, and Terraform
Nice to Have
- Experience designing large-scale Google Cloud architectures
- Familiarity with deploying and managing Elasticsearch clusters
- Leadership experience with distributed or remote teams
Tech Stack
- Google Cloud Platform: Bigtable, Cloud SQL, Dataflow, Datastore, GKE, GCS, KMS, Pub/Sub
- Email ingestion via Microsoft Graph API automation and IMAP integrations
- Elasticsearch hosted with Kubernetes Operator
- CI/CD using GitLab Pipelines and ArgoCD
- Infrastructure-as-Code with Terraform, Terragrunt, and Atlantis
- Monitoring and Security: Cloud Armor Enterprise, Grafana, Grafana Tempo, OpenTelemetry, OpsGenie, Renovate, Sentry
- AI Tools: Augment Code, GitHub Copilot, Claude Code
Benefits
- Hybrid work model requiring at least two days per week in the London office
- Commitment to fostering a diverse and inclusive workplace
- Investment in employee growth and professional development
- Collaborative environment with a multidisciplinary team of professionals
Compensation
not specified
Work Arrangement
Hybrid: London (Southwark), with a minimum of two days per week of in-person attendance
Team
A team that may grow over time, in which the Principal Site Reliability Engineer works closely with SREs embedded in the Shipfix product and with software engineers across the organization
- Focused on empowering the global maritime industry to manage complex trade operations
- Combines AI-driven workflows, reliable data, and seamless collaboration to drive innovation
- Dedicated to client success and transforming the commercial marine sector
- Invests in employee development and creates meaningful career experiences
- Committed to high client satisfaction and operational excellence
Additional Information
- Position located in the London office in Southwark
- Hybrid work policy requires a minimum of two days per week in person
- Potential for leadership responsibilities as the team expands
- Applicants are encouraged to apply even if they do not meet all listed qualifications
