Responsibilities

Ensure consistent performance and uptime of production systems running on Google Cloud Platform and Kubernetes, with Node.js services and Postgres databases
Take primary responsibility during critical system outages, lead incident resolution, and conduct follow-up analyses to prevent recurrence
Enhance monitoring capabilities, refine alerting systems, and optimize on-call procedures to proactively detect and resolve issues
Establish service level objectives and agreements for key services, and promote their consistent use across engineering teams
Develop internal tools and automated frameworks that enable safer code deployments and streamline infrastructure management for development teams
Work closely with Product, Engineering, and Machine Learning teams to integrate reliability practices into the development lifecycle
Create and maintain technical roadmaps that balance immediate stability needs with long-term scalability for an expanding user base
Promote best practices in platform engineering, including blameless postmortems, operational discipline, and a culture of ongoing learning

Work Arrangement

Remote (Worldwide) — San Francisco, Seoul, Tokyo, Taipei, Ljubljana

Other

Mastering a new language is among the most transformative abilities a person can develop, yet nearly 99% of learners fail to reach proficiency because of ineffective learning methods. The mission is to empower millions to succeed in language acquisition and positively transform their lives.

Speak is hiring a Platform Engineer
