Responsibilities
- Design, implement, and maintain the observability stack (Loki, Grafana, Tempo, Mimir/Prometheus) as the primary observability platform, balancing cost, performance, and developer experience.
- Design and develop internal platform products with React/TypeScript frontends and Python/Rust backends for log search, metrics visualization, and trace analysis.
- Architect and build high-performance log indexing solutions using Rust for efficient log processing and search.
- Design and implement SQL analytics over logs using AWS Athena or a similar engine, supporting ad-hoc analysis and historical queries.
- Build web interfaces for querying logs, metrics, and traces with features like saved queries, query templates, and pattern detection.
- Architect solutions leveraging both AWS-managed services and open-source tooling to optimize for cost, performance, and operational flexibility.
- Design seamless integration between AWS CloudWatch and the custom observability platform for unified visibility.
- Develop dashboards, monitors, and alerting systems that reduce alert noise and detect anomalies.
- Work with product teams to integrate observability into their services and establish logging and metrics standards.
- Provide the observability foundation for identifying performance bottlenecks and measuring platform stability.
- Define and document observability standards including logging patterns, metric naming conventions, and dashboard design principles.
- Lead workshops, create documentation, and build self-service tooling to promote observability best practices.
- Mentor engineers on observability practices and lead architecture reviews for instrumentation approaches.
- Work in an Agile/Scrum environment to deliver value to stakeholders and clients.
- Adhere to the company's Code of Conduct, including reporting noncompliance.
Compensation
Competitive
Work Arrangement
On-site
Team
Collaborative and innovative engineering teams
Qualifications
- Proven experience in designing and implementing observability platforms.
- Proficiency in Rust, Python, and TypeScript.
- Experience with AWS services and open-source tooling.
- Strong architectural and problem-solving skills.
- Ability to work in an Agile/Scrum environment.
- Excellent communication and mentoring skills.
Preferred Qualifications
- Experience with log indexing systems and SQL analytics.
- Familiarity with React and Grafana.
- Knowledge of performance optimization techniques.
- Experience with cloud-native and open-source technologies.
- Ability to lead workshops and create documentation.
