About the Role
The consultant will work with engineering teams to build robust observability frameworks using modern tooling and practices. Responsibilities include diagnosing system behavior, improving monitoring coverage, and guiding teams on best practices for metrics, logs, and traces.
Responsibilities
- Design and deploy observability pipelines for cloud-native applications
- Collaborate with developers to integrate logging, metrics, and tracing
- Analyze system performance data to identify bottlenecks and failure points
- Develop dashboards and alerting rules for proactive incident detection
- Advise on instrumentation strategies for microservices and serverless functions
- Troubleshoot data quality issues in telemetry collection
- Optimize storage and retention policies for observability data
- Support incident response with real-time data analysis
- Evaluate and recommend observability tools and platforms
- Document architecture patterns and implementation guidelines
- Lead knowledge-sharing sessions on observability best practices
- Ensure compliance with security and data governance standards
- Integrate observability into CI/CD pipelines
- Monitor service-level objectives and error budgets
- Assist in root cause analysis for production outages
- Work with SRE and platform teams to improve system resilience
- Implement distributed tracing across complex service topologies
- Standardize metric naming and tagging conventions
- Assess observability maturity across projects
- Provide feedback to tool vendors and open-source communities
- Maintain up-to-date knowledge of emerging observability trends
- Support on-call teams with diagnostic tooling
- Configure synthetic monitoring for critical user journeys
- Audit existing monitoring coverage for gaps
- Promote a culture of system ownership and visibility
Compensation
Competitive market rate based on experience and location
Work Arrangement
Fully remote with flexible scheduling across time zones
Team
Collaborative engineering team focused on scalable monitoring and system visibility
Why This Role Matters
Systems are growing in complexity, and traditional monitoring is no longer sufficient. This role ensures engineering teams can understand behavior, detect issues early, and respond effectively. The consultant directly influences reliability, developer productivity, and customer experience by making systems more transparent and predictable.
Tech Stack
We use a combination of open-source and managed tools, including Prometheus for metrics, Loki for logs, Tempo for tracing, and Grafana for visualization. Instrumentation follows OpenTelemetry standards. Infrastructure is cloud-hosted and orchestrated with Kubernetes, and CI/CD pipelines include observability checks.
Location
Not applicable; the role is fully remote and open to candidates globally