Judi Health is seeking a Senior Scalability Engineer - Observability to define and own our organization-wide observability strategy and to build the tooling and platform products that support it. You will architect and develop a custom observability platform that provides visibility into every layer of our infrastructure. Together with our clients, we’re rebuilding trust in healthcare in the U.S. and deploying the infrastructure we need for the care we deserve.
What You'll Do
- Design, implement, and maintain the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) as the primary observability platform across all engineering teams.
- Design and develop production-grade internal platform products with React/TypeScript frontends and Python/Rust backends.
- Architect and build high-performance log indexing solutions in Rust that ingest logs and deliver sub-second search across billions of log lines.
- Design and implement solutions leveraging AWS Athena or similar SQL query engines (DuckDB, ClickHouse) for ad-hoc log analysis and historical queries.
- Build sophisticated web interfaces that allow engineers to query logs, metrics, and traces with features like saved queries, query templates, and correlation analysis.
- Architect solutions that thoughtfully leverage both AWS-managed services (CloudWatch, Athena, Kinesis) and open-source tooling (LGTM stack, Quickwit).
- Design seamless integration between AWS CloudWatch Logs/Metrics and our custom observability platform.
- Develop dashboards, monitors, and alerting systems that reduce alert noise, detect anomalies, and help teams respond to incidents quickly.
- Work directly with product teams to integrate observability into their services, establish logging and metrics standards, and instrument code effectively.
- Provide the observability foundation that allows the Scalability team to identify performance bottlenecks, track optimization impact, and measure platform stability.
- Define and document comprehensive observability standards including structured logging patterns, metric naming conventions, and trace instrumentation.
- Lead workshops, create documentation, and build self-service tooling that makes observability accessible to every engineering team.
- Mentor engineers on observability practices, lead architecture reviews for instrumentation approaches, and represent the Scalability team in cross-functional planning.
- Work in an Agile/Scrum environment to continually deliver value to stakeholders and clients.
- Adhere to the Capital Rx Code of Conduct including reporting of noncompliance.
What We're Looking For
- 10+ years of software engineering or infrastructure engineering experience with demonstrated progression into technical leadership roles.
- Several years of experience leading technical initiatives, building platform products, or serving as a subject matter expert on observability infrastructure.
- Strong experience with React/TypeScript for frontend development and Python (Flask/SQLAlchemy) for backend services.
- Deep production experience with the LGTM stack (Loki, Grafana, Tempo, and Mimir/Prometheus) for logs, metrics, and distributed tracing at scale.
- Extensive experience with AWS CloudWatch Logs and Metrics, including custom metrics, CloudWatch Logs Insights, dashboard creation, and integration patterns.
Technical Stack
- Frontend: React, TypeScript
- Backend: Python, Flask, SQLAlchemy, Rust
- Observability: Loki, Grafana, Tempo, Prometheus, Mimir
- AWS: CloudWatch, Athena, Kinesis
- SQL Analytics: DuckDB, ClickHouse
- Open Source: Quickwit
Team & Environment
You will be a key member of the Scalability team.
Work Mode
This is a fully remote position with a global hiring scope.
Judi Health is an equal opportunity employer.