As a Staff Data Engineer, you will shape the foundation of a scalable, reliable data platform that powers critical healthcare services. Your primary responsibility is to ensure data systems are efficient, resilient, and future-proof—supporting both transactional and analytical demands at scale.
Key Responsibilities
- Lead the optimization of relational databases, including deep tuning of AlloyDB and PostgreSQL for high-concurrency OLTP workloads.
- Enhance NoSQL infrastructure to meet growing scalability and performance requirements.
- Design and structure BigQuery environments to accelerate analytical queries and reporting.
- Identify and resolve performance bottlenecks across storage and processing layers.
- Develop strategies for data sharding, partitioning, and lifecycle management to maintain lean operational systems.
- Build high-throughput, backpressure-aware pipelines for ingesting HL7 and DICOM metadata.
- Ensure system stability during traffic spikes while maintaining data freshness and user experience.
- Define data flow patterns from operational databases to the data warehouse, balancing real-time and batch approaches.
- Advocate for sound data practices on the Architecture Review Board, shaping how services interact with data.
- Enforce data contracts that promote loose coupling and long-term maintainability.
- Mentor engineers in query optimization, data modeling techniques, and distributed consistency patterns.
Qualifications
- Minimum of 8 years designing and managing data-heavy systems.
- Proven experience solving scalability challenges across the full data lifecycle.
- Deep technical knowledge of relational databases, including MVCC, locking, and query execution internals.
- Strong experience with NoSQL solutions, particularly document and key-value stores.
- Expertise in data warehousing platforms such as BigQuery or Snowflake, with an understanding of columnar storage.
- Proficiency with data pipeline tools such as Kafka, Pub/Sub, Beam, Dataflow, or dbt.
- Experience handling data integrity concerns like duplicates, late arrivals, and processing guarantees.
- Fluency in Python, Java, and SQL with a focus on clean, maintainable code.
- Ability to diagnose and resolve complex performance issues in distributed environments.
- Experience optimizing ingestion pipelines to minimize latency under heavy load.
Preferred Experience
- Familiarity with healthcare data standards such as HL7 and DICOM, and compliance frameworks like HIPAA.
- Background in building real-time analytics or hybrid transactional/analytical systems.
- Experience leading zero-downtime migrations or large-scale schema changes in always-on environments.
Work Environment
This is a fully remote role with a globally distributed team. We operate on a flexible schedule that supports asynchronous collaboration across time zones.
Benefits
- Comprehensive medical, dental, and vision insurance
- Flexible, use-as-needed vacation policy
- Eligibility to participate in the employee stock option program