What You'll Do
- Design and manage robust data pipelines in cloud data warehouses, aligning with architectural guidelines and maintaining operational reliability.
- Develop and maintain dbt models across multiple data layers, emphasizing data quality, automated testing, and incremental processing.
- Orchestrate workflows using Airflow, with attention to scheduling, error handling, and alerting mechanisms.
- Build and optimize dimensional models and data marts that support both clinical analytics and machine learning applications.
- Create intuitive dashboards in BI tools through close collaboration with stakeholders across finance, operations, product, and clinical teams.
- Integrate and normalize data from diverse sources including EHRs, payment systems, APIs, and internal databases.
- Enforce strict data protection standards by applying PHI/PII masking, tokenization, audit logging, and access controls.
- Develop and refine retrieval-augmented generation (RAG) pipelines, covering document ingestion, chunking, embeddings, and retrieval using frameworks like LangChain or LangGraph.
- Support end-to-end MLOps processes, including training pipeline maintenance, deployment coordination, and performance monitoring.
- Review code contributions, provide technical feedback, and uphold engineering best practices across the team.
- Partner with product managers to translate business needs into reliable data and AI deliverables.
- Monitor system health, troubleshoot pipeline failures, and escalate architectural concerns as needed.
- Document data models, pipeline designs, and key decisions to ensure traceability and compliance.
- Evaluate emerging technologies through prototyping and technical assessments to guide tooling decisions.
Requirements
- Minimum of 5 years in data or analytics engineering roles with hands-on technical delivery.
- At least 2 years working in healthcare, including familiarity with clinical data standards, workflows, and regulatory environments.
- Solid understanding of HIPAA requirements, including data classification, access controls, and audit logging.
- Proven experience with cloud data warehouses such as BigQuery, Snowflake, or Redshift, including advanced SQL and performance tuning.
- Extensive experience with dbt, including model development, testing, documentation, and multi-environment deployment.
- Strong background in Apache Airflow for orchestrating complex data workflows.
- Expertise in dimensional modeling techniques, including star schemas and slowly changing dimensions.
- Experience building reports and dashboards in enterprise BI platforms such as Looker, Power BI, or Tableau.
- Proficiency in Python for pipeline scripting, API integrations, and automation tasks.
- Direct experience developing RAG pipelines and integrating large language models using LangChain, LangGraph, or similar tools.
- Familiarity with MLOps practices including model deployment, monitoring, and retraining triggers.
- Working knowledge of CI/CD systems for data and AI workloads, such as GitHub Actions or dbt Cloud CI.
- Strong grasp of data governance principles, including lineage tracking, data contracts, and automated quality checks, with experience in tools like OpenMetadata.
- Excellent communication skills and the ability to collaborate effectively across technical and non-technical teams.
- Ability to work independently while keeping leadership informed of progress and risks.
Benefits
- Work in a trans-centered, culturally inclusive, and supportive environment.
- Engage in purpose-driven work focused on improving healthcare access for transgender individuals.
- Align professional growth with social impact by contributing to meaningful change in healthcare delivery.
- Grow your expertise in data engineering and applied AI with guidance from seasoned technical leaders.
