Responsibilities
- Contribute to building and maintaining a medallion/curated data warehouse stack (bronze/silver/gold) for product, usage, billing, and operational data.
- Build and maintain Airflow-orchestrated pipelines and dbt transformation projects (modular, tested, documented).
- Help design analytics-ready models: slowly changing dimensions (SCD Type 2), star schemas, and appropriate normalization for upstream canonical layers.
- Learn and apply Master Data Management (MDM) patterns (golden records, reference data, deduplication, identity resolution).
- Implement data quality checks (freshness, nulls, referential integrity, distribution drift, anomaly detection).
- Contribute to data governance habits: data stewardship, ownership, SLAs, and clear definitions for “source of truth.”
- Help build and maintain a business semantic layer (consistent metric definitions, dimensions, and reusable logic) used by notebooks/BI.
- Partner with stakeholders (Product, Engineering, Finance, GTM, Ops) to translate questions into durable datasets and metrics.
- Use SQL and Python, plus Spark where scale demands it; optimize for correctness, performance, and cost.
Requirements
- 0–4 years of professional experience (or strong internships/projects) working with data warehouses, pipelines, or analytics engineering.
- Solid SQL fundamentals — you’re comfortable writing queries and have some exposure to window functions or dimensional modeling concepts.
- Good communication skills: you can ask clarifying questions, explain your reasoning, and work with stakeholders to understand their needs.
- High standards for data quality, reliability, and maintainability — you care about getting things right.
Nice to Have
- Some hands-on experience with dbt or Airflow, or strong eagerness to learn — coursework and personal projects count.
- Basic Python for scripting and data tooling; any exposure to Spark (PySpark or Spark SQL) is a plus.
- Familiarity with data modeling concepts such as SCD Type 2 or star schemas, even if only from coursework.