Responsibilities
- Design and implement a structured Unity Catalog environment with Catalogs, Schemas, and Volumes to ensure data governance, security, and clear documentation across the organization (see the Unity Catalog sketch after this list).
- Lead the migration of complex business logic from legacy systems into a consolidated Databricks Lakehouse, refactoring legacy SQL into modular, efficient, and maintainable code (see the refactoring sketch after this list).
- Develop an internal data transformation framework on native tooling such as Delta Live Tables or custom Python and Spark SQL pipelines, enabling scalable processing without third-party SaaS dependencies (see the pipeline sketch after this list).
- Act as the primary expert in query optimization, analyzing Spark execution plans and the Spark UI to identify bottlenecks, correct data skew, and improve join efficiency on large datasets (see the tuning sketch after this list).
- Optimize Databricks compute usage by applying advanced techniques such as Z-Ordering, Liquid Clustering, partitioning strategies, and Serverless SQL Warehouse configurations to improve the cost-performance ratio (see the table-layout sketch after this list).
- Develop and manage CI/CD workflows using GitHub Actions or similar tools to automate testing, validation, and deployment of data models (see the CI test sketch after this list).
- Design the semantic layer in Omni to enable self-service analytics with sub-second dashboard response times.
- Occasionally take on BI Developer duties, building high-level dashboards that surface clear, actionable insights from complex data sources.
- Collaborate with teams across Finance, Sales, Product, Marketing, and Trust & Safety to translate business questions into scalable, long-term data solutions.
- Transform technical performance and cost data into strategic recommendations for executives, aligning technical precision with business outcomes.
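For illustration, a minimal Unity Catalog sketch of the kind of layout this role would own, run from a Databricks notebook where `spark` is predefined; the catalog, schema, volume, and group names are placeholders, not an actual org design:

```python
# Catalog -> schema -> volume hierarchy, with explicit, auditable grants.
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql(
    "CREATE SCHEMA IF NOT EXISTS analytics.finance "
    "COMMENT 'Curated finance models'"
)
spark.sql(
    "CREATE VOLUME IF NOT EXISTS analytics.finance.raw_files "
    "COMMENT 'Landing zone for source extracts'"
)
# `data_readers` is a hypothetical account group.
spark.sql("GRANT USE CATALOG ON CATALOG analytics TO `data_readers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA analytics.finance TO `data_readers`")
```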
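A refactoring sketch of what "modular" means here: one monolithic legacy query decomposed into small, individually testable PySpark steps. Table and column names are invented for illustration.

```python
from pyspark.sql import DataFrame, functions as F

def completed_orders(df: DataFrame) -> DataFrame:
    """Keep only orders the legacy system marked complete."""
    return df.filter(F.col("status") == "completed")

def monthly_revenue(orders: DataFrame) -> DataFrame:
    """Aggregate order amounts to calendar months."""
    return (
        orders.groupBy(F.date_trunc("month", "order_ts").alias("month"))
        .agg(F.sum("amount").alias("revenue"))
    )

# Each step is reusable and unit-testable, unlike one 500-line statement.
# `spark` is the notebook session; `legacy.orders` is a placeholder table.
report = monthly_revenue(completed_orders(spark.table("legacy.orders")))
```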
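As a pipeline sketch, a Delta Live Tables definition in Python might declare bronze and silver tables like this; the volume path, column names, and quality rule are assumptions:

```python
import dlt
from pyspark.sql import functions as F

# Bronze: incremental ingestion with Auto Loader.
@dlt.table(comment="Raw orders, loaded incrementally")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/analytics/finance/raw_files/orders")
    )

# Silver: typed and validated; rows failing the expectation are dropped.
@dlt.table(comment="Validated orders")
@dlt.expect_or_drop("non_negative_amount", "amount >= 0")
def orders_silver():
    return dlt.read_stream("orders_bronze").withColumn(
        "order_date", F.to_date("order_ts")
    )
```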
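A tuning sketch of the plan-inspection and skew-correction workflow described above; the tables are placeholders:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Inspect the physical plan before an expensive join: look for a
# SortMergeJoin on a hot key or an unexpected full shuffle.
orders = spark.table("analytics.finance.orders_silver")  # placeholder
customers = spark.table("analytics.finance.customers")   # placeholder
orders.join(customers, "customer_id").explain(mode="formatted")

# Adaptive Query Execution can split skewed shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# For a small dimension table, a broadcast hint removes the shuffle entirely.
fast = orders.join(F.broadcast(customers), "customer_id")
```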
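A table-layout sketch of the compute optimizations named above, again with placeholder table and column names and `spark` provided by the notebook:

```python
# Z-Ordering co-locates frequently filtered columns within existing files.
spark.sql(
    "OPTIMIZE analytics.finance.orders_silver "
    "ZORDER BY (customer_id, order_date)"
)

# Liquid Clustering is the newer alternative (a table uses one approach,
# not both); OPTIMIZE then reclusters incrementally as data arrives.
spark.sql("ALTER TABLE analytics.finance.events CLUSTER BY (event_date, user_id)")
spark.sql("OPTIMIZE analytics.finance.events")
```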
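Finally, a CI test sketch: rather than the GitHub Actions YAML itself, this shows the kind of unit test such a workflow would execute on every pull request. The transformation and sample data are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F

def add_order_date(df):
    """Transformation under test: derive a date column from a timestamp."""
    return df.withColumn("order_date", F.to_date("order_ts"))

def test_add_order_date():
    # A local session keeps the test runnable on a plain CI runner.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("2024-01-15 10:30:00",)], ["order_ts"])
    result = add_order_date(df).select("order_date").first()[0]
    assert str(result) == "2024-01-15"
```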