Responsibilities
- Build and maintain ETL/ELT pipelines that move data efficiently between heterogeneous sources such as laboratory information platforms, billing systems, and cloud repositories.
- Create and refine data processing workflows in Databricks using PySpark, Delta Lake, and Unity Catalog, adhering to medallion architecture principles from raw to curated layers.
- Design and oversee integrations with Azure Data Factory, Azure SQL Managed Instance, and Azure Blob Storage or ADLS Gen2.
- Work alongside the Director of Data Architecture to plan, implement, and document significant changes to data pipelines and schemas.
- Develop and maintain automation scripts in Python, along with data validation checks and monitoring procedures.
- Support analytics tools such as Power BI and Zoho Analytics by providing efficient, well-structured data models and timely dataset refreshes.
- Assist in designing and managing SQL Server database structures, including stored procedures, views, and schema organization.
- Support data governance initiatives including access management, Unity Catalog permissions, data lineage tracking, and change control processes.
- Collaborate with clinical, billing, and operations teams to translate business needs into technical data solutions.
- Proactively detect and fix issues related to data accuracy, system latency, and pipeline stability.
