Responsibilities
- Architect and deploy data pipelines for ingesting, processing, and visualizing data from diverse sources using Azure and Databricks.
- Define and implement data models, schemas, and structures that support analytics while ensuring scalability, performance, and integrity.
- Set up and manage data storage systems for structured, semi-structured, and unstructured data, optimizing for speed, cost, and regulatory compliance.
- Build and maintain data integration workflows including extraction, transformation, migration, and loading, ensuring accurate and consistent data movement.
- Develop and enhance data workflows, automation tools, and API integrations to improve system efficiency.
- Conduct unit and integration testing; assist with user acceptance testing, user training, and implementation.
- Support executive reporting needs through ad-hoc analysis, dashboard development, and data modeling.
- Work with data scientists and analysts to align data systems with business goals and technical requirements.
- Produce technical documentation such as design specs and program requirements throughout the development lifecycle.
- Apply DevOps principles to automate development, deployment, and monitoring of data systems.
- Partner with DevOps teams to establish CI/CD pipelines and automated testing for reliable, scalable solutions.
- Monitor data pipeline performance, identify bottlenecks, and optimize query and Spark job efficiency.
- Apply caching, partitioning, and indexing techniques to accelerate data processing and reduce latency.
- Enforce data security through encryption and access controls for data at rest and in transit.
- Ensure compliance with industry, organizational, and regulatory standards.
- Support system maintenance tasks such as applying updates and patches as needed.
- Oversee third-party vendor deliverables to ensure quality and alignment with project goals.
- Lead training sessions and knowledge transfer activities for internal teams and stakeholders.
- Stay current with emerging technologies and recommend improvements to enhance data engineering practices.