About the Role
This role involves developing and managing data infrastructure, with a focus on Databricks, and supporting analytics and data workflows through robust engineering practices and collaboration with cross-functional teams.
Responsibilities
- Design and implement data pipelines for large-scale data processing
- Develop and maintain ETL workflows using modern data tools
- Optimize data storage and query performance in cloud environments
- Support integration of data from multiple source systems
- Ensure data accuracy, consistency, and accessibility
- Collaborate with data analysts and scientists to understand requirements
- Monitor data pipeline health and troubleshoot issues
- Implement data validation and quality checks (see the sketch after this list)
- Work with cloud-based data platforms and services
- Document data architecture and system designs
- Participate in code reviews and technical discussions
- Contribute to data governance and security practices
- Support deployment automation and CI/CD pipelines
- Improve data processing efficiency and scalability
- Assist in defining data modeling standards
- Integrate machine learning outputs into production systems
- Maintain up-to-date knowledge of data engineering trends
- Collaborate with infrastructure teams on platform stability
- Support compliance with data privacy regulations
- Provide technical guidance on data-related initiatives
- Participate in incident response for data system outages
- Evaluate new tools and technologies for data workflows
- Ensure systems meet performance and reliability standards
- Work with stakeholders to refine data requirements
- Contribute to agile project planning and delivery
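To give a concrete flavor of the validation work listed above, here is a minimal, hypothetical sketch of a data-quality check in PySpark. The table path, column names, and thresholds are illustrative assumptions, not details taken from this posting.

```python
# Minimal data-quality check, sketched in PySpark.
# The path, column names, and thresholds below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-check").getOrCreate()

# Load a table produced by an upstream pipeline (placeholder location).
orders = spark.read.format("delta").load("/data/example/orders")

total = orders.count()
missing_ids = orders.filter(F.col("order_id").isNull()).count()
duplicates = total - orders.dropDuplicates(["order_id"]).count()

# Fail the run loudly if basic integrity checks do not hold.
if missing_ids or duplicates:
    raise ValueError(
        f"Quality check failed: {missing_ids} null order_id rows, "
        f"{duplicates} duplicate order_id values out of {total} rows"
    )
```

Checks like this typically run as a pipeline step, so a violation stops the run before bad data reaches downstream consumers.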
Nice to Have
- Master’s degree in a technical field
- Experience with Spark optimization techniques
- Knowledge of machine learning pipelines
- Familiarity with streaming data platforms
- Experience in regulated industries
- Certifications in cloud or data platforms
- Contributions to open-source data projects
- Experience with data mesh architecture
- Background in software development practices
Compensation
Competitive salary based on experience
Work Arrangement
Hybrid work model with flexible remote options
Team
Collaborative team environment focused on data solutions
Technology Stack
- Databricks as the primary platform for data processing
- Integration with cloud storage solutions
- Delta Lake for data reliability
- Python and SQL for transformations (see the sketch below)
- Git for source control
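As a hedged illustration of how these pieces fit together, the sketch below reads raw files from cloud storage, transforms them with PySpark, and writes a Delta table. The bucket, schema, and output path are hypothetical placeholders.

```python
# Illustrative Databricks-style job: raw files in, Delta table out.
# Bucket names, columns, and the output path are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-purchases").getOrCreate()

# Read raw JSON events from cloud storage (placeholder bucket).
events = spark.read.json("s3://example-bucket/raw/events/")

# Python/SQL-style transformation: filter, derive a date, aggregate.
daily = (
    events.filter(F.col("event_type") == "purchase")
          .withColumn("event_date", F.to_date("event_ts"))
          .groupBy("event_date")
          .agg(F.count("*").alias("purchases"),
               F.sum("amount").alias("revenue"))
)

# Persist as a Delta table so downstream readers get ACID guarantees.
(daily.write.format("delta")
      .mode("overwrite")
      .save("s3://example-bucket/curated/daily_purchases/"))
```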
Professional Development
- Access to training platforms and courses
- Opportunities to attend industry conferences
- Support for earning technical certifications
- Internal knowledge-sharing sessions
- Mentorship programs for career growth
Sponsorship
Sponsorship is available for qualified candidates.