Iris Software seeks a Data Engineer to join our team. You will design and develop scalable data pipelines and modernize legacy workflows using Databricks and Apache Spark; the role also involves Spark performance tuning, pipeline orchestration, and CI/CD-driven deployment. At Iris, we foster an award-winning culture that values your talent and ambitions, offering personalized career development, continuous learning, and mentorship on cutting-edge projects.
What You'll Do
- Design and develop scalable batch and near-real-time ETL/ELT pipelines using Databricks on AWS and Apache Spark with PySpark, Spark SQL, and Structured Streaming.
- Modernize legacy SQL, Hive, and stored procedure workflows into distributed Spark-native architectures.
- Tune Spark jobs for performance and build streaming pipelines that consume from Apache Kafka using Structured Streaming.
- Design dimensional data models, including Fact/Dimension tables and SCD Type 2 implementations.
- Orchestrate pipelines using Databricks Workflows or Apache Airflow.
- Integrate CI/CD pipelines using Jenkins, Git, and Bitbucket/GitHub for automated deployment across development, UAT, and production environments.
What We're Looking For
- Proficiency in Apache Spark (Core, SQL, Structured Streaming), PySpark, and advanced SQL.
- Hands-on experience with Databricks on AWS or Azure.
- Strong skills in Python and SQL for data engineering tasks.
- Experience with CI/CD tools including Jenkins and Git/GitHub/Bitbucket.
- Knowledge of AWS services such as S3, EMR, EC2, IAM, and CloudWatch (equivalent Azure experience is a plus).
- Familiarity with the Databricks runtime and cluster management, and with Apache Kafka.
- Experience applying data science and machine learning concepts with Databricks, Apache Spark, and Python.
- Experience with AWS storage services such as S3, S3 Glacier, and EBS.
- Excellent communication and collaboration skills.
Nice to Have
- Programming skills in Java or Scala.
- Experience with Snowflake data warehousing and integration, and with Apache Airflow.
- Background in financial services, regulatory reporting, or enterprise data platforms.
- Hands-on experience in Delta Lake optimization and incremental processing strategies.
- A Databricks Certification, preferably at the Professional level.
- A strong understanding of distributed computing principles.
Technical Stack
- Core: Apache Spark, PySpark, Databricks, Python, SQL
- Cloud: AWS, Azure
- Streaming, warehousing, and orchestration: Kafka, Snowflake, Airflow
- CI/CD: Jenkins, Git
- Additional languages: Java, Scala
Iris Software is an equal opportunity employer.