GitLab is hiring a Data Engineer I to champion data accessibility and empower colleagues to use data effectively. In this role, you will build and maintain reliable, observable data pipelines in a collaborative environment.
What You'll Do
- Design and maintain data pipelines with high observability and reliability.
- Build and maintain large, complex data sets that meet functional and non‑functional requirements.
- Identify and implement process improvements such as automating manual tasks, optimizing data delivery, and redesigning infrastructure for scalability.
- Build extraction, transformation, and loading (ETL) infrastructure in SQL, Python, PySpark, and dbt.
- Collaborate with executive, product, clinical, data, and design teams to resolve data-related technical issues and support their data infrastructure needs.
- Ensure that pipelines are scalable, efficient and secure.
What We're Looking For
- 2–3 years of professional data‑engineering experience
- Graduate degree in computer science, statistics, informatics, information systems or a related quantitative field
- Advanced proficiency in Python and SQL
- Experience using dbt to build data pipelines
- Expertise with relational, NoSQL, and cloud database technologies
- Hands‑on experience with Databricks (Delta Lake, Spark SQL and workflows)
- Strong knowledge of data‑warehousing concepts and ETL/ELT design
- Experience performing root‑cause analysis on internal and external datasets to answer business questions and identify improvement opportunities
- Proven history of manipulating and extracting value from large, disconnected datasets
- Basic understanding of performance tuning and optimization for data systems
- Experience working with cross‑functional teams in a dynamic environment
- Self‑motivated professional who takes ownership of assigned tasks and seeks guidance when necessary
Nice to Have
- Healthcare‑domain experience
- Experience building a Data-as-a-Service platform
- Experience building APIs
- Experience with cloud-based data warehouses such as Snowflake
- Experience with object-oriented or functional scripting languages: Go, Python, Java, C++, Scala, etc.
- Experience with big data tools: Spark, Kafka, etc.
- Experience with data pipeline and workflow management tools like Airflow
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
Technical Stack
- SQL
- Python
- PySpark
- dbt
- Databricks
- Delta Lake
- Spark SQL
Benefits & Compensation
- Salary: $95,000–$105,000 USD, plus equity in the form of stock options
- Competitive medical, dental, and vision coverage
- Competitive 401(k) Plan with a generous company match
- Flexible Time Off/Paid Time Off, 12 paid holidays
- Protection Plans including Life Insurance, Disability Insurance, and Supplemental Insurance
- Mental Health and Wellness benefits
- Eligibility for a corporate bonus program or sales incentive plan
Work Mode
This position follows a local-country work mode for candidates based in the US.
GitLab is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


