Responsibilities

Engage in data discovery sessions to map source systems such as property management tools, marketing platforms, and CRM databases, and convert insights into data lake design specifications.
Create and manage a multi-tier data lake on Amazon S3 with distinct zones for raw, conformed, enriched, and aggregated data, incorporating ingestion, cleansing, and business logic layers.
Develop batch and real-time data pipelines using AWS Glue, Amazon Kinesis, and containerized applications to ingest data from customer data platforms, marketing sources, and property systems.
Write scalable PySpark and Python ETL scripts for AWS Glue to process and enrich large datasets, using Apache Iceberg tables for ACID transactions and schema evolution.
Build data transformation workflows using AWS Glue ETL and AWS Step Functions, and automate metadata discovery with AWS Glue crawlers that populate the Glue Data Catalog.
Apply AWS Lake Formation to enforce fine-grained access controls at the table and column level, including data filters and cross-account sharing, going beyond what IAM policies alone provide.
Optimize Amazon Athena for efficient SQL queries on the data lake using the Parquet file format, partitioning, and result caching strategies, and use Amazon DynamoDB with DAX for low-latency customer profile access.
Develop and deploy AWS Lambda functions with structured logging and observability using Powertools for AWS Lambda, implementing retry logic, dead-letter queues, and monitoring via Amazon CloudWatch.
Use Terraform, CloudFormation, or CDK to define and deploy AWS data infrastructure through CI/CD pipelines, with data engineers managing their own deployments.
Integrate GitHub Actions into CI/CD workflows to automate testing, linting, and deployment of Glue jobs, Lambda functions, and Step Functions with validation gates.
Lead migration from Azure Data Lake to AWS by auditing ADLS assets, replicating environments on AWS, transferring data via AWS DataSync, and validating results through row counts and checksums.
Design entity resolution systems to merge customer records into unified profiles using exact and fuzzy matching, with traceability and manual review options.
Construct and maintain data models that power 360-degree customer views and executive dashboards in Amazon QuickSight.
Ensure data accuracy, consistency, and validation across all pipeline stages, and support user acceptance testing for data-driven features.
Partner with full-stack, DevOps/MLOps, and AI/ML teams that work with Amazon SageMaker and Amazon Bedrock, and contribute to architecture documentation, runbooks, and data governance standards.
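
For candidates wondering what "validating results through row counts and checksums" might look like in practice, here is a minimal sketch in plain Python. It is an illustration only, not the team's actual tooling: the function names (table_fingerprint, validate_migration) and the sample records are hypothetical, and a real migration would more likely compute these figures inside Glue or Athena rather than in memory.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, Mapping, Tuple


def table_fingerprint(rows: Iterable[Mapping[str, Any]]) -> Tuple[int, str]:
    """Return (row_count, checksum) for a set of rows.

    The checksum is order-independent: each row is serialized canonically,
    hashed with SHA-256, and the digests are summed modulo 2**256. Summing
    (rather than XOR) keeps duplicate rows from cancelling each other out.
    """
    count = 0
    total = 0
    for row in rows:
        # sort_keys makes the serialization independent of column order;
        # default=str is a simplifying assumption for dates and decimals.
        canonical = json.dumps(
            row, sort_keys=True, default=str, separators=(",", ":")
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{total:064x}"


def validate_migration(
    source_rows: Iterable[Mapping[str, Any]],
    target_rows: Iterable[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Compare source and target extracts by row count and checksum."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "source_rows": src_count,
        "target_rows": tgt_count,
        "row_count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for an ADLS extract and its AWS copy.
    adls = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    s3 = [{"email": "b@example.com", "id": 2}, {"email": "a@example.com", "id": 1}]
    print(validate_migration(adls, s3))
```

Matching row counts alone would miss silently altered values, which is why the sketch pairs them with a content checksum; the order-independent sum is one possible design choice among several.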

Benefits

Remote work

Work Arrangement

Remote (Worldwide)

Job Type

Full-time, 1099

Equal Opportunity Employer

We are an equal opportunity employer. We embrace and celebrate diversity and are committed to creating an inclusive and safe environment for all employees.

Application Encouragement

We encourage you to apply even if your experience doesn’t perfectly align with what we have listed.

Agency Policy

No Agencies Please!

Capnexus is hiring a Senior Data Engineer (AWS)