Responsibilities
- Develop high-capacity data ingestion pipelines to incorporate datasets from diverse external sources.
- Design and deploy scalable systems for entity resolution, including record linkage, deduplication, clustering, and conflict resolution.
- Improve matching algorithms, decision logic, and similarity measures to ensure precise and comprehensive dataset alignment.
- Establish and monitor data quality metrics such as overlap rates, match precision and recall, duplication levels, and data completeness.
- Generate machine learning training datasets in formats such as TFRecords, tailored to research needs.
- Build data processing components using Dataflow and Apache Beam, and manage large-scale analytical workloads in BigQuery.
- Use distributed computing frameworks such as Ray to speed up large experiments, feature extraction, and research data workflows.
- Work closely with machine learning researchers to understand future needs and adapt data linkage approaches as new data sources and applications arise.
Benefits
- Highly competitive compensation and equity package
- Quarterly budget allocation for productivity tools
- Flexible paid time off policy
- Prime office location in Manhattan
- Productivity suite including ChatGPT Plus, Claude Code, and GitHub Copilot
- Comprehensive private health, dental, and vision coverage for employees and dependents
- 401(k) plan with employer matching contributions
- Access to concierge-level primary care via One Medical and Rightway
- Mental health resources and support through Spring Health
- Customized life insurance, travel assistance, and additional lifestyle benefits
Compensation
Highly competitive salary and equity
Work Arrangement
On-site — Manhattan
Other
- The company adheres to equal employment opportunity principles, ensuring fair treatment for all employees and applicants regardless of race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, or gender identity or expression.
- The organization is dedicated to fostering a diverse and inclusive workplace, actively encouraging individuals from varied backgrounds, experiences, viewpoints, and abilities to join the team.