Hugging Face is looking for a Data/Infrastructure Advocate Engineer to bridge the gap between cutting-edge data infrastructure and the global community of data engineers, researchers, and developers. You will champion Xet storage on the Hugging Face Hub, helping users efficiently store, version, and collaborate on large-scale datasets and shaping the future of open data workflows.
What You'll Do
- Grow and nurture the open-source data/infra community by launching initiatives, collaborating with data-focused groups, and organizing events or challenges.
- Engage with communities like Apache Parquet, Open Table Formats, and data engineering forums to promote best practices and Hugging Face tools.
- Promote the Hugging Face Hub as the go-to platform for data storage, versioning, and collaboration by curating and showcasing datasets, benchmarks, and tools like Xet.
- Highlight use cases like efficient large dataset updates, Parquet editing, and deduplication to demonstrate the Hub’s value for data workflows.
- Create demos, benchmarks, and tools (e.g., Colab notebooks) to illustrate best practices for data storage and versioning.
- Experiment with Xet, Parquet, and other storage technologies and data formats to showcase their potential for ML and data engineering.
- Produce high-quality tutorials, blog posts, and videos that make complex topics accessible.
- Share insights on storage optimization, dataset versioning, and deduplication to empower developers.
- Actively participate in online communities (Discord, GitHub, forums) to highlight contributions, answer questions, and foster collaboration.
- Ensure datasets and tools released on the Hub are well-documented, with clear examples, benchmarks, and use cases.
What We're Looking For
- Strong technical skills in Python, data libraries (e.g., pandas, pyarrow, huggingface/datasets), and data formats and storage systems such as Parquet, Open Table Formats, and S3.
- A hands-on builder who loves experimenting with data tools, storage optimization, and dataset versioning.
- Ability to clearly explain complex topics (e.g., deduplication, compression, Parquet editing) through writing, demos, or talks.
- Active in developer communities (GitHub, Discord, forums) and passionate about open source and knowledge sharing.
- Someone who thrives in fast-moving environments and enjoys building in public to inspire others.
Technical Stack
- Python, pandas, pyarrow, huggingface/datasets
- Parquet, Open Table Formats, S3, Xet
Team & Environment
You will collaborate with teams like Datasets, Hub, and Infrastructure.
Benefits & Compensation
- Reimbursement for relevant conferences, training, and education.
- Flexible working hours and remote options.
- Health, dental, and vision benefits for employees and their dependents.
- Parental leave and flexible paid time off.
- Opportunity to visit offices in NYC and Paris.
- Workstation outfitting as needed.
- Company equity as part of the compensation package.
Work Mode
This is a remote position open to candidates in the EMEA region.
Hugging Face is an equal opportunity employer and does not discriminate based on race, ethnicity, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or ability status.




