Innodata Inc. is seeking a Language Data Scientist to help customers advance their generative AI applications. In this role, you will work hands-on with multi-modal and multi-lingual datasets, collaborate cross-functionally, and apply your experience with human and synthetic data workflows to drive innovation.
What You'll Do
- Design and improve workflows to create data for AI/ML training and evaluation, including human annotation, data collection, and synthetic data workflows.
- Dive deep into existing workflows to gather data, make recommendations, and drive improvement through innovation and cross-functional collaboration.
- Critically assess annotation tooling and workflows.
- Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
- Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.
What We're Looking For
- Master's degree in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific/quantitative field; PhD strongly preferred.
- Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
- Deep understanding of language and its relationship with culture.
- Ability to identify ambiguity and subjectivity in language.
- Ability to work with multi-lingual and multi-modal projects.
- Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability), and data analysis methods such as sampling.
- Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
- Proficiency in Python to handle and transform large datasets (e.g., pandas), perform quantitative analyses, and visualize data (e.g., matplotlib, seaborn).
- Deep understanding of data pipelines to support ML and NLP workflows.
- Knowledge of efficient data collection, transformation, and storage.
- Knowledge of data structures, algorithms, and data engineering principles.
- Excellent interpersonal skills for effective cross-functional stakeholder engagement.
- Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions.
- Ability to work independently and collaborate as part of a team.
- Adaptable to changing technologies and methodologies.
- Ability to apply prior experience and research and development knowledge to understand client products and services.
Nice to Have
- Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques.
- Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency.
- Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation.
- Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets.
- Developing clear and concise documentation to communicate complex AI concepts to both technical and non-technical stakeholders.
- Contributing to establishing best practices and standards for generative AI development with customers and within the organization.
- Providing technical mentorship and guidance to junior team members.
- Understanding of generative modeling approaches such as GPT, VAEs, and GANs.
Technical Stack
- Python
- spaCy
- NLTK
- Hugging Face
- pandas
- matplotlib
- seaborn
Benefits & Compensation
- Compensation: Up to $120k CAD
Work Mode
This is a local-country position. The role is remote within Canada (excluding Quebec).




