Remote (Canada, excluding Quebec), Full-time

Innodata Inc is hiring a Language Data Scientist

About the Role

Innodata Inc is seeking a Language Data Scientist to help customers advance their generative AI applications. In this role, you will work hands-on with multi-modal and multi-lingual datasets, collaborate cross-functionally, and leverage your experience with human and synthetic data workflows to drive innovation.

What You'll Do

  • Design and improve workflows to create data for AI/ML training and evaluation, including human annotation, data collection, and synthetic data workflows.
  • Dive deep into existing workflows to gather data, make recommendations, and drive improvement through innovation and cross-functional collaboration.
  • Critically assess annotation tooling and workflows.
  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
  • Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.

What We're Looking For

  • Master's degree in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences or a related scientific/quantitative field; PhD strongly preferred.
  • Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
  • Deep understanding of language and its relationship with culture.
  • Ability to identify ambiguity and subjectivity in language.
  • Ability to work with multi-lingual and multi-modal projects.
  • Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability), and data analysis methods such as sampling.
  • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
  • Proficiency in Python to handle and transform large datasets (e.g., pandas), perform quantitative analyses, and visualize data (e.g., matplotlib, seaborn).
  • Deep understanding of data pipelines to support ML and NLP workflows.
  • Knowledge of efficient data collection, transformation, and storage.
  • Knowledge of data structures, algorithms, and data engineering principles.
  • Excellent interpersonal skills for effective cross-functional stakeholder engagement.
  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions.
  • Ability to work independently and collaborate as part of a team.
  • Adaptable to changing technologies and methodologies.
  • Ability to apply prior experience and research and development knowledge to understand client products and services.

Nice to Have

  • Experience conducting research to stay up to date with the latest advancements in generative AI, machine learning, and deep learning techniques.
  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency.
  • Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation.
  • Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets.
  • Experience developing clear and concise documentation to communicate complex AI concepts to both technical and non-technical stakeholders.
  • Experience contributing to best practices and standards for generative AI development with customers and within the organization.
  • Experience providing technical mentorship and guidance to junior team members.
  • Understanding of generative model families such as GPT, VAEs, and GANs.

Technical Stack

  • Python
  • spaCy
  • NLTK
  • Hugging Face
  • pandas
  • matplotlib
  • seaborn

Benefits & Compensation

  • Compensation: Up to $120k CAD

Work Mode

This is a local-country position. The role is remote within Canada (excluding Quebec).

Required Skills

  • Python
  • spaCy
  • NLTK
  • Hugging Face
  • pandas
  • matplotlib
  • seaborn
  • data science
  • machine learning
  • natural language processing
  • data analysis
  • statistics
  • data visualization
  • model training
  • data annotation
About the Company
Innodata Inc

Innodata (NASDAQ: INOD) is a leading data engineering company and AI technology solutions provider, serving over 2,000 customers, including 4 of the world’s 5 biggest technology companies. By combining advanced ML/AI technologies, a global workforce of subject matter experts, and a high-security infrastructure, the company provides clean and optimized digital data solutions across multiple industries.

Job Details

Category: Data
Posted: a month ago