We are seeking an exceptional Data Scientist with a profound understanding of Large Language Models (LLMs), a strong background in Automatic Speech Recognition (ASR), and in-depth experience across NLP domains such as text summarization, language modeling, sentiment analysis, named entity recognition, machine translation, question answering, and chatbots.
Key responsibilities:
• Design, build, and fine-tune Large Language Model-based approaches (e.g., Mixtral, Llama) for natural language understanding and generation tasks.
• Train, fine-tune, and optimize Automatic Speech Recognition (ASR) models using our private datasets.
• Develop, train, and evaluate custom NLP models for a variety of applications, including text summarization, sentiment analysis, and named entity recognition.
• Conduct data curation, preprocessing, and feature engineering for NLP datasets.
• Develop new pipelines and improve existing ones to train, evaluate, and deploy ML models.
• Continuously monitor system performance, troubleshoot issues, and implement improvements/optimizations.
• Collaborate with cross-functional teams to integrate NLP solutions into production environments.
• Stay updated with the latest trends and advancements in NLP and deep learning.
Required Skills and Qualifications:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• Strong proficiency in Python, with extensive experience in PyTorch and/or TensorFlow.
• Familiarity with NLP libraries such as NLTK, SpaCy, and Hugging Face Transformers.
• Proven experience in managing and analysing large text and/or audio datasets.
• Strong understanding of foundational deep learning (DL) algorithms.
• Experience with unit and integration tests (pytest) and CI/CD pipelines.
• Experience with data versioning, model management, and experiment tracking.
• Familiarity with key evaluation metrics and optimization techniques.
• Extensive knowledge of and hands-on experience with Docker.
• Excellent problem-solving, analytical, and communication skills.
Nice to Have:
• Extensive experience and demonstrated proficiency in developing Large Language Models (LLMs), with a strong focus on integrating these models into Automatic Speech Recognition (ASR) or Text-to-Speech (TTS) applications.
• Experience with prompt engineering techniques (e.g., few-shot prompting, chain-of-thought reasoning) and Retrieval-Augmented Generation (RAG), and familiarity with LLM frameworks like LangChain, LlamaIndex, or Haystack.
• Proven track record in developing ASR or TTS systems, with hands-on experience working with diverse components such as acoustic models, language models, and pronunciation models.
• Hands-on experience with Vector/Graph Databases and their use in semantic search.
• Proficiency with Azure services, including ML Studio, Functions, and Blob Storage, or equivalent services from other cloud providers.
What we give you in return:
• A highly competitive salary
• Detailed company training to the highest standards
• A chance to work in a friendly and supportive culture
• Tremendous growth opportunities in a large, fast-moving international company