ML Engineer; LLM Platform
Listed on 2026-02-16
IT/Tech
AI Engineer, Machine Learning / ML Engineer
Stealth Startup
Remote (Canada/US overlap)
We’re building AI-powered tools that help developers ship better software faster. Our platform leverages large language models to provide intelligent assistance, from code generation to documentation and debugging. We’re at the forefront of applying LLM technology to real-world developer workflows, serving thousands of users who rely on our platform daily.
The Role
We’re seeking a Machine Learning Engineer who will be responsible for the entire lifecycle of our LLM-powered features, from initial prototyping through production deployment and optimization. You’ll work at the cutting edge of applied AI, building systems that combine foundation models with retrieval, fine-tuning, and prompt engineering to deliver reliable, high-quality results.
This role requires a unique blend of ML expertise and software engineering discipline. You’ll need to understand both the theoretical foundations of language models and the practical challenges of running them at scale in production. You’ll collaborate with product and engineering teams to identify opportunities where AI can add value, prototype solutions quickly, and build robust systems that deliver on that promise.
What You’ll Do
- Design and implement LLM-powered features from conception to production, owning the entire ML lifecycle
- Build and optimize RAG (Retrieval-Augmented Generation) pipelines using vector databases and embedding models
- Develop sophisticated prompt engineering strategies and templates that consistently produce high-quality outputs
- Implement model evaluation frameworks to measure quality, safety, and performance across different use cases
- Fine-tune and adapt foundation models (GPT-4, Claude, Llama, etc.) for domain-specific tasks when beneficial
- Design and maintain vector search infrastructure using technologies like Pinecone, Weaviate, or pgvector
- Build monitoring and observability systems to track model performance, latency, costs, and quality in production
- Implement safety measures including content filtering, PII detection, and harmful output prevention
- Optimize inference costs through techniques like caching, model selection, and prompt optimization
- Experiment with emerging techniques: function calling, agents, chain-of-thought reasoning, and multi-step workflows
- Build tools and infrastructure that enable other engineers to work effectively with LLMs
- Stay current with rapid developments in the LLM space and evaluate new models and techniques
- Collaborate with product to define success metrics and iterate based on user feedback
Required:
- 4+ years of experience in machine learning engineering or applied AI roles
- Hands-on production experience with LLMs (OpenAI, Anthropic, open-source models)
- Strong understanding of transformer architectures, attention mechanisms, and language model fundamentals
- Experience building RAG systems with vector databases and semantic search
- Proficiency in Python and ML frameworks (PyTorch, Transformers, LangChain, LlamaIndex)
- Strong software engineering fundamentals: testing, version control, CI/CD, code review
- Experience with embeddings and similarity search at scale
- Understanding of prompt engineering techniques and best practices
- Practical knowledge of model evaluation, including automated and human-in-the-loop approaches
- Experience with PostgreSQL or similar databases for storing structured data
- Familiarity with cloud platforms (AWS, GCP, Azure) and containerization (Docker, Kubernetes)
- Strong problem-solving skills and ability to navigate ambiguity
- Excellent communication skills for explaining technical concepts to non-ML stakeholders
Nice to Have:
- Experience fine-tuning language models (LoRA, full fine-tuning, RLHF)
- Background in NLP research or publications in relevant conferences (ACL, EMNLP, NeurIPS)
- Familiarity with LLM evaluation frameworks (RAGAS, TruLens, Phoenix)
- Experience with LLM orchestration frameworks (LangGraph, DSPy, Guidance)
- Knowledge of model quantization and optimization techniques (GGUF, AWQ, GPTQ)
- Experience running open-source LLMs (Llama, Mistral, Falcon) in production
- Understanding of AI safety and alignment…