ML Engineer; LLM Platform
Listed on 2026-02-16
IT/Tech
AI Engineer, Machine Learning / ML Engineer
Stealth Startup
Remote (Canada/US overlap)
We’re building AI-powered tools that help developers ship better software faster. Our platform leverages large language models to provide intelligent assistance, from code generation to documentation and debugging. We’re at the forefront of applying LLM technology to real-world developer workflows, serving thousands of users who rely on our platform daily.
The Role
We’re seeking a Machine Learning Engineer who will be responsible for the entire lifecycle of our LLM-powered features, from initial prototyping through production deployment and optimization. You’ll work at the cutting edge of applied AI, building systems that combine foundation models with retrieval, fine-tuning, and prompt engineering to deliver reliable, high-quality results.
This role requires a unique blend of ML expertise and software engineering discipline. You’ll need to understand both the theoretical foundations of language models and the practical challenges of running them at scale in production. You’ll collaborate with product and engineering teams to identify opportunities where AI can add value, prototype solutions quickly, and build robust systems that deliver on that promise.
What You’ll Do
- Design and implement LLM-powered features from conception to production, owning the entire ML lifecycle
- Build and optimize RAG (Retrieval-Augmented Generation) pipelines using vector databases and embedding models
- Develop sophisticated prompt engineering strategies and templates that consistently produce high-quality outputs
- Implement model evaluation frameworks to measure quality, safety, and performance across different use cases
- Fine-tune and adapt foundation models (GPT-4, Claude, Llama, etc.) for domain-specific tasks when beneficial
- Design and maintain vector search infrastructure using technologies like Pinecone, Weaviate, or pgvector
- Build monitoring and observability systems to track model performance, latency, costs, and quality in production
- Implement safety measures including content filtering, PII detection, and harmful output prevention
- Optimize inference costs through techniques like caching, model selection, and prompt optimization
- Experiment with emerging techniques: function calling, agents, chain-of-thought reasoning, and multi-step workflows
- Build tools and infrastructure that enable other engineers to work effectively with LLMs
- Stay current with rapid developments in the LLM space and evaluate new models and techniques
- Collaborate with product to define success metrics and iterate based on user feedback
Required:
- 4+ years of experience in machine learning engineering or applied AI roles
- Hands-on production experience with LLMs (OpenAI, Anthropic, open-source models)
- Strong understanding of transformer architectures, attention mechanisms, and language model fundamentals
- Experience building RAG systems with vector databases and semantic search
- Proficiency in Python and ML frameworks (PyTorch, Transformers, LangChain, LlamaIndex)
- Strong software engineering fundamentals: testing, version control, CI/CD, code review
- Experience with embeddings and similarity search at scale
- Understanding of prompt engineering techniques and best practices
- Practical knowledge of model evaluation, including automated and human-in-the-loop approaches
- Experience with PostgreSQL or similar databases for storing structured data
- Familiarity with cloud platforms (AWS, GCP, Azure) and containerization (Docker, Kubernetes)
- Strong problem-solving skills and ability to navigate ambiguity
- Excellent communication skills for explaining technical concepts to non-ML stakeholders
Nice to Have:
- Experience fine-tuning language models (LoRA, full fine-tuning, RLHF)
- Background in NLP research or publications in relevant conferences (ACL, EMNLP, NeurIPS)
- Familiarity with LLM evaluation frameworks (RAGAS, TruLens, Phoenix)
- Experience with LLM orchestration frameworks (LangGraph, DSPy, Guidance)
- Knowledge of model quantization and optimization techniques (GGUF, AWQ, GPTQ)
- Experience running open-source LLMs (Llama, Mistral, Falcon) in production
- Understanding of AI safety and alignment…