
Position: AI Engineer (all genders)

At ellamind, we build evaluation-first AI infrastructure. Our platform elluminate turns AI evaluation from ad-hoc “vibe checks” into rigorous, repeatable engineering, so teams can test, measure, and improve LLM applications with confidence.

What you’ll do

Build AI evaluation systems: Design and implement intelligent systems that automatically assess LLM outputs for quality, safety, and compliance with user-defined requirements across diverse use cases and industries.
Integrate cutting‑edge AI technologies: Work with multiple LLM providers and AI platforms, ensuring seamless, reliable connections that handle real‑world production scenarios at scale.
Develop automated testing frameworks: Create sophisticated workflows that enable teams to systematically evaluate their AI applications by running batch tests, comparing model configurations, and tracking performance over time.
Optimize evaluation workflows: Design efficient systems that balance evaluation quality, cost, and speed, enabling customers to run comprehensive tests without breaking their budgets or timelines.
Build prompt optimization infrastructure: Develop systematic approaches to analyze prompt performance across large datasets, identify failure patterns, implement A/B testing frameworks, and create data‑driven optimization pipelines that surface actionable insights.
Scale AI operations: Architect systems that handle high‑volume evaluation workloads, managing concurrent processing and resource allocation while ensuring consistent results across thousands of test cases.
Advance evaluation methodologies: Research and implement novel approaches to AI testing, quality measurement, and automated scoring that push the boundaries of what’s possible in AI evaluation.
Drive technical innovation: Explore emerging AI capabilities, experiment with new models and techniques, and integrate breakthrough technologies that give our customers a competitive advantage.
Ensure production reliability: Build robust, enterprise‑grade systems with proper monitoring, error handling, and quality assurance that customers can depend on for critical AI validation workflows.

You’ll work across our Python‑based platform, collaborating with fullstack engineers, product teams, and directly with customers to understand their evaluation challenges and deliver solutions that make rigorous AI testing accessible to teams of all sizes.

What we’re looking for

Must‑haves

Strong Python engineering skills: Experience building production AI systems with clean, maintainable code, comprehensive testing, and performance optimization at scale.

Hands‑on LLM experience: Practical work with OpenAI, Anthropic, or similar APIs – you’ve built features that call LLMs, handle responses, implement retry logic, and solve real‑world reliability and consistency challenges.
Software engineering fundamentals: Solid understanding of API design, data modeling, async processing, error handling, and building distributed systems that scale efficiently.
AI systems thinking: Experience designing evaluation methodologies, understanding model behavior and limitations, debugging inconsistent outputs, and implementing quality assurance for AI applications.
Enthusiasm for AI reliability: Genuine interest in testing, measuring, and improving AI systems – you care about building AI that works consistently and can be trusted in production.
On‑site collaboration: ≥3 days/week in Berlin or Bremen. Travel to our Bremen HQ during onboarding.
Fluency in English: At least B2 level for team collaboration and technical discussions.
Valid EU work authorization.

Nice‑to‑haves

Experience with AI evaluation frameworks, LLM benchmarking, and automated testing methodologies for AI systems.
Background in LLM fine‑tuning, RAG architectures, embedding models, or other advanced AI techniques.
Experience building developer tools, SDKs, or platforms for AI/ML teams.
Familiarity with experiment tracking platforms, versioning systems for prompts/models, or MLOps workflows.
Comfort with backend frameworks (Django, FastAPI) and databases (PostgreSQL) – you can work across the stack when needed.
Experience with async workers, Docker/Kubernetes, and CI/CD workflows.
Understanding of AI safety, compliance requirements, or privacy‑sensitive/on‑prem deployments.
Experience working directly with clients or end‑users to understand requirements, gather feedback, and translate technical solutions into business value.
German language skills.

What matters most

We prioritize demonstrated excellence in your projects and career. If you’re motivated to build and optimize AI solutions, we want to hear from you—even if you don’t meet every single criterion.

Diversity & inclusion

Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.
