Backend Engineer, AI Systems
Listed on 2026-06-18
Software Development
Backend Developer, AI Engineer (Applied/Software), Machine Learning/ ML Engineer
About the Company
A1 is building a proactive AI chat app for everyday users to bring intelligence to conversations, errands, organising and workflows. Unlike traditional chat-based applications, our product focuses on achieving high reliability for long-running workflows, persistent context, and real-world task completion. The system must handle multi-step reasoning, interact with external tools, and remain reliable despite non-deterministic model behavior.
Role Overview
As a Backend Engineer, AI Systems, you own the inference and orchestration layer that powers every AI interaction in the product. Your work sits between models and users, where latency, correctness, reliability, and cost directly affect the real-world experience. You will build and operate production systems that turn model capability into fast, stable, observable APIs used across mobile and desktop clients.
Focus
- Build and operate backend systems that serve AI-powered features in production.
- Design inference pipelines and orchestration layers that handle multi-step workflows, tool calls, and retries.
- Manage the full lifecycle of AI requests: routing, caching, batching, streaming, and state management.
- Optimize latency, throughput, and cost across model inference and downstream systems.
- Design systems that remain reliable despite non-deterministic model behavior and external dependencies.
- Implement observability for AI systems, including logging, tracing, and debugging of model outputs and failures.
- Collaborate with ML and product teams to translate model capabilities into stable, production-grade APIs.
Requirements
- Strong backend engineering fundamentals in production environments.
- Experience running high-throughput, low-latency services.
- Familiarity with AI inference patterns (LLMs, embeddings, multimodal).
- Comfortable debugging distributed systems under load.
- Bias toward shipping and learning from production behavior.
Outcomes
- Backend systems run reliably at scale, handling production AI traffic with low latency and high throughput.
- Multi-step AI workflows complete successfully across tools and services, with robust handling of failures and retries.
- APIs are stable, clear, and support seamless integration with frontend and ML systems.
- Production incidents are quickly detected, diagnosed, and resolved, minimizing user impact.
- Iterative improvements based on real usage continuously increase system performance and reliability.
- System design evolves to support increasing scale, complexity, and new AI capabilities without major rewrites.
Tech Stack
- Python
- Node.js
- PyTorch
- OpenAI / Anthropic / open-source LLMs
- SQL & NoSQL
- Kubernetes
- Docker