Senior Staff Engineer - Agent Platform

Job in Swansea, Swansea County, SA1, Wales, UK
Listing for: tem
Full Time position
Listed on 2026-05-31
Job specializations:
  • Software Development
    AI Engineer, Software Engineer, Machine Learning/ML Engineer
Job Description & How to Apply Below

🏅 The Role

We are looking for a Senior Staff Software Engineer to help shape and scale our Agentic layer. This role combines deep hands-on engineering with strategic technical ownership and organisational influence.

You will build the foundations that make AI agents production-ready at tem: the core runtime and tooling, the integration interfaces to our systems, and the engineering standards that let teams ship agentic capabilities safely, reliably, and repeatedly.

You will work end-to-end from early pilots to production roll-outs, partnering closely across engineering, product, data, and domain teams to translate real workflows into durable agent-powered systems. This role requires strong cross-team influence, the ability to align technical design with product and business outcomes, and the judgment to balance rapid delivery with long-term system integrity.

🚀 Responsibilities
  • Ship flagship agentic capabilities: Deliver high-impact agentic workflows end-to-end, from discovery through production roll-out, with clear success metrics and fast iteration loops.

  • Build and operate production-grade agent systems: Design reliable agentic systems that behave predictably under real-world constraints, including latency, cost, data quality, and failure modes, with strong patterns for state management, idempotency, and safe recovery.

  • Create shared foundations for agent delivery: Develop the core primitives that enable teams to build agents consistently (runtime patterns, tool interfaces, context management, shared libraries) while avoiding one-off implementations.

  • Establish a pragmatic Agent Development Life Cycle (ADLC): Implement evaluations, guardrails, tracing, monitoring, and release processes so agents can be measured, debugged, and improved continuously.

  • Integrate ML and LLM components into production workflows: Work with ML/Data teams to productionise models and LLM capabilities with clear contracts, versioning, observability, and safe degradation patterns.

  • Maintain clear domain boundaries as adoption scales: Define shared semantics for agent tools and data access, preventing domain drift while enabling teams to move quickly.

  • Collaborate with Platform on infrastructure and developer tooling: Adopt and extend existing CI/CD, DevEx, and observability systems, contributing back where agentic workloads introduce new requirements.

Success measures

  • 3 months: Ship the first flagship agentic workflow to production with a defined KPI, runbook/on-call ownership, and baseline telemetry (success rate, latency, cost).

  • 6 months: Ship additional workflows or expansions and implement a lightweight ADLC: evals, guardrails, monitored roll-outs, and rollback.

  • 12 months: Prove repeatable capability: 2+ product teams shipping on shared foundations, faster time-to-prod for new agents, and reliability/cost targets consistently met.

🎯 Requirements

Must-Haves:

  • Architectural depth: Proven ability to design and evolve complex, stateful distributed systems spanning APIs, event-driven architectures, data systems, and agentic applications, where domain logic is the primary source of complexity. Proven patterns for high-throughput performance and for scaling architecture to support hundreds of thousands of customers while preventing domain drift.

  • Proven experience building AI agents in production, not just demos, with a clear understanding of current best practices (agent architectures, tool calling, RAG where appropriate, prompt and context engineering). Ability to run AI/agentic systems reliably in production with observability, incident readiness, and cost controls.

  • Deep experience with:

    • AWS serverless architecture (Lambda, API Gateway, EventBridge, Step Functions)

    • Event-driven systems and asynchronous workflows

  • Strong coding skills: deep hands-on experience with a variety of programming languages and comfort with a tech-agnostic approach. Familiarity with Python is a must.

  • Agent quality discipline: hands-on experience with evaluations (offline and online), regression testing, safety guardrails, and monitoring for reliability, cost, and drift.

  • Strong backend and distributed-systems fundamentals: APIs, asynchronous workflows, state management, idempotency,…

Position Requirements
10+ years of work experience