Senior Software Engineer, AI Platform (San Jose area, California, USA; Software Development)

About the Role

We are building an AI-powered cybersecurity platform that helps enterprises manage vulnerabilities. Our AI team has built working services that analyze container images, standardize package data, extract natural language filters, and assess package maintenance, all powered by LLMs and intelligent automation.

Now we need an experienced software engineer to make these systems scale. You'll be the first dedicated engineering hire on the AI team, working alongside applied AI scientists and an AI infrastructure engineer to transform code into reliable, well-tested, and maintainable production services.

This is not an ML research role. This is a software engineering role on an AI team. You'll own the code quality, test coverage, CI/CD pipelines, and production reliability of services that call LLM APIs, interact with Azure cloud services, and serve critical data to our cybersecurity platform.

Key Responsibilities

Build the test suite from the ground up. You'll design the test infrastructure — unit tests with mocked LLM responses, integration tests against staging environments, and fixtures that make testing fast and reliable. You'll wire this into CI so nothing ships without passing tests.
Harden production services. Audit and fix security issues. Implement structured logging. Add health checks, metrics, and traces.
Improve the CI/CD pipeline. You'll add quality gates so the team catches issues before they reach production.
Refactor for maintainability. Extract shared patterns into reusable modules. Break apart oversized classes and reduce code duplication across services.
Fix dependency management. Introduce lock files for reproducible builds, remove unused dependencies, and resolve version inconsistencies across services.
Own the reliability and performance of our AI service fleet (Python/FastAPI microservices)
Build out observability — distributed tracing, latency dashboards, alerting on error rates and SLA breaches
Design and implement caching strategies, rate limiting, and circuit breakers for external API calls (Anthropic, Azure ML, package registries)
Collaborate with AI scientists on prompt engineering and output parsing, bringing engineering rigor to LLM integration patterns
Mentor mid-level engineers as the team grows

Required Qualifications

4+ years of professional software engineering experience with a strong backend focus
Deep Python expertise — not scripting, but well-structured production code. You understand when to use data classes vs Pydantic, how async/await actually works, and why global variables make testing painful
Testing as a core discipline. You've built test suites for services with external dependencies. You're comfortable with pytest, mocking, fixtures, and know how to test code that calls third‑party APIs without calling them
FastAPI or equivalent modern Python web framework experience (Django REST Framework, Flask with production patterns). You've designed and maintained REST APIs that other teams depend on
Azure or equivalent cloud platform experience. You've worked with managed container services, Kubernetes, managed databases, identity/auth systems, and CI/CD in a cloud environment. Azure preferred; AWS/GCP experience transfers well
CI/CD pipeline engineering. You've added test gates, lint checks, and automated quality enforcement to build pipelines. Experience with Azure DevOps Pipelines, GitHub Actions, or GitLab CI
Docker and containerization. You've written production Dockerfiles, understand multi‑stage builds, and have debugged container networking and configuration issues
Strong code review and collaboration skills. You'll be working with AI scientists who are strong in their domain but still developing engineering practices. You need to raise the bar without creating friction

Preferred Qualifications

Experience working with LLM provider APIs (Anthropic, OpenAI, Azure OpenAI) — understanding token limits, prompt design, structured output parsing, and retry patterns
Experience with structured logging (structlog), observability tools (OpenTelemetry, Prometheus, Grafana), or APM platforms
Exposure to cybersecurity, vulnerability management, or compliance‑sensitive environments
Experience on a small engineering team at a startup, where you owned services end‑to‑end
Familiarity with RAG patterns, embedding pipelines, or vector databases (not required, but a plus for growth)
