Machine Learning Engineer Job, Santa Barbara area, California, USA, Software Development

Position: Staff Machine Learning Engineer

Description

Hi, we’re AppFolio. We’re innovators, changemakers, and collaborators. We’re more than just a software company: we’re building the AI-native platform where the real estate industry comes to do business. We’re transforming property management: how property managers operate, how residents live, and how intelligence flows across an entire industry.

Realm‑X is AppFolio’s AI-native platform powering this transformation. It enables a new generation of intelligent capabilities across our products, including Realm‑X Assistant (copilot), Flows (agentic AI workflows), and Performers (autonomous AI agents). Realm‑X serves both as a foundation for internal teams to build and scale AI-powered products and as a core layer delivering intelligent, high‑impact experiences directly to our customers.

At its core, Realm‑X is built on a structured domain ontology and a set of shared business primitives—such as transactions, actions, reports, metrics, and skills—that enable AI systems to deeply understand and operate across the full context of property management workflows. This foundation allows us to build context‑aware, action-oriented AI systems that go beyond simple assistance to power real automation and decision‑making.

Who We Are Looking For

We’re hiring a Staff Machine Learning Engineer to help advance the ML platform that every AI initiative at AppFolio depends on: training, fine‑tuning, inference, RAG, evaluation, and cost management. You’ll keep our AI cloud always‑on, observable, and economical, while staying close enough to applications to influence model and agent design.

This role works at the intersection of ML infrastructure, applied AI, and cost discipline. You’ll partner closely with our Voice & Agents and Research ML engineers to harden their prototypes into production systems, and help advance the platform layer that lets Realm‑X scale across AppFolio’s entire customer base.

Your Impact

ML Platform:
Design and operate AppFolio’s ML infrastructure on AWS, including ECS, SageMaker, GPU fleets, model serving, autoscaling, and cost controls.
Drive AI Cost Discipline:
Optimize cost across all AI applications — provider routing, caching, batch vs. real‑time, model size selection, and inference economics.
Multi‑Provider Reliability:
Maintain reliable, multi‑provider LLM access across Google, OpenAI, and Anthropic with sensible fallbacks and abstractions.
Training & Fine‑Tuning Stack:
Build the training and fine‑tuning stack for Small Language Models, including data pipelines, GPU orchestration, and evaluation.
Productionize Research:
Partner with Voice & Agents and Research ML engineers to harden their prototypes into production systems with SLOs, on‑call rotations, and observability.
AI Safety & Guardrails:
Operate AppFolio’s AI safety and authorization layer: guardrails on AWS, scoped tool permissions, and human‑in‑the‑loop gates for autonomous agent actions.

Qualifications

Systems thinker:
You think in terms of platforms and long‑term leverage, not just features.
Production builder:
You’ve built and scaled ML infrastructure in production with meaningful business impact.
Ambiguity:
You operate effectively in high ambiguity, turning unclear infra problems into clear direction.
Owner‑operator:
You take ownership with a founder/owner‑operator mindset, act with urgency, and focus on outcomes.
Pace:
You have a strong desire to move fast and deliver impact, while maintaining sound engineering judgment.
Collaboration:
You are humble, collaborative, and low‑ego, and you elevate those around you.
Sustainability:
You value work‑life balance as a foundation for sustained high performance.
Reliability mindset:
You treat ML infra like any other production system — SLOs, on‑call, observability, postmortems.

Must Have

ML infra at scale:
Experience building and operating production ML infrastructure on AWS, including ECS, SageMaker, GPUs, autoscaling, and cost controls.
Inference platforms:
Production experience with model serving for both LLMs and custom models; understands quantization, batching, and routing.
Provider breadth:
Direct experience integrating with Google (Vertex / Gemini), OpenAI, and Anthropic APIs in production.
Training…