
On-Device Machine Learning Engineer, Austin, TX

Job in Austin, Travis County, Texas, 78716, USA
Listing for: Webai
Full Time position
Listed on 2026-01-07
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ML Engineer
Job Description & How to Apply Below
Position: On-Device Machine Learning Engineer, Austin, TX

About Us:

webAI is pioneering the future of artificial intelligence by establishing the first distributed AI infrastructure dedicated to personalized AI. We recognize the evolving demands of a data-driven society for scalability and flexibility, and we firmly believe that the future of AI lies in distributed processing at the edge, bringing computation closer to the source of data generation. Our mission is to build a future where a company's valuable data and intellectual property remain entirely private, enabling the deployment of large-scale AI models directly on standard consumer hardware without compromising the information embedded within those models.

We are developing an end-to-end platform that is secure, scalable, and fully under the control of our users, empowering enterprises with AI that understands their unique business. We are a team driven by truth, ownership, tenacity, and humility, and we seek individuals who resonate with these core values and are passionate about shaping the next generation of AI.

About the Role

We’re looking for an On-Device Machine Learning Engineer to bring modern ML capabilities directly onto consumer hardware and make them fast, private, and reliable. You’ll own the design, optimization, and lifecycle of models running locally (e.g., on iPhone/iPad/Mac-class devices), with a sharp focus on latency, battery, thermal behavior, and real-world UX. This role sits at the intersection of ML systems, product engineering, and performance tuning, and will help power local RAG, memory, and personalized experiences without relying on the network.

What You’ll Do

On-device model optimization and deployment

  • Convert, optimize, and deploy models to run efficiently on-device using Core ML and/or MLX.

  • Implement quantization strategies (e.g., 8-bit / 4-bit where applicable), compression, pruning, distillation, and other techniques to meet performance targets.

  • Profile and improve model execution across compute backends (CPU/GPU/Neural Engine where relevant), and reduce memory footprint.
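As a toy illustration of the quantization work described above, the sketch below implements symmetric per-tensor 8-bit weight quantization in plain Python. It is illustrative only: production paths would go through Core ML or MLX tooling rather than hand-rolled code, and the weight values are hypothetical.

```python
# Minimal sketch of symmetric 8-bit weight quantization: map weights
# into int8 by a single scale factor, then dequantize and bound the
# reconstruction error. Illustrative only, not Core ML / MLX API usage.

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8 codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.8, -1.2, 0.05, 0.0, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Round-to-nearest error is bounded by half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

The same accuracy/size tradeoff drives the calibration and benchmarking work mentioned in the requirements: the error bound shrinks with smaller scale, which is why per-channel or 4-bit schemes need careful evaluation.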

Local RAG + memory systems

  • Build and optimize local retrieval pipelines (embeddings, indexing, caching, ranking) that work offline and under tight resource constraints.

  • Implement local memory systems (short/long-term) with careful attention to privacy, durability, and performance.

  • Collaborate with product/design to translate “memory” behavior into concrete technical architectures and measurable quality targets.
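The retrieval side of the bullets above can be sketched minimally as ranking cached document embeddings against a query embedding by cosine similarity. The vectors and document names here are hypothetical placeholders; a real on-device pipeline would use actual embedding models and an approximate-nearest-neighbor index rather than a linear scan.

```python
# Toy sketch of an offline retrieval step: rank cached document
# embeddings against a query embedding by cosine similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, index, k=2):
    """index: dict of doc_id -> embedding; returns the k best doc_ids."""
    ranked = sorted(index, key=lambda d: cosine(query, index[d]), reverse=True)
    return ranked[:k]

index = {
    "notes/meeting": [0.9, 0.1, 0.0],
    "notes/travel":  [0.1, 0.9, 0.2],
    "notes/budget":  [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.1], index))  # "notes/meeting" ranks first
```

Under the tight resource constraints the posting emphasizes, the interesting engineering is in replacing this linear scan with a compact index and caching embeddings so the whole loop works offline.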

Model lifecycle on consumer hardware

  • Own the on-device model lifecycle: packaging, versioning, updates, rollback strategies, on-device A/B testing approaches, telemetry, and quality monitoring.

  • Build robust evaluation and regression suites that reflect real device constraints and user workflows.

  • Ensure models degrade gracefully under real-world conditions (low-power mode, thermal throttling, backgrounding, OS interruptions).
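One core piece of the lifecycle responsibilities above, version updates with rollback, can be sketched as keeping the last known-good model version and reverting when a post-update health check fails. The names (`ModelRegistry`, `health_check`) are illustrative assumptions, not webAI APIs.

```python
# Hedged sketch of on-device model version management with rollback:
# retain the last known-good version so a bad update can be reverted.

class ModelRegistry:
    def __init__(self, initial_version):
        self.active = initial_version
        self.last_good = initial_version

    def update(self, new_version, health_check):
        """Activate new_version; roll back to last_good if checks fail."""
        self.active = new_version
        if health_check(new_version):
            self.last_good = new_version
            return True
        self.active = self.last_good  # rollback
        return False

reg = ModelRegistry("v1.0")
ok = reg.update("v1.1", lambda v: False)  # simulated failing eval suite
assert not ok and reg.active == "v1.0"
ok = reg.update("v1.2", lambda v: True)
assert ok and reg.active == "v1.2"
```

In practice the health check would be the device-aware evaluation and regression suites described above, and telemetry would record which version served each request.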

Performance, reliability, and user experience

  • Treat battery, thermal, and latency as first-class product requirements: instrument, benchmark, and optimize continuously.

  • Design inference pipelines and scheduling strategies that respect app responsiveness, animations, and UI smoothness.

  • Partner with platform engineers to integrate ML into production apps with clean APIs and stable runtime behavior.
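Treating latency as a first-class requirement starts with instrumentation. The sketch below is a minimal benchmarking harness that warms up a workload, times repeated calls, and reports p50/p95 latency; `run_inference` is a hypothetical stand-in for a real model call.

```python
# Minimal latency-instrumentation sketch: time repeated calls and
# report p50/p95 in milliseconds. A real harness would also track
# memory, energy, and thermal state on-device.
import time

def benchmark(fn, warmup=3, iters=50):
    for _ in range(warmup):          # warm caches / lazy initialization
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    samples.sort()
    return {"p50": samples[len(samples) // 2],
            "p95": samples[int(len(samples) * 0.95) - 1]}

def run_inference():                 # hypothetical stand-in workload
    sum(i * i for i in range(10_000))

stats = benchmark(run_inference)
assert stats["p50"] <= stats["p95"]
```

Tail latencies (p95 and above) matter most for UI smoothness, since a single slow inference on the main path is what users perceive as a dropped frame or stalled animation.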

What We’re Looking For
  • Strong experience shipping ML features into production, ideally including mobile / edge / consumer devices.

  • Hands‑on proficiency with Core ML and/or MLX, and the practical realities of running models locally.

  • Solid understanding of quantization and optimization techniques for inference (accuracy/perf tradeoffs, calibration, benchmarking).

  • Experience building or operating retrieval systems (embedding generation, vector search/indexing, caching strategies)—especially under resource constraints.

  • Fluency in performance engineering: profiling, latency breakdowns, memory analysis, and tuning on real devices.

  • Strong software engineering fundamentals: maintainable code, testing, CI, and debugging across complex systems.

Nice to Have
  • Experience with on-device LLMs, multimodal models, or real-time interactive ML features.

  • Familiarity with Metal / GPU compute, or performance tuning of ML workloads on Apple platforms.

  • Experien…
