Senior Artificial Intelligence Engineer Job, Cary area, North Carolina, USA, IT/Tech

Senior Artificial Intelligence Engineer

We are hiring a Senior AI Engineer to design, build, and operate enterprise AI systems across our client portfolio. You will work end-to-end across the AI stack — from inference engines and platform infrastructure (vLLM, KV cache, Dynamo-style serving, GPU-accelerated AI Factory platforms) up through application-level engineering (RAG pipelines, agent workflows, prompt engineering, evaluation methodology).

This role is for an engineer who can lead work streams independently, mentor more junior engineers, and serve as the technical authority that clients trust to deliver production AI outcomes. You'll engage directly with client architects, data scientists, application teams, and executives — and you'll leave each engagement having raised both the client's capability and Blue Ally's practice.

Key Responsibilities

Lead end-to-end design, build, and operation of AI systems on AI Factory platforms (HPE PCAI, Dell AI Factory, Nutanix Enterprise AI, and adjacent ecosystem layers) across multiple client engagements.
Engineer and tune LLM inference serving stacks — primary depth in vLLM with breadth across the inference ecosystem — for client latency, throughput, and cost targets.
Tune inference performance through KV cache management, paged attention, batching strategies, and Dynamo-based disaggregated serving.
Architect and operate MLOps pipelines covering model lifecycle, registries, deployment, rollback, and observability.
Design and engineer RAG applications on top of vector databases — chunking strategies, retrieval tuning, reranking, citation handling, and context-window management.
Build and tune prompt-engineering patterns at production scale — system prompts, structured output, tool and function calling.
Design and maintain LLM evaluation harnesses — golden sets, regression suites, and online quality metrics.
Engineer high-performance storage and networking for AI workloads — parallel file systems, object storage tiers, and high-throughput, low-latency RDMA fabrics.
Operate Kubernetes clusters underpinning AI workloads: namespaces, RBAC, resource quotas, network policies, storage classes, and ingress.
Build and maintain container images, registries, and CI/CD pipelines for AI/ML services.
Implement monitoring, alerting, logging, and capacity planning across the AI stack.
Harden environments to meet client security and compliance requirements.
Lead troubleshooting across bare metal, BIOS/firmware, OS, containers, GPUs, frameworks, and models.
Engage directly with client stakeholders — technical and executive — to communicate status, root cause, options, and recommendations.
Mentor less senior engineers and review their code; raise the technical bar of every engagement you join.
Author runbooks, reference architectures, and knowledge base content; lead client knowledge transfer and enablement sessions.
Participate in on-call rotation and incident response for production AI workloads.
Contribute reusable patterns, tooling, and reference designs back to the practice.

Required Qualifications

Experience: 7+ years of software, data, or infrastructure engineering, with 3+ years specifically working with modern AI / LLM systems.
Software engineering: Production-quality Python at a professional engineering level, including testing, code review, version control fluency, and shipping code that other engineers depend on.
Linux engineering: Deep production Linux experience, including system internals, performance tuning, and troubleshooting.
Containers: Deep proficiency with Docker — image build, registry management, runtime tuning, and container security.
Hardware fundamentals: Strong server-platform skills including CPU/GPU topologies, PCIe, BMC management, BIOS/firmware lifecycle, and physical-to-logical troubleshooting.
AI Factory platforms: Hands-on experience deploying and operating one or more of HPE PCAI, Dell AI Factory, or Nutanix Enterprise AI.
Inference stack — vLLM: Production experience deploying, tuning, and operating vLLM.
Inference stack breadth: Working knowledge of multiple inference and model-serving frameworks beyond vLLM, with the ability to choose and tune the right tool for each…