Platform Engineer - Infrastructure, Dearborn area, Michigan, USA (IT/Tech)

We made history and now we work to transform the future - for our customers, our communities and our families. You'll see your work on the road every day, helping people move freely and pursue their dreams. At Ford, you can build more than vehicles. Come build what matters.

The Ford Motor Credit Company team helps put people behind the wheels of great Ford and Lincoln vehicles. By partnering with dealerships, we provide financing, personalized service and professional expertise to thousands of dealers and millions of customers in over one hundred countries around the world.

In this position...

Ford Credit Platform Engineering builds and operates the shared infrastructure and paved paths that help product teams deliver securely, reliably, and quickly. As a Cloud Infrastructure & SRE Engineer on our Platform Engineering team, you will design, build, and operate the cloud platforms and reliability practices that hundreds of engineers depend on every day.

This role leans toward cloud infrastructure, DevOps, and Site Reliability Engineering, and it calls for strong software development skills. You will write code to automate infrastructure, define reliability targets, and create self-service workflows that eliminate toil. You will operate what you build, participate in on-call rotations, and drive systemic improvements from every incident.

About the Team

Our Platform Engineering team sits at the intersection of product and infrastructure. We treat our platform capabilities as products, with users, documentation, support, and a roadmap driven by real feedback. We partner closely with product engineering teams, security, and SRE to standardize patterns around identity, networking, CI/CD, secrets management, and deployment.

We believe the best platform work is invisible. Developers should not have to think about cluster wiring, pipeline configuration, or operational boilerplate. They should focus on their features, and the platform should just work.

How We Work

* Automate first:
Eliminate repeatable manual work. Measure and reduce toil.

* Reliability is a feature:
Design for failure with timeouts, retries with jitter, idempotency, and graceful degradation.

* Small, safe changes:
Incremental delivery, clear rollback strategies, and continuous improvement.

* Engineering excellence:
Design reviews, blameless postmortems, and strong documentation and runbooks.
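
For candidates wondering what "retries with jitter" and "idempotency" look like in practice, here is a minimal Python sketch. It is an illustration only, not code from our platform: the names backoff_delays and call_with_retries and the base and cap values are hypothetical, and the jitter shown is the common "full jitter" variant of exponential backoff.

```python
import random
import time


def backoff_delays(retries, base=0.1, cap=5.0):
    """Yield one 'full jitter' delay per retry.

    Retry n waits a random time in [0, min(cap, base * 2**n)) seconds,
    so clients that failed together do not all retry at the same moment.
    """
    for n in range(retries):
        yield random.random() * min(cap, base * (2 ** n))


def call_with_retries(operation, retries=3, sleep=time.sleep, base=0.1, cap=5.0):
    """Call operation() once, then retry up to `retries` times on failure.

    operation must be idempotent, because it may run more than once.
    The last error is re-raised when the retries are used up.
    """
    delays = backoff_delays(retries, base, cap)
    while True:
        try:
            return operation()
        except Exception:  # a real client would catch only retryable errors
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the original error
            sleep(delay)
```

Passing sleep as a parameter keeps the sketch testable without real waiting. In a production client the operation itself would also carry a timeout, so a hung call fails fast instead of blocking the retry loop.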

What Success Looks Like

* Platform capabilities are easy to adopt, well-documented, and measurably reduce lead time for change.

* Reliability improves over time, measured by SLO attainment, reduced incident frequency and severity, and faster MTTR.

* Security posture improves through secure-by-default patterns and automated controls.

Build and operate the cloud infrastructure, paved paths, and SRE practices that help product teams at Ford Credit deliver securely, reliably, and quickly.

What you'll do...

* Design, build, and operate cloud infrastructure and platform capabilities (networking, compute, Kubernetes, CI/CD, secrets, certificates, identity).

* Define and improve reliability using service-level indicators (SLIs), service-level objectives (SLOs), and error budgets.

* Implement observability (metrics, logs, traces) with actionable alerting focused on user impact.

* Create self-service workflows and automation (infrastructure as code, GitOps, build/release pipelines) that reduce toil.

* Improve security and compliance through least-privilege access, secure defaults, policy-as-code, and continuous hardening.

* Participate in on-call rotation, incident response, and post-incident reviews; drive systemic fixes and runbook quality.

* Partner with application teams to improve deployability, resilience, and cost efficiency (capacity planning, autoscaling, graceful degradation).
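
To make the SLI, SLO, and error-budget vocabulary above concrete, here is a small worked sketch in Python. The function name error_budget and the sample figures (a 99.9% availability target over one million requests) are hypothetical, chosen only to show the arithmetic, and do not describe any Ford Credit service.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize a request-based availability SLO over one window.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    The error budget is the number of failures the target still permits.
    """
    allowed_failures = (1 - slo_target) * total_requests
    sli = 1 - failed_requests / total_requests  # measured success ratio
    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        # 1.0 means the budget is untouched; 0 or less means it is spent.
        "budget_remaining_fraction": 1 - failed_requests / allowed_failures,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
report = error_budget(0.999, 1_000_000, 250)
print(report)
```

With these sample numbers the target permits about 1,000 failed requests, so 250 failures use roughly a quarter of the budget. A team would typically slow down risky changes as the remaining fraction approaches zero.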
