Site Reliability/Production Service Engineer; SRE - Cloud Operations

Position: Staff Site Reliability / Production Service Engineer (SRE) - Cloud Operations - Federal
Location: Orlando

Full-time
Employee Type: Regular
Region: AMS - North America and Canada
Work Persona: Flexible

Company Description

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today, and ServiceNow stands as a global market leader, bringing innovative AI‑enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud‑based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work.

But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.

Job Description

This is a Flexible position based in our Orlando, Florida office.

Our Flexible work persona requires a minimum of 2 days per week in the office.
In addition, this position requires shifts that cover weekend days.

Please Note: This position will include supporting our US Public Sector customers.

The ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards), includes a credit check, a criminal/misdemeanor check, and a drug test. Any employment is contingent upon passing the screening.
Due to Federal requirements, only US citizens, naturalized US citizens, or US permanent residents holding a green card will be considered.

The ServiceNow PSE (Production Service Engineering) team is a group of highly technical engineers who maintain and support the reliability, scalability, and performance of the automations and platform that manage the ServiceNow cloud infrastructure.

Our engineers are empowered to drive technical resolutions across the technology stack of the cloud infrastructure and instance automations. They are also tasked with improving the operability and reliability of the automations to reduce both the number of incidents and MTTR.

To accomplish this, our engineers combine solid analysis and troubleshooting skills with software development, networking, and systems engineering expertise, along with a strong desire to be challenged by problems of scale and complexity and to make services better for our customers.

What you get to do in this role:

Investigate, support, and provide sustainable resolutions to issues within our cloud infrastructure and application stack.
Use your experience in software development, systems engineering, and networking to proactively prevent repeatable issues.
Drive initiatives with partner teams to improve the reliability and performance of the cloud infrastructure through improved system design.
Drive a culture of intolerance of manual activity, resulting in a highly automated environment that delivers scalable solutions.
Mentor and coach other team members.

Important Note on the Role:

Availability for weekend shifts:
Must be able to work weekends, with corresponding days off during the week.
Willing to work 4 x 10 or 5 x 8 including weekends.
Required to be on‑call as needed.

Qualifications

To be successful in this role you have:

Experience in leveraging or critically thinking about how to integrate AI into work processes, decision‑making, or problem‑solving. This may include using AI‑powered tools, automating workflows, analyzing AI‑driven insights, or exploring AI's potential impact on the function or industry.
8+ years of experience with a Bachelor's degree or 6+ years of experience with a Master's degree in enterprise technical systems support, operations and development.
Expert knowledge of Linux systems.
Understanding of networking services and protocols – Routing, Load Balancing, DNS, SNMP, HTTPS, TCP/IP, etc.
Working knowledge of one or more of the following databases – Postgres, MySQL, MariaDB, Oracle.
Web application / API development and operations experience.
Experience in using Splunk for analysis and reporting.
Strong troubleshooting, analysis and problem‑solving skills.
Agile methodologies and software development lifecycle experience.
Familiarity with cloud technologies – AWS, Azure, GCP, or OpenStack.
Engage with customers…
