Senior DevOps/SRE Engineer, Ellicott City area, Maryland, USA (IT/Tech)

We are seeking a skilled mid‑to‑senior‑level DevOps Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of enterprise services hosted across Cloud Service Providers (CSPs) and on‑prem data centers. The SRE is responsible for the practical implementation of Site Reliability Engineering principles through best practices, operations, and monitoring. The SRE team carefully balances speed and stability and acts as a group of versatile problem solvers, filling gaps in knowledge and expertise to ensure efficient software operations.

If you are a proactive problem solver with a passion for continuous learning and innovation, join us as we work to make our DevOps practices more dynamic and effective.

Eligibility:

Must be a US citizen or authorized to work in the United States.
Must have lived in the USA for three (3) of the last five (5) years.
Must be able to obtain a US federal government badge and be eligible for Public Trust clearance.
Must be able to pass a VITG background check, including a drug test.

We’re looking for candidates who:

Demonstrate hands‑on expertise in SRE principles, maintaining quality and stability of enterprise services in a continuous development environment.
Have experience designing and developing solutions using various AWS services.
Can develop scripts in Shell/Bash and Python and deploy them as AWS Step Functions or Lambda functions.
Have experience with monitoring and observability tools such as Splunk, Datadog, and New Relic.
Are skilled at troubleshooting issues while leveraging monitoring tools, AWS services, etc.
Can analyze issues, identify root causes, and document root‑cause analyses.
Possess a strong technical background and can articulate technical concepts verbally and in writing.
Are eager to learn new technologies quickly and perform Proofs of Concept (POCs) based on project needs.
Use problem‑solving skills in monitoring system performance, troubleshooting, crisis management, etc.
Produce high‑quality work independently and collaboratively.
Excel in a fast‑paced environment.
Demonstrate effective communication and collaboration as a team player.

Job Responsibilities:

Design and develop monitoring solutions using approved AWS services and IaC tools.
Develop and maintain CI/CD pipelines using GitHub and Jenkins.
Develop serverless functions and scripts using Python, curl, and/or Bash.
Apply observability best practices to proactively detect potential software issues and implement preventive measures.
Set and monitor critical metrics (latency, traffic, errors, saturation) to gain insights into system reliability.
Learn and adopt new technologies to perform POCs based on project needs.
Provide guidance, training, and support for external development teams to manage their infrastructure independently.
Develop, publish, and maintain all required documentation in the repository and ticketing system (e.g., Confluence and Jira).
Respond quickly and effectively to critical incidents, and conduct post‑incident reviews to identify root causes and implement preventive measures.
Collaborate effectively with cross‑functional teams and communicate SRE concepts and recommendations clearly to both technical and non‑technical stakeholders.
Participate in reliability‑based release management processes.
Plan, participate in, and manage on‑call rotations to ensure prompt response to reported performance and reliability issues.
Attend ongoing and ad‑hoc meetings with internal and external stakeholders.
Stay up‑to‑date with the latest industry trends, technologies, and best practices related to SRE, DevOps, and infrastructure management.

Our Tech Stack (Must have):

CI/CD:
GitHub, Jenkins, Terraform, CloudFormation, containers, Docker.
Monitoring & Alerting:
Datadog, AWS CloudWatch (including canaries and X‑Ray), Splunk (Enterprise, ITSI, and On‑Call), New Relic.
OS:
Windows servers, Amazon Linux, Red Hat, Citrix VDI.

Certifications:

AWS Certified SysOps/DevOps Associate or equivalent AWS certification (Required).
Splunk Core certification (Strongly Preferred).

Job Type: Full Time (No 1099 or C2C)

Benefits:

401(k) with employer contribution.
Medical/Dental/Vision insurance (option for full coverage for employee).
Life, Short‑Term/Long‑Term disability insurance.
Company‑paid holidays and paid vacation (PTO).

Schedule:

8‑hour shift during core business hours.
May include minimal after‑hours support depending on on‑call schedule.

Work Type:

Currently hybrid remote in Ellicott City, MD.
Minimum 2 days in office weekly.

Seniority level: Mid‑Senior level

Employment type: Full‑time

Job function: Information Technology

Industries: IT Services and IT Consulting
