Site Reliability Engineer
Listed on 2025-12-21
IT/Tech
Systems Engineer, Cloud Computing, IT Support, Cybersecurity
This is a full-time position supporting the financial services/payments space. It is fully onsite 5 days per week, with some on-call support on a rotation basis. Please apply only if you have relevant experience and valid work authorization. Unfortunately, we cannot work via C2C or C2H; this is a W2 position.
About Blankfactor
At Blankfactor, we are dedicated to engineering impact. We build high-quality tech solutions for companies looking to innovate and grow, especially in fast-moving industries like payments, banking, capital markets, and life sciences.
About the Role
As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of mission-critical platforms by building scalable systems, robust automation, and data-driven operations. You will partner closely with development, cloud, infrastructure, and security teams to deliver resilient, high-performing services that support the way people live and work today.
Responsibilities
- Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
- Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and rapid incident response.
- Lead incident management efforts, perform root cause analysis, and implement action-oriented post-mortem improvements.
- Automate operational workflows using scripting, IaC, and configuration management tools.
- Analyze capacity, performance, and usage trends to forecast demand and optimize cloud costs.
- Collaborate with engineering teams to embed operability, resilience, and security into application and architecture designs.
- Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.
- Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
- Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.
- Experience in public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.
- Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, or ExtraHop.
- Implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.
- Infrastructure as Code experience using Terraform, CloudFormation, Ansible, or similar tools.
- Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
- Scripting and automation using Bash, PowerShell, or Python.
- Equivalent combination of education, experience, and/or military background.
- Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
- Experience with Premier applications, IBM iSeries, and/or Unisys systems.
- Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
- Proven experience in major incident command, stakeholder communication, and cross-team coordination.
- Experience with ITIL and ServiceNow (change, problem, and configuration management).
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Consulting and Information Technology
Industries: Software Development