Site Reliability Engineering Professional
Listed on 2026-03-12
IT/Tech
Cloud Computing, Systems Engineer, Systems Administrator, IT Support
Introduction
At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation.
With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.
As a Site Reliability Engineering Professional, you will utilize your broad and deep skills in operations to ensure the resiliency, reliability, security, and scalability of I/T systems. You will apply your expertise in a variety of coding technologies to drive system excellence. Your primary responsibilities will include:
- Implement Resiliency Strategies:
Design and implement strategies to improve the resiliency and reliability of I/T systems, ensuring minimal downtime and optimal performance.
- Ensure System Security:
Develop and apply security measures to protect I/T systems from potential threats and vulnerabilities, maintaining the integrity of sensitive data.
- Optimize System Scalability:
Analyze and optimize system architecture to ensure seamless scalability, meeting the evolving needs of the organization.
- Apply Coding Expertise:
Utilize coding skills to develop and maintain tools and scripts that support system reliability, resiliency, and scalability.
- Collaborate with Teams:
Work closely with cross-functional teams to identify and address system issues, sharing expertise and best practices to drive system excellence.
Master's Degree
Required Technical And Professional Expertise
- 5+ years of experience in a similar role, including AWS, cloud operations/administration, DevOps, and application support
- 3+ years of experience in Linux/OpenShift server system administration
- 3+ years of experience working with configuration management and infrastructure orchestration tools such as Ansible, CloudFormation, or Terraform
- Understanding of containerization technologies
- Experience with maintaining Kubernetes-based applications on cloud infrastructure
- General scripting and automation skills in at least one language or tool (Bash, Python, Go, Jenkins, Ansible)
- Experience with 24/7 support duties and incident response, with a security-focused mindset
- Familiarity with the usage of one or more Cloud Platforms (IBM Cloud, Amazon Web Services, Microsoft Azure)
- Strong debugging and problem-solving skills
- Passion for building and maintaining reliable and resilient systems
- Red Hat certification (OpenShift or system administrator)
- Experience with monitoring tools such as Instana
- Strong problem-solving and communication skills
- Strong understanding of networking and storage domains
- Excellent collaboration skills with diverse, multicultural, and multi-ethnic team members and operations teams
- Diligently maintain the latest security and compliance posture for supported infrastructure