Site Reliability Engineer
Job in Irvine, Orange County, California, 92606, USA
Listed on 2026-06-06
Listing for: TP-Link Corp
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing
Job Description & How to Apply Below
We're looking for a passionate and experienced Site Reliability Engineer to join our team and play a crucial role in ensuring our cloud platform's security, reliability, scalability, and operational excellence.
About Us:
Headquartered in the United States, TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, consistently ranked as the world's top provider of Wi-Fi devices. The company is committed to delivering innovative products that enhance people's lives through faster, more reliable connectivity. With a commitment to excellence, TP-Link serves customers in over 170 countries and continues to grow its global footprint.
We believe technology changes the world for the better! At TP-Link Systems Inc., we are committed to crafting dependable, high-performance products to connect users worldwide with the wonders of technology.
Embracing professionalism, innovation, excellence, and simplicity, we aim to assist our clients in achieving remarkable global performance and enable consumers to enjoy a seamless, effortless lifestyle.
Responsibilities:
* Assist in implementing and operating microservices on cloud-based Kubernetes platforms.
* Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform.
* Conduct load tests and chaos tests to ensure the scalability and reliability of microservices.
* Build observability for microservices and cloud platforms like AWS, OCI, Azure, and GCP.
* Contribute to writing and executing disaster recovery plans in collaboration with the Development and DevOps teams.
* Help analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, and JVM pre-warming.
* Write and maintain scripts for automation using languages like Python, Go, or Bash.
* Assist in defining and maintaining, together with development teams, the KPIs (SLA/SLO/SLI) for all cloud microservices, to better understand the business.
* Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures.
* Ensure adherence to security and compliance standards, including ISO 27001, SOC 2, and GDPR.
* Participate in incident response efforts to troubleshoot and resolve production issues quickly.
* Conduct post-incident analysis to identify root causes and potential workarounds/solutions.
* Contribute to product/technology selection, including implementation of POCs.
* Be adaptable to change and evolving processes and tools.
* Participate in mentoring and training less senior members of the team.
* Be part of the on-call rotation and provide support after work hours and on weekends.
* Other duties as assigned.