Site Reliability Engineer
Listed on 2025-12-01
IT/Tech
Cloud Computing, Systems Engineer, Systems Administrator, IT Support
Overview
Lakeview IT is passionate about delivering high-quality products and services to our customers. Our technology operations team is committed to ensuring reliable, scalable, and high-performing services for our clients. We’re looking for a talented and motivated Site Reliability Engineer to join our dynamic team and help us continue to build and maintain a world‑class infrastructure.
As a Site Reliability Engineer, you will be responsible for ensuring the stability, reliability, and scalability of our production systems. You will work closely with development, engineering, infrastructure, and operations teams to design and implement solutions that improve system performance, reduce downtime, and automate repetitive tasks. The SRE role is a hybrid of systems engineering and operations engineering, where you’ll enhance operational processes, monitoring systems, and tooling to provide a seamless experience for our customers.
The ideal candidate will have a strong background in system administration and network management, and a keen interest in implementing new technologies to improve overall operational efficiency. The salary range for the role is $120,000 to $130,000, plus an annual bonus. The position can be 100% remote; candidates located in the Agoura Hills, CA area are expected to work a hybrid schedule.
Responsibilities
- Proactively identify and resolve incidents before they impact operations.
- Monitor all systems and infrastructure for the highest level of availability.
- Perform routine maintenance tasks, including monitoring, patching, and backups.
- Respond to incidents and outages in a timely and effective manner.
- Collaborate with other teams to diagnose and resolve complex issues.
- Document incident details and implement corrective actions to prevent recurrence.
- Document processes, configurations, and troubleshooting procedures.
- Diagnose and resolve application performance problems or system outages.
- Play the role of Incident Manager during outages.
- Resolve complex hardware and software issues, and work with vendors when necessary.
- Optimize system performance and resource utilization on‑prem and in the cloud.
- Develop and maintain scripts to streamline repetitive tasks.
- Use scripting languages (e.g., PowerShell, Python) to automate system administration.
- Implement configuration management tools to ensure consistency and repeatability.
- Create and maintain comprehensive documentation of IT processes and procedures.
- Other duties as assigned by leadership.
Qualifications
- Strong understanding of IT infrastructure components, including servers, networks, and storage.
- Knowledge of scripting languages (e.g., PowerShell, Python).
- Knowledge of networking concepts and protocols (e.g., TCP/IP, DNS, DHCP).
- Experience with IT service management frameworks.
- Experience with cloud platforms such as AWS and Azure.
- Experience with virtualization technologies such as Azure VDI and AWS WorkSpaces.
- Experience with monitoring and alerting tools (e.g., New Relic, Datadog).
- Excellent problem‑solving and analytical skills.
- Strong communication and interpersonal skills.
- Extensive expertise in the Windows operating system.
Lakeview is an Equal Employment Opportunity employer. All aspects of consideration for employment and employment with the Company are governed on the basis of merit, competence and qualifications without regard to race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, or any other category protected by federal, state, or local law.