Principal Software Engineer, Storage Cache
Job in
San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-05-29
Listing for:
Roblox
Full Time position
Job specializations:
- Software Development
  Software Engineer, DevOps, Cloud Engineer - Software
Job Description & How to Apply Below
Requirements
- Experience & Education: A BS degree in Computer Science (or equivalent professional experience) with 8+ years of hands-on software engineering experience
- Distributed Systems Expertise: Deep domain knowledge in building and operating large-scale distributed systems
- Infrastructure Chops: A strong builder mindset with proven experience running Active/Active distributed systems on container orchestrators like Kubernetes or Nomad
- Programming Proficiency: Strong, hands-on programming experience in Go and C++
- Problem-Solving Track Record: Proven success in resolving massive-scale bottlenecks, such as overcoming the limitations of decentralized gossip protocols or mitigating partial failures in distributed systems
- Observability Skills: Hands-on experience with modern telemetry and observability stacks (e.g., Prometheus, Grafana, Alertmanager, Kibana)
- [Bonus] Open Source Contributions: A track record of contributing to or maintaining major open-source caching projects such as Redis, Valkey, or Memcached
- [Bonus] Advanced Cache Internals: Experience extending cache functionality (e.g., writing custom Redis modules in C/Rust, complex Lua scripting) or deep-tuning underlying memory allocators like jemalloc
- [Bonus] Caching Proxies & Topologies: Experience with caching proxies (e.g., Twemproxy, Envoy Redis filter) and designing complex, multi-tiered caching architectures
- As a Principal Engineer on the Cache team (part of the Infra Storage org), you will innovate and operate large-scale, in-house distributed systems to solve Roblox's ever-growing caching challenges
- You will report directly to the Engineering Manager for the Cache team
- Lead the architectural transition to a next-generation, multi-tenant caching service built on Valkey, ensuring strict data, resource, and failure isolation for all tenants
- Drive systemic optimizations to mitigate head-of-line blocking, manage hot keys, and maximize CPU and memory utilization across physical machine clusters
- Design and build robust frameworks to automate development, chaos testing (fault/latency injection), and monitoring for 24x7 mission-critical services, targeting 99.99%+ availability and elastic scalability
- Champion engineering best practices by leading design reviews, performance benchmarking, failure drills, and blameless post-incident retrospectives
- Mentor and empower engineers, fostering a culture of deep domain expertise and seamless knowledge sharing across the Storage, Platform, and Product teams