- Company Name: Avenue Code
- Job Title: Lead Site Reliability Engineer
- Job Description:
**Job Title:** Lead Site Reliability Engineer
**Role Summary**
Lead and mentor a growing SRE team, partnering with product engineering to design, build, and operate cloud-native infrastructure. Drive reliability, performance, security, and cost optimization across AWS environments using IaC, CI/CD, observability, and DevOps best practices.
**Expectations**
- Serve as a technical authority and hands‑on leader for production systems.
- Guide the team in adopting engineering excellence, foster knowledge sharing, and remove technical roadblocks.
- Own end‑to‑end reliability metrics, SLIs/SLOs, error budgets, and continuous improvement.
**Key Responsibilities**
- Provide technical leadership and mentorship through code reviews, design discussions, and help unblocking the team.
- Author and maintain runbooks, standards, and best‑practice guides.
- Automate infrastructure provisioning and deployments with Terraform; integrate CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins).
- Define and manage SLIs/SLOs, error budgets, and dashboards for system health.
- Enforce security and compliance: least‑privilege IAM, vulnerability scans, audit logging.
- Implement observability: metrics, logs, distributed tracing; create alerting and custom dashboards.
- Optimize cloud cost: tagging, right‑sizing, data‑driven resource decisions.
**Required Skills**
- Proven experience operating production‑critical systems with deep SRE and DevOps knowledge.
- Strong leadership and mentoring, with a track record of leading technical projects or teams.
- Deep proficiency in AWS Cloud and cloud‑native best practices.
- Kubernetes orchestration (EKS, GKE) at scale; CI/CD pipeline integration (GitHub Actions, ArgoCD, Jenkins).
- Terraform expertise, including state management, Terragrunt, and project structure.
- Database debugging: Redis, Postgres.
- Networking: VPC, VPN, load balancing, and cloud networking components.
- Git workflow competency (branching strategies, pull requests).
- Solid understanding of web/network protocols (HTTP, REST, TLS, DNS).
**Required Education & Certifications**
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience).
- Relevant certifications (e.g., AWS Solutions Architect, Kubernetes Administrator) are a plus.
Mountain View, United States
Hybrid
Senior
30-09-2025