Skills

Teamwork, Ruby, Go, MongoDB, DevOps, Docker, Kubernetes, Monitoring, Networking, Linux, Programming, Autonomy, Organization, Shell, Redis, Kafka, Terraform, Infrastructure as Code, Postgres

Job Specifications

At Braze, we have found our people. We’re a genuinely approachable, exceptionally kind, and intensely passionate crew.

We seek to ignite that passion by setting high standards, championing teamwork, and creating work-life harmony as we collectively navigate rapid growth on a global scale while striving for greater equity and opportunity – inside and outside our organization.

To flourish here, you must be prepared to set a high bar for yourself and those around you. There is always a way to contribute: Acting with autonomy, having accountability and being open to new perspectives are essential to our continued success.

Our deep curiosity to learn and our eagerness to share diverse passions with others give us balance and inject a one-of-a-kind vibrancy into our culture.

If you are driven to solve exhilarating challenges and have a bias toward action in the face of change, you will be empowered to make a real impact here, with a sharp and passionate team at your back. If Braze sounds like a place where you can thrive, we can’t wait to meet you.

What You'll Do

Site Reliability Engineers (SREs) are responsible for keeping all internal-facing services and platforms running smoothly. In a nutshell, SREs ensure site uptime. SREs are a blend of sensible system administrators and software engineers who apply sound engineering principles, operational discipline, and mature automation to the environments and infrastructure services we provide. We specialize in systems, whether that means networking, the Linux kernel, or a more specific interest in scaling, algorithms, or distributed systems.

Our team helps improve automation and infrastructure reliability, and empowers Braze’s other engineering teams to easily leverage the infrastructure products and platforms we create. Braze operates at a massive scale, with over 3.3 billion monthly active users across our customers, collecting hundreds of billions of data points each month and sending billions of messages to end users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. As a Senior Site Reliability Engineer at Braze, you will collaborate with your team and consumer engineering teams to continuously improve the infrastructure, automation, and tooling that build internal products from these technologies.

Main Responsibilities

Partner with Braze’s engineering teams on:
Architecting products to effectively utilize infrastructure platforms in a scalable, reliable manner
Debugging reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms
Making monitoring and alerting trigger on symptoms, not on outages
Ensuring that Braze meets our strict enterprise-grade SLAs with customers
Develop Braze’s internal platform infrastructure:
Create infrastructure as code using Chef, Terraform, and Kubernetes
Develop deployment pipelines for applications in multiple languages using Docker, Kubernetes, etc.
Provide centralized/common tooling, services, and automation frameworks that are critical for scaling operations, capacity management, reducing operational pain, and improving the day-to-day workflow of Braze’s engineering teams
Manage incidents:
Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers
Use your on-call shift to prevent incidents from ever happening
Run retrospectives on everything that happens to turn lessons into system improvements/changes, automation, etc.

Who You Are

5+ years of experience as a Software, DevOps, or Site Reliability Engineer
You think about systems - interfaces, boundaries, edge cases, failure modes, behaviors, specific implementations
Have an urge to collaborate, document, and deliver quickly
Collaborate across global remote teams, often working asynchronously
Document everything so you don't need to learn the same thing (or plan the same work) twice
Deliver fast to delight our customers, even internal ones
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
Have a desire to solve everyday challenges facing software engineers and automate their toil away
Have an excellent ability to manage multiple tasks and expectations at once
Know your way around Linux and the Unix shell
Have strong programming skills - Ruby and/or Go preferred
Have experience with Docker, Kubernetes, Terraform, or similar IaC technologies
Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies

For candidates based in the United States, the pay range for this position at the start of employment is expected to be between $128,842 and $232,200/year with an expected On Target Earnings (OTE) between $144,000 and $258,000/year (including bonus or commission). Your exact offer may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. In addition to cash compensation, Braze offers ful

About the Company

Braze ($BRZE) is a leading customer engagement platform that powers lasting connections between consumers and brands they love. Braze allows any marketer to collect and take action on any amount of data from any source, so they can creatively engage with customers in real time, across channels from one platform. From cross-channel messaging and journey orchestration to AI-powered experimentation and optimization, Braze enables companies to build and maintain absolutely engaging relationships with their customers that foster ...
