Agile Fuel | World-class Dedicated Engineering Teams

Site Reliability Engineer / Platform Engineer

On-site

Mountain View, United States

Mid-level

Full Time

27-11-2025


Skills

Python, Jira, Data Engineering, GitHub, CI/CD, DevOps, Docker, Monitoring, Decision-making, Networking, Architecture, Cloud Architecture, Databases, Organization, Azure, Django, Analytics, Redis, CI/CD Pipelines, Terraform

Job Specifications

Our client is a fast-growing AI-driven technology company focused on building intelligent, automated solutions that transform how modern engineering teams work. They are committed to creating a development culture where speed, reliability, and data-driven decision-making are at the core. Their product leverages advanced analytics and AI to help organizations improve productivity, enhance visibility, and deliver software more efficiently.

They are seeking a hybrid Site Reliability Engineer / Platform Engineer with strong DevOps expertise and solid Python engineering skills. This person will design, build, and operate the next generation of their cloud infrastructure and internal developer platforms. The ideal candidate is passionate about automation, observability, reliability, and scalable system design. You will drive improvements across cloud architecture, CI/CD workflows, development tooling, and operational excellence, enabling the engineering organization to ship faster and more reliably.

If you thrive in a fast-moving, AI-native environment and enjoy building intelligent, highly automated platforms, this role is an excellent fit.

Responsibilities

Design, build, and maintain highly reliable, scalable Azure infrastructure using Container Apps, ACR, managed databases, serverless components, and other PaaS services;
Own and enhance CI/CD pipelines, deployment workflows, platform automation, and the full observability stack;
Develop Python-based tooling and infrastructure to support a scalable, reliable AI-driven platform;
Architect and maintain secure, fault-tolerant integrations with external systems (GitHub, Jira, Azure, Redis, Sentry, etc.);
Build and operate monitoring, logging, alerting, and SLO/SLA frameworks to ensure reliability and performance;
Partner with backend and data engineering teams to design a scalable infrastructure foundation for high-growth AI products;
Continuously optimize cost efficiency, reliability, and deployment velocity;
Scale AI infrastructure and support the transition to an AI-native engineering organization;
Drive an AI-native culture by leveraging LLM-powered workflows and automation for speed and efficiency.

Requirements

5+ years in DevOps, SRE, Platform Engineering, or similar roles;
Expert-level understanding of cloud infrastructure, ideally Azure, including container services, serverless patterns, networking, and identity;
Strong Python software engineering ability — building platform tools, automation frameworks, or backend services;
Hands-on experience with containerization, Docker, and cloud-native operational patterns;
Strong understanding of external system integrations: how to design around them and how to build reliable abstractions that hold up when those systems fail;
Experience designing and operating production-grade pipelines, monitoring, alerting, and observability tools;
Practical understanding of resilience engineering: retries, backoff, idempotency, state management, and failure modes;
A bias toward automation: if something can be automated, you automate it;
A startup mindset: ownership, speed, pragmatic decision-making, and willingness to wear multiple hats;
Interest in and excitement about AI-native development workflows using tools like ChatGPT, GitHub Copilot, and automated pipeline orchestration;
Upper-Intermediate English level.

Bonus points for

Experience with Bicep, Terraform, or other IaC tools;
Background supporting Python/Django or data pipelines;
Familiarity with Celery, distributed queues, or event-driven systems;
Experience working in SOC2-compliant or enterprise-grade environments;
Experience building internal developer platforms (IDPs) or self-service infrastructure.

We offer excellent benefits, including but not limited to

People-oriented management without bureaucracy;
Flexible schedule (≈ 3 hours overlap with ET);
15 working days of annual paid vacation;
Paid sick leave;
Friendly and engaging professional team;
Opportunities for self-realization, career, and professional growth.

About the Company

Great ideas require great execution, and great execution requires a strong team of brilliant people working together to reach a common goal. As we started helping friends (and friends of friends) in the software industry build up their own teams based on these learnings, the results were phenomenal, and in 2015 we launched Agile Fuel as a highly selective, high-touch, boutique service to bring this approach to others as well. We know this very well because we have walked the walk. Since the early 2000s, our two foun...