- **Company name:** ExpertsHub.ai
- **Job title:** Senior Machine Learning Engineer
- **Job description:**
**Job title**
Senior Machine Learning Engineer – AI Operations Platform Consultant
**Role summary**
Lead production‑level support and optimization of large‑scale LLM inference pipelines. Deploy, operate, and troubleshoot containerized LLM services on Kubernetes/OpenShift using Triton Inference Server and TensorRT‑LLM. Ensure high availability, performance, and continuous observability for mission‑critical AI workloads.
**Expectations**
- Deliver immediate, hands‑on troubleshooting for LLM model failures in production.
- Optimize inference performance: mixed‑precision, quantization, pruning, sharding, batching.
- Maintain and evolve MLOps/LLMOps pipelines with robust incident, change, and event management.
- Provide scalable GPU‑accelerated infrastructure and real‑time telemetry dashboards.
- Lead model versioning, engine builds, automated rollouts, and secure runtime controls.
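To give a feel for the optimization levers listed above, here is a back‑of‑the‑envelope sketch of how weight quantization shrinks an LLM's memory footprint. The parameter count and bit widths are illustrative examples, not specifics of this role:

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB at a given precision.

    Counts weights only; ignores KV cache, activations, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 2**30

# Example: a hypothetical 70B-parameter model at common precisions.
for bits in (16, 8, 4):  # FP16/BF16, INT8, INT4
    print(f"{bits:>2}-bit: {weight_memory_gib(70e9, bits):.1f} GiB")
```

Halving the bit width halves the weight memory, which is why INT8/INT4 quantization is often the first step toward fitting a large model onto fewer GPUs.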
**Key responsibilities**
1. Deploy and manage containerized LLM inference services on Kubernetes/OpenShift.
2. Operate Triton Inference Server and TensorRT‑LLM; configure engines, batch policies, and GPU‑aware scheduling.
3. Engineer and maintain load balancing, scaling, and multi‑node cluster orchestration for LLM workloads.
4. Build CI/CD pipelines for model versioning, build, testing, and automated roll‑out/rollback.
5. Implement incident, change, and event management processes for mission‑critical systems.
6. Develop and maintain observability: GPU health, latency, throughput, and availability dashboards.
7. Apply production‑grade model optimization techniques (pruning, quantization, distillation).
8. Collaborate with data scientists and platform teams to integrate new LLMs into the production stack.
9. Ensure secure and compliant runtime controls, including access, encryption, and audit logging.
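For responsibilities 1–2, a minimal sketch of what a GPU‑scheduled Triton deployment manifest can look like on Kubernetes/OpenShift. The names, image tag, and model‑repository path are assumptions for illustration, not project specifics:

```yaml
# Illustrative sketch only — names, image tag, and paths are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-llm
spec:
  replicas: 2
  selector:
    matchLabels: {app: triton-llm}
  template:
    metadata:
      labels: {app: triton-llm}
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.05-py3  # assumed tag
          args: ["tritonserver", "--model-repository=/models"]
          resources:
            limits:
              nvidia.com/gpu: 1  # GPU-aware scheduling via the NVIDIA device plugin
```

Requesting `nvidia.com/gpu` in the resource limits is what lets the scheduler place each replica on a node with a free GPU.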
**Required skills**
- Kubernetes / OpenShift expertise for large‑scale container orchestration.
- Proven experience with Triton Inference Server and TensorRT‑LLM.
- GPU‑accelerated AI platform operations and performance tuning.
- LLMOps pipeline design: model versioning, engine builds, roll‑outs, and rollback strategies.
- Model optimization: mixed precision, quantization, pruning, knowledge distillation, sharding, batching.
- Load balancing, auto‑scaling, and GPU‑aware task scheduling.
- Monitoring and observability tooling (Prometheus, Grafana, custom dashboards).
- Incident, change, and event management for mission‑critical services.
- Strong scripting (Python/Bash) and API integration skills.
- Familiarity with microservices architecture and REST/GraphQL APIs.
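As a small example of the scripting-plus-observability combination above, a sketch that parses Prometheus text‑format samples such as those an inference server exposes on its metrics endpoint. The sample metric and label names below are illustrative:

```python
import re

# One Prometheus text-format sample: name{labels} value
METRIC_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')

def parse_prometheus_line(line: str):
    """Parse one Prometheus sample into (name, labels-dict, value), or None."""
    m = METRIC_RE.match(line.strip())
    if not m:
        return None
    name, label_str, value = m.groups()
    labels = dict(
        (k, v.strip('"'))
        for k, v in (pair.split("=", 1) for pair in label_str.split(",") if pair)
    )
    return name, labels, float(value)

sample = 'nv_inference_request_success{model="llama",version="1"} 42'
print(parse_prometheus_line(sample))
```

In practice a script like this would poll the metrics endpoint over HTTP and feed the parsed values into alerting or a custom dashboard.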
**Required education & certifications**
- Bachelor’s degree in Computer Science, Electrical Engineering, or related field (MSc preferred).
- Certifications: Certified Kubernetes Administrator (CKA) or equivalent; NVIDIA Deep Learning Institute certifications (TensorRT, Triton); or cloud provider certifications relevant to GPU workloads (AWS/GCP/Azure).