- Company Name
- Quantexa
- Job Title
- Senior MLOps Engineer
- Job Description
-
Role Summary: Design, deploy, monitor, and maintain end‑to‑end machine learning pipelines in production. Collaborate with data scientists to transition models from research to live services, ensuring reliability, scalability, and compliance. Lead model governance, versioning, and lifecycle management, and mentor teammates on MLOps best practices.
Expectations:
- Own the full MLOps lifecycle: development, deployment, monitoring, and maintenance of ML models.
- Deliver high‑availability, production‑grade solutions that scale with data and traffic.
- Establish model governance, including version control, drift detection, and reporting.
- Mentor junior engineers and data scientists, sharing best practices and tooling.
Key Responsibilities:
- Integrate ML models into production systems using Docker, Helm, and Kubernetes.
- Automate CI/CD pipelines with Jenkins or an equivalent tool, including orchestration of workflow execution.
- Implement observability for model performance, latency, and resource usage; respond to alerts and anomalies.
- Build and maintain a model registry and versioning using MLflow, Kubeflow, or DVC.
- Deploy and optimize models with ONNX/ONNX Runtime; manage model serving infrastructure.
- Collaborate closely with data engineering teams on data pipelines (Spark, Python) that feed ML workloads.
- Conduct regular model health checks, drift analysis, and impact assessments.
- Develop and enforce MLOps standards, policies, and documentation.
- Provide technical guidance and training to cross‑functional teams.
Required Skills:
- Strong programming in Python; experience with scikit‑learn, PyTorch, TensorFlow, or similar libraries.
- Proficiency in Scala is optional but advantageous.
- Expertise in big‑data processing (Apache Spark).
- Containerization & orchestration: Docker, Helm, Kubernetes.
- CI/CD tooling: Jenkins, GitLab CI, GitHub Actions or equivalent.
- MLOps platforms: MLflow, Kubeflow, DVC.
- Model deployment: ONNX, ONNX Runtime.
- Model governance: versioning, monitoring, drift detection.
- Cloud familiarity (AWS, Azure, GCP) and infrastructure‑as‑code preferred.
- Strong analytical and problem‑solving abilities; excellent communication.
Required Education & Certifications:
- Bachelor’s (or higher) degree in Computer Science, Data Engineering, Machine Learning, or a related field.
- Certifications in Kubernetes (CKA/CKAD), AWS/Azure/GCP ML, or MLflow preferred.