Job Specifications
Role: Senior Solution Architect – HPC, Cloud-Native Systems
Location: Remote
Position Overview
We are seeking a highly skilled Senior Solution Architect to lead the convergence of traditional High-Performance Computing (HPC) environments with modern cloud-native architectures. This position is designated as ITAR-restricted, so candidates must be legally authorized to access and handle U.S. export-controlled technical data.
The architect will design, integrate, and optimize large-scale, containerized, hybrid HPC environments using technologies such as Docker, Mirantis, ELK Stack, and advanced batch schedulers. This role requires deep technical leadership, architectural vision, and hands-on experience supporting mission-critical computational workloads in secure, compliant environments.
Core Responsibilities
1. Architecture & Design
Architect end-to-end hybrid cloud solutions integrating Mirantis Container Cloud with dedicated HPC clusters.
Balance performance, elasticity, and compliance requirements across on-prem and cloud environments.
Produce architecture documentation that adheres to ITAR export-control standards and review practices.
2. HPC Orchestration
Design and implement HPC job scheduling strategies using Slurm, Volcano, LAVA, or similar technologies.
Support deterministic resource allocation for AI/ML analytics, physics simulations, and scientific workloads.
Ensure schedulers meet the isolation and audit requirements for ITAR-restricted workloads.
3. Optimization & Performance Tuning
Apply best practices for high-performance containerization: multi-stage builds, minimal base images, and resource tuning (CPU, GPU, memory).
Implement strategies to minimize overhead, ensure stability, and eliminate noisy-neighbor issues.
4. Centralized Observability
Architect and operate an enterprise-grade ELK Stack (Elasticsearch, Logstash, Kibana) tuned for HPC-scale environments.
Manage Index Lifecycle Management (ILM) policies for massive log throughput while preserving traceability for compliance audits.
5. Full-Stack Automation
Build IaC-driven automation pipelines using Terraform, Ansible, and GitOps workflows.
Automate deployment of Mirantis Kubernetes Engine (MKE) and integrated HPC schedulers within an ITAR-secured environment.
6. CI/CD Automation
Implement robust CI/CD workflows using Jenkins, GitLab CI, Argo Workflows, or similar tools.
Ensure pipelines comply with ITAR policies, including artifact access control, secure registries, and encrypted transport.
7. Hybrid Integration
Architect integration between Kubernetes and traditional HPC schedulers.
Enable advanced workloads that require GPU-accelerated clusters or high-speed interconnects such as InfiniBand and RDMA.
Required Technical Skills
Containers & Mirantis
Expertise in Docker Runtime, Mirantis Kubernetes Engine (MKE), and Lens Desktop management.
Deep experience designing containerized workloads for HPC environments.
HPC Schedulers
Hands-on experience with Slurm, PBS, or Kubernetes-native batch schedulers such as Volcano.
Knowledge of hierarchical priority queues, gang scheduling, and resource fairness algorithms.
ELK Stack Mastery
Strong understanding of Logstash pipeline performance optimization, Elasticsearch sharding strategies, and Kibana visualization design.
Performance Tools
Experience with NVIDIA Enroot/Pyxis or equivalent technologies supporting near bare-metal container performance.
Security & Compliance
Experience implementing secure registry solutions, TLS encryption, RBAC, and identity-driven access controls.
Demonstrated experience supporting compliance frameworks including ITAR, NIST 800-53, or similar.
Experience & Qualifications
Professional Background
10+ years in systems architecture or engineering roles.
5+ years in HPC, Cloud Infrastructure, or enterprise-scale DevOps environments.
HPC Knowledge
Understanding of MPI (Message Passing Interface), GPU compute workloads, low-latency networks, and distributed parallel frameworks.
Certifications
Preferred certifications include:
Certified Kubernetes Administrator (CKA)
Mirantis Kubernetes certifications
Relevant security/compliance certifications (a plus)
Cloud Platforms
Experience with AWS HPC environments (EKS, AWS Batch, FSx for Lustre, EC2 GPU-accelerated instances).
About the Company
Founded in 2002, Techgene is an ISO 9001:2008 certified company that provides innovative mobility solutions for the enterprise and consumer sectors. Techgene is headquartered in Irving, Texas, USA, with a state-of-the-art development center in Hyderabad, India. With high-quality R&D and IT expertise across all major web and mobile platforms, Techgene has more than 100 person-years of combined experience delivering solutions that have earned strong client satisfaction.