Encora

HPC & Cloud Infrastructure Engineer

Singapore Full Time

 

Important Information

Location: Singapore

Contract: 12 months

Job Summary

We’re hiring an HPC & Cloud Infrastructure Engineer to design, deploy, and optimize high-performance computing environments across on-premises and cloud infrastructure. You’ll manage HPC clusters, interconnects, and job schedulers, and enable AI/ML workloads at scale while driving automation and cost efficiency.

Job Description

Architect, deploy, and manage HPC clusters with job schedulers, parallel file systems, and cluster management tools

Design, configure, and troubleshoot high-throughput, low-latency InfiniBand interconnects for HPC/distributed workloads

Own PBS Professional scheduling: deployment, queue optimization, custom job submission scripts, workload management

Administer RHEL-based systems: performance tuning, package management, security hardening, patching via Red Hat Satellite and Ansible

Build and maintain cloud HPC environments on AWS, Azure, and GCP – provisioning, hybrid setups, migrations, and cost optimization

Implement Infrastructure as Code using Terraform/Ansible and integrate with CI/CD pipelines for reproducible infrastructure

Enable GPU & AI/ML workloads: containers, TensorFlow, PyTorch, scikit-learn, Keras, MXNet; support MLOps pipelines for training and deployment

Optimize parallel applications using MPI and OpenMP; debug and scale distributed/shared memory workloads

Drive monitoring, logging, and alerting for cluster health, job efficiency, and resource utilization

Required Skills and Experience

High-Performance Computing

Hands-on experience managing HPC clusters with job schedulers, cluster management tools, parallel programming libraries, and parallel file systems.

Knowledge of resource scheduling and job optimization for efficient workload management.

 

InfiniBand (Networking)

Hands-on experience with high-throughput, low-latency interconnect technologies such as InfiniBand.

Ability to design, configure, and troubleshoot interconnects in HPC or distributed environments.

 

Operating Systems and Environments 

Administration and configuration of RHEL-based systems.

Performance tuning, package management, and security hardening.

Knowledge of Red Hat Satellite and Ansible for automation.

 

Job Scheduling with PBS Professional 

Experience deploying and managing PBS Professional for scheduling and workload management in HPC environments.

Experience customizing job submission scripts and optimizing job queues.

 

Parallel Programming Libraries

MPI (Message Passing Interface) and OpenMP (Open Multi-Processing):

Proficiency in writing, debugging, and optimizing parallelized code.

Experience with scaling applications across HPC systems.

Understanding of distributed memory (MPI) and shared memory (OpenMP) paradigms.

 

Cloud Platforms

AWS, Azure, Google Cloud:

Expertise in provisioning, configuring, and managing services on all three platforms.

Knowledge of cross-platform migration and hybrid cloud solutions.

Proficiency in managing high-performance computing (HPC) clusters in the cloud.

Deep understanding of cost optimization, security, and cloud-native development tools (e.g., Kubernetes, Terraform).

 

Infrastructure as Code (IaC) 

Ability to design, deploy, and maintain infrastructure using automation and configuration management tools.

CI/CD pipeline integration for IaC workflows.

 

GPU & AI Libraries and Tools 

Hands-on experience with container technologies.

Hands-on experience with TensorFlow, PyTorch, scikit-learn, Keras, or MXNet.

Familiarity with AI/ML pipelines, model training, and optimization.

Knowledge of MLOps tools for deploying and monitoring models.

 

About Encora

Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, AI & LLM Engineering, among others.

At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.