Squarepoint Services US LLC seeks a Systems Specialist for its New York, New York location.
Duties: Manage the companies’ HPC Slurm clusters, including scheduling and configuration. Deploy Prometheus and Grafana metrics for real-time monitoring. Manage Python/Pandas and kdb/q data analysis to report to stakeholders and management on code efficiency and hardware use. Deploy bespoke plugins in C to improve features for users. Resolve support issues in user code and optimize workloads for hardware. Optimize Nvidia GPU configuration for multi-node machine learning models.
Requirements: Must have a minimum of a Master’s degree or foreign equivalent in any STEM (Science, Technology, Engineering, or Math) field of study and 2 years of experience as an HPC Engineer, Systems Engineer/Specialist, Site Reliability Engineer, or related position for an investment/asset management organization. Will also accept a Bachelor’s degree or foreign equivalent in the above disciplines and 5 years of progressive, post-baccalaureate experience as stated above. Must have at least two (2) years of employment experience with each of the following required skills: Deploying Slurm clusters and optimizing scheduling of high throughput workloads; Debugging system problems and optimizing kernels in Linux; Data analysis in Python – Pandas; Working with Nvidia GPUs, optimizing workloads for hardware and networking stacks; Using kdb/q for data analysis and debugging user workflows; Live monitoring of clusters using Prometheus and Grafana.
Salary / Rate Minimum/yr: $200,000
Salary / Rate Maximum/yr: $250,000
40 hours/week. The minimum and maximum salary/rate information above includes only base salary or base hourly rate. It does not include any other type of compensation or benefits that may be available. Squarepoint is an EEO/AA employer.