20 Oct 2022

Lead High Performance Computing Architect

Icahn School of Medicine at Mount Sinai

Ground breaking science. Advancing medicine. Healing made personal.

Job Description

Roles & Responsibilities: 

The Scientific Computing and Data group at the Icahn School of Medicine at Mount Sinai partners with scientists to accelerate scientific discovery. To achieve this aim, we support a cutting-edge high-performance computing and data ecosystem along with MD/PhD-level support for researchers. The group is composed of a high-performance computing team, a research clinical data warehouse team and a research data services team.

The Lead HPC Architect, High Performance Computational and Data Ecosystem, is responsible for architecting, designing, and leading the technical operations for Scientific Computing’s computational and data science ecosystem. This ecosystem includes high-performance computing (HPC) systems, clinical research databases, and a software development infrastructure for local and national projects. To meet Sinai’s scientific and clinical goals, the Lead brings a strategic, tactical and customer-focused vision for making Sinai’s computational and data-rich environment continually more resilient, scalable and productive for basic and translational biomedical research. Developing and executing this vision requires a deep technical understanding of best practices for computational, data and software development systems, along with a strong focus on customer service for researchers. The Lead is an expert troubleshooter, a productive team member, and a productive partner for researchers and technologists throughout the organization and beyond. This position reports to the Director for Computational & Data Ecosystem in Scientific Computing. Specific responsibilities are listed below.

  • Leads the technical operations, including the architecture, design, expansion, monitoring, support, and maintenance of Scientific Computing’s computational and data science ecosystem, consistent with best practices. Key components include a 50,000+ core and 30+ petabyte usable high-performance computing cluster, clinical data warehouse and software development environment.
  • Leads the troubleshooting, isolation and resolution of all technical issues.
  • Leads the design, development, implementation and management of all system administration tasks, including hardware and software configuration, configuration management, system monitoring (including the development and maintenance of regression tests), usage reporting, system performance (file systems, scheduler, interconnect, high availability, etc.), security, networking and metrics.
  • Ensures that the design and operation of the HPC ecosystem is productive for research.
  • Collaborates effectively with research and hospital system IT, compliance, HIPAA, security and other departments to ensure compliance with all regulations and Sinai policies.
  • Partners with other peers regionally, nationally and internationally to discover, propose and deploy a world-class research infrastructure for Mount Sinai.
  • Prepares and manages budgets for hardware, software and maintenance. Participates in chargeback/fee recovery analysis and provides suggestions to make operations sustainable.
  • Leads the integration of HPC resources with laboratory equipment such as genomic sequencers.
  • Researches, deploys, optimizes and actively monitors resource management and scheduling software and policies.
  • Designs, tunes, manages and upgrades parallel file systems, storage and data-oriented resources.
  • Researches, deploys and manages security infrastructure, including development of policies and procedures.
  • Leads and assists the team in resolving user support requests from researchers.
  • Assists in developing and writing system design for research proposals.
  • Leads the development of a framework for effective system documentation.
  • Works effectively and productively with other team members within the group and across Mount Sinai.
  • Provides after-hours support in case of a critical system issue.

Requirements:

  • Bachelor’s degree in computer science, engineering or another scientific field. Master’s or PhD preferred.
  • 8 years of progressive HPC system administration and operations experience (preferably in a Red Hat/CentOS Linux, batch HPC cluster environment)
  • Must be an expert troubleshooter, a team player and customer focused
  • Strong experience with configuration management systems such as xCAT, Puppet and/or Ansible
  • Strong experience with networking and security
  • Strong experience with InfiniBand and Gigabit Ethernet
  • Experience with the LSF scheduler and with GPFS (Spectrum Scale) parallel file systems and storage
  • Experience with providing technical operations leadership
  • Ability to manage a variety of disparate tasks and priorities independently and troubleshoot complex technology problems.
  • Attention to detail; time and project management skills.
  • Excellent communication skills, analytical ability, strong judgment and management skills, and the ability to work effectively as a liaison between both research and technology teams.
  • Strong written, oral, and interpersonal communication skills
  • Scripting and programming experience

Preferred Experience

  • Experience with archival storage and tape libraries (TSM) is highly preferred.
  • Experience with databases and web services is highly preferred.
  • Experience with compliance requirements such as HIPAA, GDPR and FISMA
  • Experience with managing web access to HPC resources (such as Open OnDemand)
  • Experience in a research environment is highly preferred.
  • Experience with financial budgets and providing cost benefit analysis is preferred.
  • Experience with cloud technology

Strength Through Diversity

The Mount Sinai Health System believes that diversity is a driver for excellence. We share a common devotion to delivering exceptional patient care. Yet we are as diverse as the city we call home: culturally, ethnically, in outlook and lifestyle. When you join us, you become a part of Mount Sinai’s unrivaled record of achievement, education and advancement as we revolutionize medicine together.

We work hard to acquire and retain the best people, and to create a welcoming, nurturing work environment where you can develop professionally. We share the belief that all employees, regardless of job title or expertise, can make an impact on quality patient care.

Explore more about this opportunity and how you can help us write a new chapter in our story!

Who We Are

Over 38,000 employees strong, the mission of the Mount Sinai Health System is to provide compassionate patient care with seamless coordination and to advance medicine through unrivaled education, research, and outreach in the many diverse communities we serve.

Formed in September 2013, the Mount Sinai Health System combines the excellence of the Icahn School of Medicine at Mount Sinai with seven premier hospital campuses: Mount Sinai Beth Israel, Mount Sinai Beth Israel Brooklyn, The Mount Sinai Hospital, Mount Sinai Queens, Mount Sinai West (formerly Mount Sinai Roosevelt), Mount Sinai Morningside, and New York Eye and Ear Infirmary of Mount Sinai.

EOE Minorities/Women/Disabled/Veterans

To apply for this job please visit careers.mountsinai.org.