High Performance Computing — HPC

Shared high performance computing resources for Columbia researchers.

Also known as HPC and Shared HPC.

CUIT’s High Performance Computing service provides a cluster of computing resources that powers computational research across numerous research groups and departments at the University, as well as additional projects and initiatives as demand and resources allow. The Shared Research Computing Policy Advisory Committee (SRCPAC) oversees the operation of existing HPC clusters through faculty-led subcommittees.

    Please note that HPC service is available 24x7. Downtimes for maintenance may be scheduled every 3 months. The duration of these planned outages varies but is typically less than a day and is announced to users in advance.

    April 2024 HPC Trainings

    Intro to Intel GPU, oneAPI tools

    Intel's Developer Tools Team will present on the Intel oneAPI analysis tools, including Intel Advisor, which can help optimize your CPU code for GPU use or adapt Nvidia code for Intel's new Max GPUs.

    We are looking for GPU and CPU users from our Ginsburg and Insomnia clusters who are interested in trying out a new “seed” Intel Max 1100 GPU unit that has been added to the Insomnia cluster. Dell and Intel ask that you participate in evaluating this system. Note: The number of accounts for this program will be limited.

    Using HPC to Enhance Your Research: A Series of Three Classes

    Join CUIT's HPC Engineers for three introductory training workshops designed to accelerate and enhance your research by teaching you how to access and utilize more powerful computing resources available to you at Columbia University. 

    The workshops are designed as a series, but each can be taken, and must be registered for, individually.

     

    Current HPC Clusters

    Insomnia Shared HPC Cluster

    Insomnia went live in February 2024 and was initially a joint purchase by 21 research groups and departments. Unlike its predecessors, its design allows researchers to buy not only a whole node but also half or even a quarter of a node on the cluster.

    Insomnia is faculty-governed by the cross-disciplinary SRCPAC and is administered and supported by CUIT’s High Performance Computing team.

    Insomnia uses a new design intended to be expanded indefinitely, with new hardware and capabilities added as needed. It is a perpetual cluster: individual hardware will be retired after five (5) years.

    40 nodes with a total of 6400 cores (80 physical cores per node, doubled via hyperthreading):

    All servers are equipped with Dual Intel Xeon Platinum 8640Y processors (2 GHz):

    • 25 Standard Nodes (512 GB)
    • 9 High Memory Nodes (1 TB)
    • 3 NVIDIA L40 GPU nodes (2 GPU modules per server)
    • 2 NVIDIA A100S GPU nodes (2 GPU modules per server)
    • 1 NVIDIA H100 GPU node (2 GPU modules per server)
    • 291 TB GPFS filesystem
    • HDR Infiniband
    • Red Hat Enterprise Linux 9.3
    • Slurm job scheduler (see the example batch script below)
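
    As an illustration only, here is a minimal Slurm batch script sketch for a CPU job on a cluster like Insomnia; the account, module, and program names are placeholders, since the actual values are assigned by the HPC team when your group is set up.

        #!/bin/bash
        # Illustrative Slurm batch script; account, module, and program names are placeholders
        #SBATCH --account=myaccount        # replace with your group's account name
        #SBATCH --job-name=cpu_example
        #SBATCH --nodes=1
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task=8          # standard Insomnia nodes have 80 physical cores
        #SBATCH --mem=32G                  # standard Insomnia nodes have 512 GB of memory
        #SBATCH --time=01:00:00            # one-hour wall-clock limit

        module load anaconda               # module names vary by cluster; check "module avail"
        python my_analysis.py

    The script would be submitted with "sbatch myjob.sh", and "squeue -u $USER" shows its place in the queue.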

    Ginsburg Shared HPC Cluster

    Ginsburg went live in February 2021 and was a joint purchase by 33 research groups and departments.

    The cluster is faculty-governed by the cross-disciplinary SRCPAC and is administered and supported by CUIT’s High Performance Computing team.

    Tentative retirement dates
    • Ginsburg Phase 1 retirement: February 2025
    • Ginsburg Phase 2 retirement: March 2027
    • Ginsburg Phase 3 retirement: December 2027

    286 nodes with a total of 9152 cores (32 cores per node):

    All servers are equipped with Dual Intel Xeon Gold 6226R processors (2.9 GHz):

    • 191 Standard Nodes (192 GB)
    • 56 High Memory Nodes (768 GB)
    • 18 RTX 8000 GPU nodes (2 GPU modules per server)
    • 4 V100S GPU nodes (2 GPU modules per server)
    • 8 A100 GPU Nodes (2 GPU modules per server)
    • 9 A40 GPU Nodes (2 GPU modules per server)
    • 1 PB of DDN ES7790 Lustre storage
    • HDR Infiniband
    • Red Hat Enterprise Linux 8
    • Slurm job scheduler (see the example GPU job script below)
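
    As a sketch of how one of the GPU nodes above might be requested through Slurm, the following batch script is illustrative only; the account name and program are placeholders, and any Ginsburg-specific partition or GRES naming should be taken from the cluster documentation.

        #!/bin/bash
        # Illustrative GPU job request; account name and program are placeholders
        #SBATCH --account=myaccount        # replace with your group's account name
        #SBATCH --job-name=gpu_example
        #SBATCH --gres=gpu:1               # request one of the two GPU modules on a GPU node
        #SBATCH --cpus-per-task=8
        #SBATCH --mem=64G
        #SBATCH --time=02:00:00

        nvidia-smi                         # show which GPU was allocated to the job
        ./my_gpu_program                   # placeholder for your CUDA or other GPU code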

    Terremoto Shared HPC Cluster

    The Terremoto cluster was launched in December 2018, and is located in the Columbia University Data Center. 

    The cluster is faculty-governed by the cross-disciplinary SRCPAC and is administered and supported by CUIT’s High Performance Computing team.

    Tentative retirement dates
    • Terremoto Phase 1 retirement: December 2023
    • Terremoto Phase 2 retirement: December 2024

    137 nodes with a total of 3288 cores (24 cores per node)

    Dell C6420 nodes with dual Intel Xeon Gold 6126 processors (2.6 GHz):

    • 111 Standard Nodes (192 GB)
    • 14 High Memory Nodes (768 GB)
    • 12 GPU Nodes with two Nvidia V100 GPU modules
    • EDR Infiniband
    • Red Hat Enterprise Linux 7
    • Slurm job scheduler (see the example interactive session below)
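
    Slurm also supports interactive sessions for quick testing. A minimal sketch, with a placeholder account name:

        # Start an interactive shell on a compute node for one hour
        srun --pty --account=myaccount --cpus-per-task=4 --mem=16G --time=01:00:00 /bin/bash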

    Retired Clusters

    The Habanero cluster, retired in December 2023, was located in the Shared Research Computing Facility (SRCF), a dedicated portion of the university data center on the Morningside campus.

    Yeti, retired in 2019, was located in the Shared Research Computing Facility (SRCF), a dedicated portion of the university data center on the Morningside campus.

    Hotfoot, now retired, was launched in 2009 as a partnership among the departments of Astronomy & Astrophysics, Statistics, and Economics, plus other groups represented in the Social Science Computing Committee (SSCC); the Stockwell Laboratory; CUIT; the Office of the Executive Vice President for Research; and Arts & Sciences.

    In later years the cluster ran the Torque/Moab resource manager/scheduler software and consisted of 32 nodes which provided 384 cores for running jobs. The system also included a 72 TB array of scratch storage.

    CUIT offers four ways to leverage the computing power of our High Performance Computing resources.

    Researchers may purchase servers and storage during periodic purchase opportunities scheduled and approved by faculty and administration governance committees. A variety of purchasing options are available, with pricing tiers that reflect the level of computing capability purchased. Purchasers receive higher priority than other users.

    For more information on this option, please email [email protected].

    An individual researcher may pay a set fee for a one-year, single-user share of the system, with the ability to use additional computing capacity as it becomes available, subject to system policies and availability. The current price is set at $1,000/year.

    Submit an HPC rental request form now.

    Researchers, including graduate students, post-docs, and sponsored undergraduates, may use the system on a low-priority, as-available basis. User support is limited to online documentation only.

    Submit a request form for free HPC access now.

    Instructors teaching a course or workshop addressing an aspect of computational research may request temporary access for their students. Access will typically be arranged in conjunction with a class project or assignment.

    Submit a request form for HPC Education access now.

    Current HPC contacts can request access to their HPC group for a new user by emailing [email protected]. This option is available to current authorized contacts only.