Summary
Resources per team: Shared access to a GPU-accelerated HPC partition of 261 nodes.
Resource access: SSH connection; visualization sessions are also available.
Data cube access: Shared through mounted disk space
Resource management: Accessed through the Slurm job scheduler.
Software management: Through regular IDRIS support. Containerisation option available through Singularity.
Documentation: http://www.idris.fr/eng/jean-zay/
Support: Regular IDRIS application support. OCA (www.oca.eu) will act as a facilitator if necessary.
Resource location: France
Technical Specifications
Overview
IDRIS, the Institute for Development and Resources in Intensive Scientific Computing (http://www.idris.fr/eng/info/missions-eng.html), will provide access to the GPU-accelerated partition of its main supercomputer, Jean Zay (http://www.idris.fr/eng/jean-zay/). Jean Zay's computing resources are currently used by 1600 researchers and engineers, with almost 625 projects running across IDRIS resources (50% HPC projects / 50% AI projects).
Technical specifications
IDRIS will provide access to the 1044 NVIDIA V100 SXM2 GPUs distributed over 261 nodes (4 GPUs per node). Each node includes:
2 Intel Cascade Lake SP 6248 sockets
40 cores running at 2.5 GHz each
192 GB of memory
4 GPUs (32 GB HBM per GPU)
The compute nodes have access to a shared full-flash parallel file system based on IBM Spectrum Scale (capacity > 2 PB; throughput > 450 GB/s).
Users will also have access to visualization and pre/post-processing nodes.
Jean Zay’s full specifications can be found here: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-hw-eng.html
Per user resource
Every user will have access to the compute nodes available to the SKADC2 project. Each SKA DC2 project will be allocated 1000 hours of GPU time, shared between all users of that project. The elapsed runtime of a single job must not exceed 100 hours:
http://www.idris.fr/eng/jean-zay/gpu/jean-zay-gpu-exec_partition_slurm-eng.html
If needed, each team's project leader can request an extension of computing hours from IDRIS through the established process.
Software installed
The cluster nodes run Red Hat Linux. The job scheduler is Slurm. Both Intel and PGI/NVIDIA compilers are provided. Libraries are managed through the module system:
http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-hw-eng.html#basic_software_description
A list of all available scientific software and libraries is provided through the FAQ. They can be listed with the “module avail” command (http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-doc_module-eng.html).
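For example, the module environment can be queried and used directly from the command line (the module name below is a placeholder; the exact names and versions depend on what is installed on Jean Zay):
$ module avail                  # list all available modules
$ module avail python           # restrict the listing to Python-related modules
$ module load <module_name>     # load a module into the current environment
$ module list                   # show the modules currently loaded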
Volume of resource
IDRIS can accommodate up to 40 user accounts, which can be distributed across any number of teams.
GPUs if any
The partition provides 1044 NVIDIA Tesla V100 SXM2 GPUs with 32 GB of HBM each.
User access
Request access
If your project has been allocated resources on the GENCI/IDRIS supercomputer, the scientific project leader (or, alternatively, the project leader) must first send an email to skadc2@oca.eu including the following information:
Project name: must start with SKADC2
Project lead: full name, institutional email, organization, country
Project participants: full names, institutional email, organization, country
Project description: Project overview/objectives (Max 10 lines) – refer to any http link if appropriate
OCA will be the entry point and will provide guidelines to the scientific project leader to (1) register the project at IDRIS and (2) create user accounts.
Logging in
All users will first connect to skadc2.oca.eu through SSH, and then to jean-zay.idris.fr, also through SSH.
The connection can be made with a single command:
$ ssh -J <login>@skadc2.oca.eu <login>@jean-zay.idris.fr
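The same two-hop connection can also be described once in ~/.ssh/config (a sketch; the host aliases are arbitrary and the ProxyJump option assumes a reasonably recent OpenSSH client):
# ~/.ssh/config
Host skadc2
    HostName skadc2.oca.eu
    User <login>
Host jean-zay
    HostName jean-zay.idris.fr
    User <login>
    ProxyJump skadc2
With such an entry, the connection reduces to:
$ ssh jean-zay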
How to run a workflow
Jobs will be executed on computing resources through Slurm. Full documentation (including examples) is provided at the following address: http://www.idris.fr/eng/jean-zay/
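As an illustration, a single-node GPU job could be described by a batch script along the lines of the sketch below (the job parameters, module name and executable are placeholders, not prescribed IDRIS values; the exact Slurm options to use on Jean Zay, in particular the project account and partition, are given in the documentation above):
#!/bin/bash
#SBATCH --job-name=skadc2_test    # job name
#SBATCH --nodes=1                 # one compute node
#SBATCH --ntasks-per-node=4       # one task per GPU
#SBATCH --gres=gpu:4              # request the 4 GPUs of the node
#SBATCH --cpus-per-task=10        # 40 cores shared between the 4 tasks
#SBATCH --time=10:00:00           # elapsed time limit (must stay below the 100h cap)
#SBATCH --output=%x_%j.out        # standard output and error file

module purge
module load <module_name>         # load the required software environment

srun ./my_application             # placeholder executable
The script is then submitted and monitored with:
$ sbatch job.slurm
$ squeue -u $USER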
Accessing the data cube
Input Data will be stored on an IBM Spectrum Scale file system mounted on every accelerated node. IDRIS will provide the directory path name where data will be stored.
Different storage spaces are available, as detailed here: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-calculateurs-disques-eng.html .
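Once IDRIS has communicated the directory path, the data cube can simply be read in place from any compute node; for instance (the path and environment variable below are purely hypothetical, used here for illustration):
$ ls /path/provided/by/idris/skadc2_data                     # hypothetical data directory
$ export SKADC2_DATA=/path/provided/by/idris/skadc2_data     # illustrative convenience variable used by a workflow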
Software management
Installed software is managed through the module environment. Installation of missing libraries is handled through a regular support request. Users can also install their own libraries.
A list of currently available tools and libraries can be found at the following address: http://www.idris.fr/eng/jean-zay/
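For Python-based workflows, one common way to add missing packages on top of the provided modules is a personal virtual environment (a sketch; the module and package names are placeholders):
$ module load <python_module>              # load a Python environment provided on the cluster
$ python -m venv $HOME/skadc2-env          # create a personal virtual environment
$ source $HOME/skadc2-env/bin/activate     # activate it
$ pip install <package>                    # install the missing library into this environment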
Containerisation
Single-node and multi-node container usage is supported through Singularity (see http://www.idris.fr/eng/jean-zay/cpu/jean-zay-utilisation-singularity-eng.html for the relevant documentation).
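As an illustration, once an image is available on the cluster, it can be executed on GPU resources roughly as follows (the image and application names are placeholders; the IDRIS documentation above describes the site-specific way of importing and registering images):
$ singularity exec --nv my_image.sif ./my_application    # --nv exposes the host NVIDIA GPUs inside the container
Within a Slurm batch script, the same command would typically be launched through srun.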
Documentation
The documentation is available online at the following address: http://www.idris.fr/eng/jean-zay/
Resource management
The computing resources are accessed through the Slurm job scheduler; each team will be assigned an account with a predefined number of hours.
Storage is granted on a per user and per project basis, as detailed here: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-calculateurs-disques-eng.html .
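When several allocations are attached to the same login, the project account to be charged can be selected explicitly at submission time (the account name below is a placeholder; the exact accounting syntax used on Jean Zay is described in the documentation above):
$ sbatch --account=<project_account> job.slurm    # charge the job to the given project allocation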
Support
IDRIS support (for application troubleshooting) can be contacted either by email (preferred) at assist@idris.fr or by phone (see below). The email subject must include SKADC2, and the OCA team must be copied at skadc2-assist@oca.eu. Phone support is available at +33 (0)1 69 35 85 55. If an issue is reported by phone, skadc2-assist@oca.eu must also be notified.
For more details, please refer to http://www.idris.fr/eng/su/assist-eng.html
The IDRIS HPC community is supported by the following people:
http://www.idris.fr/eng/info/personnel/assistancehpc-eng.html
Credits and acknowledgements
This work was granted access to the HPC resources of IDRIS under the allocation 20XX-[file number] made by GENCI.