System Configuration
Agave
CPU Resource Overview:
- 800 FP64 CPU Teraflops
- 498 Compute Nodes
- 372 CPU nodes with 2 x Intel Broadwell or newer CPU per node
- 4 CPU nodes with 2 x AMD Epyc CPU per node
- 20 CPU nodes with 2 x Intel Xeon Phi CPU per node
- 128-256GB RAM on most CPU nodes
- 1TB+ RAM on 3 CPU Fat Nodes
- Intel OmniPath and Mellanox EDR InfiniBand
A more detailed description of our CPU resources can be seen in the table below.
GPU Resource Overview:
- 889 FP64 GPU Teraflops
- 297 GPUs
- GPUs range from K40s to 32GB V100s
- 12 - 32 GB RAM per GPU
A more detailed description of our GPU resources can be seen in the table below.
Storage Resource Overview:
- 3.3 PB home directory storage on Qumulo NFS
- 1.2 PB temporary scratch storage on BeeGFS
Agave CPU Resources:
CPU Model | CPU Arch | Nodes | Cores / Node | RAM / Node (GB) | Total CPU Cores | FP64 GFLOPS |
---|---|---|---|---|---|---|
Totals | N/A | 498 | N/A | N/A | 18,480 | 808,897 |
Xeon E5-2680 v4 | Broadwell | 297 | 28 | 128 - 256 | 8,316 | 319,334 |
Xeon Phi 7210 | Knights Landing | 20 | 256 | 200 | 5,120 | 212,992 |
Xeon Gold 6230 | CascadeLake | 44 | 20 | 192 | 1,760 | 118,272 |
Xeon Gold 6252 | CascadeLake | 16 | 48 | 192 | 768 | 51,610 |
Xeon E5-2687W v4 | Broadwell | 20 | 24 | 64 - 256 | 480 | 23,040 |
Xeon E5-2640 v0 | SandyBridge | 30 | 12 | 128 | 360 | 7,200 |
Xeon Silver 4214 | CascadeLake | 8 | 24 | 96 - 192 | 192 | 6,700 |
AMD EPYC 7551 | Naples | 4 | 64 | 256 | 256 | 4,096 |
Xeon Silver 4114 | Skylake | 9 | 20 | 96 - 384 | 180 | 6,336 |
Xeon Silver 4110 | Skylake | 19 | 8 | 96 | 152 | 5,107 |
Xeon E5-2650 v4 | Broadwell | 6 | 24 | 64 - 256 | 144 | 5,069 |
Xeon Gold 6136 | Skylake | 5 | 24 | 96 | 120 | 11,520 |
Xeon Gold 6132 | Skylake | 1 | 112 | 1,500 | 112 | 9,318 |
Xeon Gold 5120 | Skylake | 3 | 28 | 96 | 84 | 5,914 |
Xeon Gold 6248 | CascadeLake | 2 | 40 | 192 | 80 | 6,400 |
Xeon Gold 6148 | Skylake | 2 | 40 | 384 | 80 | 6,144 |
Xeon E5-2650 v2 | IvyBridge | 3 | 16 | 256 | 48 | 998 |
Xeon Platinum 8160 | Skylake | 1 | 48 | 384 | 48 | 3,226 |
Xeon E7-4860 | Westmere | 1 | 40 | 2,000 | 40 | 363 |
Xeon Gold 6240 | CascadeLake | 1 | 36 | 96 | 36 | 2,995 |
Xeon X7560 | Nehalem | 1 | 32 | 1,000 | 32 | 291 |
Xeon Silver 4116 | Skylake | 1 | 20 | 196 | 20 | 672 |
Xeon E5-2630 v4 | Broadwell | 1 | 20 | 64 | 20 | 704 |
Xeon E5-2640 v2 | IvyBridge | 2 | 8 | 16 | 16 | 256 |
Xeon E5-2660 v0 | SandyBridge | 1 | 16 | 64 | 16 | 282 |
Agave GPU Resources:
GPU Model | GPU Arch | GPU Count | GPU Cores | FP64 GFLOPS | FP32 GFLOPS | FP16 GFLOPS |
---|---|---|---|---|---|---|
Totals | N/A | 297 | 1,140,672 | 889,555 | 3,198,700 | 4,037,600 |
GeForce GTX 1080 | Pascal | 12 | 30,720 | 3,324 | 106,476 | 1,668 |
GeForce GTX 1080 Ti | Pascal | 76 | 272,384 | 26,904 | 806,284 | 13,452 |
GeForce RTX 2080 | Turing | 20 | 58,880 | 5,580 | 178,400 | 356,800 |
GeForce RTX 2080 Ti | Turing | 28 | 121,856 | 10,276 | 329,000 | 658,000 |
Tesla K20m | Kepler | 1 | 2,496 | 1,175 | 3,524 | N/A |
Tesla K40 | Kepler | 8 | 23,040 | 13,456 | 40,368 | N/A |
Tesla K80 | Kepler | 56 | 139,776 | 76,776 | 230,328 | N/A |
Tesla V100 16GB | Volta | 74 | 378,880 | 579,716 | 1,159,580 | 2,318,420 |
Tesla V100 32GB | Volta | 22 | 112,640 | 172,348 | 344,740 | 689,260 |
Connect
To log in to Agave, you will need an SSH client application installed on your local system.
Agave uses ASURITE accounts. You will need to use your ASURITE login and password to connect to Agave.
SSH for Windows
The recommended SSH client application for Windows is PuTTY.
A tutorial on how to install and use PuTTY can be found here: Install PuTTY SSH (Secure Shell) Client in Windows 7 (YouTube)
Use PuTTY to connect to agave.asu.edu with your ASURITE login and password.
SSH on Mac or Linux
Mac OS X has an SSH client built in. A simple tutorial on how to use this SSH client can be found here: How To Use SSH on Mac OS X
Virtually every Linux distribution includes an SSH client, and the process is very similar to that on a Mac: open a terminal window and launch the SSH client from it. How to open a terminal window varies from one Linux distribution to the next and is beyond the scope of this document.
Once you have a terminal window open on your Mac or Linux system, enter the following command, replacing ASURITE with your own ASURITE login name:
ssh ASURITE@agave.asu.edu
To log in with X11 forwarding turned on, use the -Y option as follows:
ssh -Y ASURITE@agave.asu.edu
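If you connect often, you can optionally store these settings in your OpenSSH client configuration file, ~/.ssh/config, so that a short alias is enough. A minimal sketch, assuming OpenSSH on Mac or Linux (the alias name agave is just a suggestion):
Host agave
    HostName agave.asu.edu
    User ASURITE
    ForwardX11 yes
With this entry in place, ssh agave is equivalent to the longer command above; ForwardX11 yes corresponds to the -X option (add ForwardX11Trusted yes for the equivalent of -Y).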
I can't log in! What do I do?
If your attempts to log in are not succeeding, and you're sure you are following the instructions above, then our support team will need the following information:
- The cluster you are connecting to: Agave in this case
- The operating system you are using (e.g. Windows, Mac, Linux)
- The SSH client you are using (e.g. PuTTY, MobaXterm, native, etc.)
- The IP address you are connecting from (use www.whatsmyip.org to find it)
- Connection type (Wi-Fi, campus Ethernet, home connection, etc.)
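If you are connecting from a command-line SSH client on Mac or Linux, the verbose output of a failed connection attempt is also helpful; you can capture it by adding the -v option:
ssh -v ASURITE@agave.asu.edu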
Accessing Compute Nodes
Job accounting
Upon login, your balance of billed CPU-hours is printed to the screen; it can also be shown at any time with the command "mybalance". CPU-hours = (# CPUs) x (job duration in wall-clock hours). Although the system charges only for the resources actually used, the requested CPU-hours are pre-deducted from the balance before a job begins, to determine whether the balance can cover the expected usage. This determines whether the job is launched as non-preemptable (positive pre-deducted balance) or preemptable (negative pre-deducted balance).
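For example, a job that requests 4 CPU cores for 6 wall-clock hours has 4 x 6 = 24 CPU-hours pre-deducted when it is submitted; if it actually finishes after 2 hours, only 4 x 2 = 8 CPU-hours are charged in the end.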
Slurm scheduler
The Slurm Workload Manager supports user commands to submit, control, monitor and cancel jobs.
Sbatch scripts
Sbatch scripts are the normal way to submit a non-interactive job to the cluster.
Below is an example of an sbatch script that should be saved as the file myscript.sh.
This script performs the simple task of generating a file of random numbers and then sorting it.
#!/bin/bash

#SBATCH -n 1                         # number of cores
#SBATCH -t 0-12:00                   # wall time (D-HH:MM)
##SBATCH -A drzuckerman              # Account hours will be pulled from (commented out with double # in front)
#SBATCH -o slurm.%j.out              # STDOUT (%j = JobId)
#SBATCH -e slurm.%j.err              # STDERR (%j = JobId)
#SBATCH --mail-type=ALL              # Send a notification when the job starts, stops, or fails
#SBATCH --mail-user=myemail@asu.edu  # send-to address

module load gcc/4.9.2

for i in {1..100000}; do
    echo $RANDOM >> SomeRandomNumbers.txt
done

sort SomeRandomNumbers.txt
This script uses the #SBATCH flag to specify a few key options:
- The number of CPU cores the job should use:
- #SBATCH -n 1
- The runtime of the job in Days-Hours:Minutes:
- #SBATCH -t 0-12:00
- The account the job pulls hours from. It has been commented out with an extra # for this example since this account does not actually exist:
- ##SBATCH -A drzuckerman
- A file based on the jobid (%j) where the normal output of the program (STDOUT) should be saved:
- #SBATCH -o slurm.%j.out
- A file based on the jobid (%j) where the error output of the program (STDERR) should be saved:
- #SBATCH -e slurm.%j.err
- That email notifications should be sent out when the job starts, ends, or fails:
- #SBATCH --mail-type=ALL
- The address where email should be sent:
- #SBATCH --mail-user=myemail@asu.edu
- The script also uses the Software Modules system (see below) to make gcc version 4.9.2 available:
- module load gcc/4.9.2
Assuming that the above has been saved as the file myscript.sh, and you intend to run it on the conventional x86 compute nodes, you can submit this script with the following command:
sbatch myscript.sh
The job will be run on the first available conventional x86 cluster group partition.
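If the submission is accepted, sbatch prints the job ID it assigned (a line of the form "Submitted batch job 123456", where the number is only an example). You can then check on the job with the monitoring commands described below, for instance:
squeue -u <ASURITE>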
Interactive Sessions
Depending on whether your application is a text-based console program or uses a GUI, you'll want to do one of the following:
For text-based console programs, log in with SSH
For GUI-based programs, log in with NoMachine
Once logged in, to launch an interactive compute session, simply run the following command:
interactive
This will launch an interactive compute session on one of the conventional x86 compute nodes.
Once the session launches, you can begin using the system.
There is no special switch or option needed for X11 forwarding to work; it is always enabled.
So if you need to run an X11-based program, just launch an interactive session and run the program from within it.
However, this will only work if you are logged in through NoMachine or through SSH with X11 forwarding turned on.
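To confirm that X11 forwarding is working end to end, you can launch a small X11 program from within an interactive session; xclock is used here only as an example and may not be installed on every node:
interactive
xclock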
Interactive Session Options
The interactive command will work with many of the same options and switches as other Slurm job-launching commands, such as salloc or sbatch.
In particular, you can specify how many CPU cores you want your interactive session to use with the -n number option.
You can also specify how long you would like your session to run with the -t days-hours:minutes option.
If you want the session to run under a different account you can specify it with the -A accountname option.
An example of using these options to launch an interactive session that uses 8 CPU cores on one node, runs for zero days and 4 hours, and uses the "drzuckerman" account can be seen below:
interactive -n 8 -N 1 -t 0-4:00 -A drzuckerman
This session will be launched on one of the conventional x86 compute nodes.
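If your interactive work needs a GPU, the standard Slurm options for requesting generic resources can be passed to interactive in the same way; a sketch, assuming a GPU partition exists (the partition name gpu below is only a placeholder, so check with the support team for the actual partition and GRES names on Agave):
interactive -n 4 -t 0-2:00 -p gpu --gres=gpu:1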
Monitoring Jobs and Queues
Below are some sample commands. A more complete list can be found here.
Show all jobs in all queues
squeue
Show all jobs for a specific user
squeue -u <ASURITE>
Cancel a job
scancel <JOBID>
Give detailed information on a job
scontrol show job=<JOBID>
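To keep an eye on your own jobs without retyping the command, you can combine squeue with the standard watch utility (assuming watch is available on the login nodes); this refreshes the listing every 30 seconds until you press Ctrl-C:
watch -n 30 squeue -u <ASURITE>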
Using Software Modules
There are many software packages (list as of June 2018) installed on Agave that are available through the Software Modules system.
New packages are added regularly.
Listing available modules
The following command will list the software modules that are available on Agave:
module -l avail
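To narrow the listing to a particular package, you can pass a name to module avail; for example, to show only the gcc modules:
module avail gcc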
Loading a module
The following command will load the module for gcc/4.9.2:
module load gcc/4.9.2
Listing Loaded Modules
The following command will list the modules that are currently loaded:
module list
Purging Loaded Modules
To clear out the modules that are currently loaded, perhaps because you want to load others, use the following command:
module purge
Using Modules in SBATCH scripts
Many sbatch scripts will include a module load command as part of the script.
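A common pattern is to purge any modules inherited from the submitting shell and then load exactly what the job needs near the top of the script. A short sketch (gcc/4.9.2 matches the example above, and ./myprogram is only a placeholder for your own executable):
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 0-1:00

module purge            # start from a clean environment
module load gcc/4.9.2   # load only what this job needs

./myprogram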
Managing/Transferring Files
scp and rsync
To copy a file from your local machine to Agave:
scp file <ASURITE>@agave.asu.edu:/home/<ASURITE>/targetdirectory
To copy an entire directory:
scp -r directory <ASURITE>@agave.asu.edu:/home/<ASURITE>/targetdirectory
To transfer large files, or to update a small portion of a large dataset:
rsync -atr bigdir <ASURITE>@agave.asu.edu:/home/<ASURITE>/targetdirectory
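To copy in the other direction, from Agave back to your local machine, run the command on your local system with the source and destination swapped; a minimal sketch (the file and directory names are placeholders):
scp <ASURITE>@agave.asu.edu:/home/<ASURITE>/targetdirectory/file localdirectory/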