What are the specifications of the cluster?
The cluster has 32 nodes with 24 cores each, for 768 computational cores in total. Each node contains two Intel Xeon E5-2650 v4 processors (12 cores each, 2.2 GHz) and 128 GB of DDR4 ECC RAM, or about 5.33 GB of RAM per core. Plan how you split up your jobs around these limits. The system runs CentOS 6.8 with the Intel Parallel Studio XE 2017 compilers as well as MATLAB 2017a.
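As a rough aid for sizing jobs against these figures, the sketch below estimates how many tasks fit on one node for a given memory footprint per task. The helper name tasks_per_node is hypothetical, and the calculation ignores memory used by the operating system, so treat the result as an upper bound.

```shell
#!/bin/bash
# Per-node figures taken from the specifications above.
NODE_MEM_GB=128
CORES_PER_NODE=24

# tasks_per_node MEM_PER_TASK_GB  (hypothetical helper, whole GB only)
# Prints the number of tasks that fit on one node: limited by memory,
# and never more than one task per core.
tasks_per_node() {
    local mem_per_task_gb=$1
    local by_memory=$(( NODE_MEM_GB / mem_per_task_gb ))
    if (( by_memory > CORES_PER_NODE )); then
        echo "$CORES_PER_NODE"
    else
        echo "$by_memory"
    fi
}

tasks_per_node 4     # prints 24 (core-limited)
tasks_per_node 8     # prints 16 (memory-limited)
```

Tasks that need more than about 5 GB each will leave some cores on a node idle, which is worth knowing before requesting all 24.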
The nodes are interconnected by a Mellanox SwitchX MSX6036F-1SFS 56 Gbps InfiniBand switch and an EdgeCore 1 Gbps managed L3 Ethernet switch for high-speed communication between nodes. The cluster connects over a 1 Gbps link to a university-managed Cisco switch on a private Washington University subnet.
Each node also has a 240GB local SSD available for writing temporary files. If you want access to it, ask Hugh and he will create a /scratch/username folder for you to use to write intermediate files. You need to include lines in your batch script to save the files you need and delete the ones you do not need. Files left on /scratch will be purged periodically.
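The save-and-delete step described above might look like the following at the end of a batch script. This is a minimal sketch: the helper save_and_clean, the job_$SLURM_JOB_ID subfolder, and the results path are assumptions for illustration, not part of the cluster setup.

```shell
#!/bin/bash
# save_and_clean SCRATCH_DIR RESULTS_DIR PATTERN  (hypothetical helper)
# Copies files in SCRATCH_DIR matching PATTERN to RESULTS_DIR, then
# deletes SCRATCH_DIR. The scratch folder is kept if any copy fails.
save_and_clean() {
    local scratch=$1 results=$2 pattern=$3
    local f
    [ -d "$scratch" ] || return 1          # refuse to run on a missing folder
    mkdir -p "$results" || return 1
    for f in "$scratch"/$pattern; do
        [ -e "$f" ] || continue            # pattern matched nothing
        cp "$f" "$results"/ || return 1    # stop before deleting anything
    done
    rm -rf "$scratch"
}

# Example use at the end of a batch script (paths are assumptions):
#   save_and_clean "/scratch/$USER/job_$SLURM_JOB_ID" "$HOME/results" "*.out"
```

Copying before deleting, and bailing out on the first failed copy, means a full home filesystem or a typo in the results path does not cost you the job's output.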
How can I access and login to the cluster?
The Linux cluster is named TELLUS and is available for SSH access only from the Washington University campus network. If you are connecting from off campus, you must log in through the campus VPN or via another externally accessible Linux server. You can then open an SSH connection to the cluster’s specific IP address. Mac users can use a terminal; Windows users can download and use MobaXterm.
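For illustration, the two connection routes described above could be wrapped as follows. The helper name tellus_ssh is hypothetical, and the user name, cluster address, and gateway server are placeholders, since the cluster's IP address is not listed here. The -J (jump host) option assumes OpenSSH 7.3 or newer on your own machine.

```shell
#!/bin/bash
# tellus_ssh USER CLUSTER_ADDRESS [GATEWAY]  (hypothetical helper)
# On campus or over the VPN, connect directly. From off campus without
# the VPN, pass an externally accessible server (user@host) to hop through.
tellus_ssh() {
    local user=$1 cluster=$2 gateway=${3:-}
    if [ -n "$gateway" ]; then
        ssh -J "$gateway" "$user@$cluster"
    else
        ssh "$user@$cluster"
    fi
}

# Example use (all values are placeholders):
#   tellus_ssh username CLUSTER_IP
#   tellus_ssh username CLUSTER_IP username@gateway.example.edu
```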
Where is my home directory and how much storage do I have?
Your home directory is /home/<username>. It lives on a shared 30 TB filesystem on a RAID5 striped volume on the master node. There are no storage limits on this volume, but please clean up unneeded files periodically. This filesystem is not backed up, so anything you delete is gone forever.
How do I transfer my files to the cluster?
Transfer files using SFTP, which is built into MobaXterm, CyberDuck, and FileZilla, or use the command-line sftp client. Users with accounts on the seismology servers will also find /P available via NFS on the cluster for easy file transfer.
How do I select among the different compilers on the cluster?
Most people will probably use the Intel compilers so you should include the line:
module load intel/parallel-studio-xe-2017
in your .bashrc or .cshrc and in all your sbatch scripts to set your environment. Type module avail to see the available compilers and environments. Make sure the line:
is in your .bashrc or .cshrc
How do I optimize my code for the cluster?
To optimize your code for the cluster's processors, compile it with the Intel flag -xCORE-AVX2:
$ mpif90 -xCORE-AVX2 -o myexecutable mycode.f90
How do I run other applications on the cluster?
Additional packages are installed in /opt/local/ on the master and nodes. Add source lines to your .bashrc to set the directories and paths necessary to run them. For example, to run MATLAB, add the following line to your .bashrc
How do I submit jobs to the nodes?
All jobs must be submitted using the SLURM command sbatch. You must create a batch script and then submit it using sbatch to one of the available partitions.
We have some example scripts available that people have used in the past. Let Hugh know if you would like to see some of these scripts.
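As a starting point, a minimal batch script might look like the sketch below, which writes the script to a file named job.sh (a hypothetical name). PARTITION_NAME is a placeholder, since partition names are not listed here; run sinfo to see them. The node count, time limit, executable name, and the use of mpirun as the MPI launcher are likewise assumptions to adapt to your own job.

```shell
#!/bin/bash
# Write a minimal SLURM batch script to job.sh (file name is an assumption).
cat > job.sh <<'EOF'
#!/bin/bash
# PARTITION_NAME is a placeholder: replace it with a partition shown by sinfo.
#SBATCH --job-name=myjob
#SBATCH --partition=PARTITION_NAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=01:00:00
#SBATCH --output=myjob_%j.out

# Set up the Intel environment inside the job.
module load intel/parallel-studio-xe-2017

# 2 nodes x 24 cores per node = 48 MPI tasks.
mpirun -np 48 ./myexecutable
EOF

echo "Wrote job.sh; submit it with: sbatch job.sh"
```

After submitting, squeue shows whether the job is pending or running, and scancel followed by the job ID removes it.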
What other SLURM commands are there?
- squeue: Show queue contents (what jobs are running, nodes used)
- sinfo: Show queue information
- scancel: Cancel a queued or running job
Can I check the status of the cluster remotely?
If you are on campus or have VPN access, you can enter the IP address of the cluster in a browser to see the current status.
General users and folks off campus without VPN access can see some general information here.
How do I get more help on the cluster?
For general account and usage information, contact Hugh Chou. For help optimizing your particular code, other graduate students or post-doctoral associates may be more familiar with the code and algorithms involved.