- Log into the cluster
- Accessing Arm-based nodes
- Compiling a program (ThunderX2)
- Arm Instruction Emulator
- Getting traces in Dibona 2.0
- Power Monitoring tools
Log into the cluster¶
- Connect to the HCA server through ssh using your credentials.
- Connect to the Dibona login node (mb3-host) using the same credentials.
pc:~$ ssh guest_xx@ssh.hca.bsc.es
guest_xx@hca-server:~$ ssh guest_xx@92.43.249.196
guest_xx@mb3-host:~$
Warning: The login node mb3-host is an Intel machine, NOT Arm-based.¶
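Optionally, you can reach the login node with a single command by adding a ProxyJump entry to the SSH configuration on your local machine. This is only a convenience sketch, assuming a reasonably recent OpenSSH client; the host aliases hca and dibona are arbitrary names, not part of the cluster setup:
# ~/.ssh/config on your local machine (sketch; aliases are arbitrary)
Host hca
    HostName ssh.hca.bsc.es
    User guest_xx

Host dibona
    HostName 92.43.249.196
    User guest_xx
    ProxyJump hca
After this, running ssh dibona connects through the HCA server in one step.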
Accessing Arm-based nodes¶
The compute nodes are managed by the SLURM controller. You need to request hardware resources from SLURM in order to access the compute nodes.
Basic SLURM commands¶
guest_xx@mb3-host:~$ sinfo                # List available queues/partitions
guest_xx@mb3-host:~$ squeue               # List submitted jobs
guest_xx@mb3-host:~$ sbatch <jobscript>   # Submit job in batch mode
guest_xx@mb3-host:~$ srun <command>       # Submit job in interactive mode
guest_xx@mb3-host:~$ scancel <jobId>      # Cancel job by ID
Interactive session example¶
guest_xx@mb3-host:~$ srun --partition=<partitionName> -N 1 --time=00:30:00 --pty bash -i
guest_xx@pm-nod145:~$
Jobscript example¶
guest_xx@mb3-host:~$ cat jobscript.sh
#!/bin/bash
#SBATCH --job-name=my_first_job            # Job name
#SBATCH --partition=hackathon              # Queue
#SBATCH --ntasks=64                        # Number of MPI processes
#SBATCH --nodes=1                          # Number of nodes
#SBATCH --cpus-per-task=1                  # Number of OpenMP threads
#SBATCH --time=00:15:00                    # Time limit hrs:min:sec
#SBATCH --output=%j.out                    # Standard output
#SBATCH --error=%j.err                     # Error output

# Print machine/job information
pwd; hostname; date
printf "\n"
echo $SLURM_JOB_ID
printf "Running test\n"

# Prepare environment
module purge                               # Clean environment modules
module load arm/arm-hpc-compiler/18.4.2    # Load Arm HPC Compiler
module load openmpi2.0.2.14/arm18.4        # Load OpenMPI
module load arm/armie/18.4                 # Load Arm Instruction Emulator

# Execute application
srun <binary>

guest_xx@mb3-host:~$ sbatch jobscript.sh
Submitted batch job <jobId>
guest_xx@mb3-host:~$
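After submission, you can follow the job and inspect its output once it finishes. A minimal sketch, using the squeue command from above and the %j.out/%j.err files declared in the jobscript (the job ID is a placeholder):
guest_xx@mb3-host:~$ squeue -u $USER   # Check the state of your submitted jobs
guest_xx@mb3-host:~$ cat <jobId>.out   # Standard output of the finished job
guest_xx@mb3-host:~$ cat <jobId>.err   # Error output of the finished job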
Node allocation example¶
To allocate a node for later use, use the salloc command. There is currently no limit on allocations, so please use this feature carefully!
- Allocate one or more specific nodes
$ salloc --nodelist=pm-nod057 --time=00:30:00
salloc: Granted job allocation <jobId>
The command will either return, indicating a successful allocation, or pause while it waits for the required nodes to become available.
- Before accessing the node, refresh your Kerberos ticket. It will ask for your password.
kinit
- The squeue command will show the allocation as a running job. Now it is possible to access the node via SSH; it should not ask for a password (single sign-on).
ssh pm-nod057.bullx
Important: The suffix .bullx must be used! Otherwise the SSH command will not work.
- After you finish working with the node, leave the allocation by running exit at the login node. Otherwise, the node will remain allocated.
[user@mb3-host ~]$ exit
salloc: Relinquishing job allocation <jobId>
Compiling a program (ThunderX2)¶
The preferred procedure for compiling a program is as follows (notice that the login node is not Arm):
- Prepare your files in your home directory at the login node.
- Use srun to gain interactive access to a single Dibona node; the command will automatically SSH into the node.
srun -N 1 --time=00:30:00 --pty bash -i
- Compile your programs inside a Dibona node. The module environment is available there, and the most up-to-date versions of the modules are advertised in the message of the day (displayed at login).
Once you are ready, go back to the login node and submit a job with srun/sbatch, as sketched below.
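A minimal end-to-end sketch of this workflow, assuming a simple Makefile-based project (the module versions follow the jobscript example above; adapt them to your toolchain):
# On the login node: get an interactive shell on a compute node
guest_xx@mb3-host:~$ srun -N 1 --time=00:30:00 --pty bash -i

# On the compute node: load a toolchain and build
guest_xx@pm-nodxxx:~$ module purge
guest_xx@pm-nodxxx:~$ module load arm/arm-hpc-compiler/18.4.2 openmpi2.0.2.14/arm18.4
guest_xx@pm-nodxxx:~$ make -j8
guest_xx@pm-nodxxx:~$ exit

# Back on the login node: submit the batch job
guest_xx@mb3-host:~$ sbatch jobscript.sh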
Compiler optimization flags¶
Arm HPC Compiler¶
armclang -mcpu=native -O3 -ffast-math
# Warning: For Cavium ThunderX2, -mcpu=native and -mcpu=thunderx2t99 yield different results
# We suggest using -mcpu=thunderx2t99
armclang -mcpu=thunderx2t99 -O3 -ffast-math
armclang -mcpu=thunderx2t99 -Ofast           # Same as above
If you are using Fortran, consider adding ''-fstack-arrays''.
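For illustration, a Fortran compile line with this flag added might look like the following sketch (the source and binary names are placeholders):
# Hypothetical Fortran build line; my_app.f90 is a placeholder
armflang -mcpu=thunderx2t99 -Ofast -fstack-arrays -o my_app my_app.f90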
Please refer to the Porting and Tuning guides for various packages page of the Arm developer portal for more information.
GCC¶
gcc -march=native -mcpu=thunderx2t99 -O3 -ffast-math -ffp-contract=fast
Arm Instruction Emulator¶
General Information¶
Arm Instruction Emulator supports emulation of all SVE instructions when running on Armv8-A compatible hardware. Note that the emulator does not support emulation of Armv8.x instructions, namely Armv8.1 and Armv8.2.
How to use it¶
NOTE: The following tutorial assumes applications that can be compiled with the Arm HPC Compiler (an LLVM-based compiler).
Prepare your binary¶
First of all, you need to compile your application with the Arm HPC Compiler, so start by loading the required modules:
# Make sure you don't have any other module loaded
guest_xx@pm-nodxxx:~$ module purge
[...]
# Load the Arm HPC Compiler and the Arm Instruction Emulator modules
guest_xx@pm-nodxxx:~$ module load arm/arm-hpc-compiler/18.4.2 arm/armie/18.4
Now, you need to edit your build configuration to use armclang/armclang++/armflang, the compilers from the Arm HPC Compiler. You also need to tell the compiler to emit SVE instructions. After these modifications, the compiler declaration and its flags should look something like this:
...
CXX      = armclang++
CXXFLAGS = -O3 -mcpu=native -march=armv8-a+sve -ffp-contract=fast
...
Then compile your binary:
make -j8
At this point, you should have a binary that uses SVE instructions.
MPI Applications¶
If your code uses MPI, you will need to compile it with the Arm HPC Compiler build of your MPI library. To display which MPI flavors are available, use the //module avail// command.
# Load the Arm HPC Compiler, the MPI library and the Arm Instruction Emulator modules
guest_xx@pm-nodxxx:~$ module load arm/arm-hpc-compiler/18.4.2 openmpi2.0.2.14/arm18.4 arm/armie/18.4
The compiler might give an error such as //version `GLIBCXX_3.4.21' not found//. If this is your case, you should also load the GCC 7 module:
# Load the Arm HPC Compiler, the MPI library, the Arm Instruction Emulator and GCC 7 modules
guest_xx@pm-nodxxx:~$ module load arm/arm-hpc-compiler/18.4.2 openmpi2.0.2.14/arm18.4 arm/armie/18.4 gcc/7.2.1
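With these modules loaded, the compiler wrappers provided by OpenMPI can be used with the same SVE flags as before. A hedged sketch (the source file name is a placeholder):
# Hypothetical MPI build line; mpi_app.c is a placeholder
guest_xx@pm-nodxxx:~$ mpicc -O3 -mcpu=native -march=armv8-a+sve -ffp-contract=fast -o mpi_app mpi_app.c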
Running your binary¶
The first thing to do is to double-check that your binary actually includes SVE instructions. The fastest and easiest way is simply to execute it. Since the SVE extension is not available on any Armv8-A SoC at the moment, you will see something like this:
guest_xx@pm-nodxxx:~$ <binary>
Illegal instruction
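Alternatively, you can inspect the binary without running it. This is only a sketch, and it assumes the disassembler available on the node can decode SVE (e.g. a sufficiently recent GNU objdump); a non-zero count suggests SVE instructions are present:
# Count operands referencing SVE z registers in the disassembly (assumption: objdump decodes SVE)
guest_xx@pm-nodxxx:~$ objdump -d <binary> | grep -cE '\bz[0-9]+\.'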
Once we know for sure that our code contains SVE instructions, we can proceed to execute it with the Arm Instruction Emulator. It emulates the SVE instructions, so the execution time will be noticeably longer.
The Arm Instruction Emulator accepts different options:
guest_xx@pm-nodxxx:~$ armie --help

Execute binaries containing SVE instructions on Armv8-A hardware

Usage:
  armie [emulation parameters] -- <command to execute>

Examples:
  armie -msve-vector-bits=256 -- ./sve_program
  armie -msve-vector-bits=2048 --iclient libinscount.so -- ./sve_program --opt foo
  armie -e libmemtrace_sve_512.so -i libmemtrace_simple.so -- ./sve_program

Flags:
  -m<string>
        Architecture specific options. Supported options:
          -msve-vector-bits=<uint>
                Vector length to use. Must be a multiple of 128 bits up to 2048 bits
          -mlist-vector-lengths
                List all valid vector lengths
  -e, --eclient <client>
        An emulation client based on the DynamoRIO API
        If this is not specified, the default SVE client is used
  -i, --iclient <client>
        An instrumentation client based on the DynamoRIO API
  -x, --unsafe-ldstex
        Enables a workaround which avoids an exclusive load/store bug on certain AArch64 hardware
        (See 'Known Issues' in RELEASE_NOTES.txt for details)
  -s, --show-drrun-cmd
        Writes the full DynamoRIO drrun command used to execute ArmIE to stderr
        This can be useful when debugging or developing clients
  -h, --help
        Prints this help message
  -V, --version
        Prints the version
Now, it is time to execute our application with the Arm Instruction Emulator.
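For example, the runs below use a 512-bit vector length; the vector length, binary name and MPI rank count are illustrative choices, and the MPI line is only a sketch of wrapping each rank with armie:
# List the vector lengths supported by the emulator
guest_xx@pm-nodxxx:~$ armie -mlist-vector-lengths

# Run a serial SVE binary with a 512-bit vector length (illustrative choice)
guest_xx@pm-nodxxx:~$ armie -msve-vector-bits=512 -- ./<binary>

# Sketch for an MPI application: launch armie under the MPI launcher
guest_xx@pm-nodxxx:~$ mpirun -np 4 armie -msve-vector-bits=512 -- ./<binary>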
Getting traces in Dibona 2.0¶
Please note that only tracing via LD_PRELOAD has been tested.
1. Load the modules for your compiler and MPI implementation
module load gcc/7.2.1 openmpi2.0.2.14/gnu7                       # for gcc 7.2.1
module load gcc/8.2.0 openmpi2.0.2.14/gnu8                       # for gcc 8.2.0
module load arm/arm-hpc-compiler/18.4.2 openmpi2.0.2.14/arm18.4  # for arm hpc compiler 18.4.2
module load arm/arm-hpc-compiler/19.0 openmpi3.1.2/arm19.0       # for arm hpc compiler 19.0.0
2. Create or copy the extrae.xml file; you can find an example at:
/dibona_home_nfs/bsc_shared/apps/extrae/gcc7.2.1_openmpi2.0.2.14/3.5.4/share/example/MPI/extrae.xml      # for gcc 7.2.1
/dibona_home_nfs/bsc_shared/apps/extrae/gcc8.2.0_openmpi2.0.2.14/3.5.4/share/example/MPI/extrae.xml      # for gcc 8.2.0
/dibona_home_nfs/bsc_shared/apps/extrae/armhpc18.4.2_openmpi2.0.2.14/3.5.4/share/example/MPI/extrae.xml  # for arm hpc compiler 18.4.2
/dibona_home_nfs/bsc_shared/apps/extrae/armhpc19.0.0_openmpi3.1.2/3.5.4/share/example/MPI/extrae.xml     # for arm hpc compiler 19.0.0
3. Create or copy the trace.sh file; you can find an example at the paths below (a copy sketch for both files follows):
/dibona_home_nfs/bsc_shared/apps/extrae/gcc7.2.1_openmpi2.0.2.14/3.5.4/share/example/MPI/ld-preload/trace.sh      # for gcc 7.2.1
/dibona_home_nfs/bsc_shared/apps/extrae/gcc8.2.0_openmpi2.0.2.14/3.5.4/share/example/MPI/ld-preload/trace.sh      # for gcc 8.2.0
/dibona_home_nfs/bsc_shared/apps/extrae/armhpc18.4.2_openmpi2.0.2.14/3.5.4/share/example/MPI/ld-preload/trace.sh  # for arm hpc compiler 18.4.2
/dibona_home_nfs/bsc_shared/apps/extrae/armhpc19.0.0_openmpi3.1.2/3.5.4/share/example/MPI/ld-preload/trace.sh     # for arm hpc compiler 19.0.0
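For example, with the Arm HPC Compiler 18.4.2 toolchain you might copy both example files into your working directory like this (a sketch; adjust the paths to your toolchain):
# Copy the example configuration and wrapper script into the working directory
cp /dibona_home_nfs/bsc_shared/apps/extrae/armhpc18.4.2_openmpi2.0.2.14/3.5.4/share/example/MPI/extrae.xml .
cp /dibona_home_nfs/bsc_shared/apps/extrae/armhpc18.4.2_openmpi2.0.2.14/3.5.4/share/example/MPI/ld-preload/trace.sh .
chmod +x trace.sh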
4. Run your job using the trace.sh file.
Example (arm hpc compiler)¶
trace.sh
#!/bin/bash

source /dibona_home_nfs/bsc_shared/apps/extrae/armhpc18.4.2_openmpi2.0.2.14/3.5.4/etc/extrae.sh

export EXTRAE_CONFIG_FILE=./extrae.xml
export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitrace.so    # For C apps
#export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitracef.so  # For Fortran apps

## Run the desired program
$*
jobscript.sh
#!/bin/bash #SBATCH --job-name="mb3_wp6_d68.extrae" #SBATCH --time=00:30:00 #SBATCH --ntasks=4 #SBATCH --cpus-per-task=1 #SBATCH --output=%j.out source /usr/share/Modules/init/bash # For module command. Replace with whatever shell you use arm/arm-hpc-compiler/18.4.2 openmpi2.0.2.14/arm18.4 # Run benchmark mpirun -np 2 ./trace.sh ../bin/xhpcg --rt=0 --nx=64
Power Monitoring tools¶
Job script that can retrieve power/energy data of a multi-node job.
Please note that it relies on the GPIO power monitoring method, which is still not completely documented by Bull.
For each compute node it produces a file with raw power data (hard to read!) plus a human-readable summary.
Raw power data files are called "${SLURM_JOB_ID}_${NODE_NAME}.pwr", while the power summary is called "${SLURM_JOB_ID}.pwr".
So, if you are monitoring the power of job 4646 running on pm-nod046 and pm-nod093, at the end of the execution you will find the following in your working directory:
4646_pm-nod046.pwr   <--- raw power data of first node
4646_pm-nod093.pwr   <--- raw power data of second node
4646.pwr             <--- human readable power data of both nodes (one below the other)
Production queue (ThunderX2)¶
Please note that you NEED to have an active Kerberos ticket (it has to be refreshed daily):
kinit
You also NEED to have your SSH key in your authorized_keys file (this is needed only once):
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
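If you do not have an SSH key pair yet, you can generate one first; a minimal sketch, assuming the default RSA key location and an empty passphrase:
# Generate an RSA key pair only if ~/.ssh/id_rsa.pub does not exist yet
[ -f ~/.ssh/id_rsa.pub ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa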
jobscript.sh example
#!/bin/bash -x
#SBATCH --job-name=pwr_test
#SBATCH --ntasks=128
#SBATCH --time=05:59:00
#SBATCH --output=%j.out
#SBATCH --exclusive
#SBATCH --partition=production

pwd; hostname; date
echo "**** WHO ****"
echo $SLURM_JOB_NODELIST
echo $SLURM_JOB_ID

# Please note that you NEED to have an active kerberos key, use the "kinit" command (this has to be refreshed daily)
# You also NEED to have your SSH key into your authorized keys with "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" (doing it once is enough)

# User prolog, executed on each of the compute nodes
# It contains the command to start the power monitoring
GPIO_CMD_START="/dibona_home_nfs/bsc_shared/apps/power/pro.sh"

# User epilog, executed on each of the compute nodes
# It contains the command to stop the power monitoring
GPIO_CMD_STOP="/dibona_home_nfs/bsc_shared/apps/power/epi.sh"

# Command to retrieve the power data from the FPGA
# NOTE: it needs to be executed from mb3-host!
RETRIEVE_CMD="/dibona_work_storage/admins/slurm/power-files/read-hdeem-values-from-nodeList.sh"

# Command to convert FPGA raw data to human readable data
HEADER_CMD="/dibona_home_nfs/bsc_shared/apps/power/print-header.sh"
CONVERT_CMD="/dibona_home_nfs/bsc_shared/apps/power/hdeem2csv.sh"

# Application that we want to power monitor
SCIENTIFIC_PROGRAM="sleep 10"

# SLURM command to run the application, including prolog (starting power monitor) and epilog (stopping power monitor)
srun --task-prolog=${GPIO_CMD_START} --task-epilog=${GPIO_CMD_STOP} $SCIENTIFIC_PROGRAM

# Bookkeeping of power data...
touch `pwd`/${SLURM_JOB_ID}.pwr
$HEADER_CMD > `pwd`/${SLURM_JOB_ID}.pwr
for N in `scontrol show hostname $SLURM_JOB_NODELIST` ; do
    touch `pwd`/${SLURM_JOB_ID}_${N}.pwr
    # Retrieving raw power data, one file per compute node
    ssh mb3-host $RETRIEVE_CMD $N > `pwd`/${SLURM_JOB_ID}_${N}.pwr
    # Converting raw power data to human readable format, appending into a single file
    printf "%s," "$N" >> `pwd`/${SLURM_JOB_ID}.pwr
    ssh mb3-host $CONVERT_CMD `pwd`/${SLURM_JOB_ID}_${N}.pwr >> `pwd`/${SLURM_JOB_ID}.pwr
done