User Guide

Kabré Usage Tutorial

Kabré is a word from the Ngäbe language meaning "a bunch". The name fits the cluster's current composition, which features multiple parallel architectures. In this tutorial you will learn how Kabré is composed, how to connect to the cluster, how to submit jobs and retrieve their results, and how to use environment modules.


To complete this tutorial you will need an SSH client. On Unix and Linux platforms there is usually a terminal emulator capable of establishing SSH sessions. On Windows platforms you should download an SSH client program, such as PuTTY or similar.

You will also need an active account in Kabré and valid credentials. If you don’t have one, please contact us at, explaining your situation.

Testing your credentials

Before proceeding, this is a good moment to test your credentials. Open your terminal emulator program and type the following, replacing user_name with your Kabré user name. Type your password when prompted; if everything is OK, you will be logged into Kabré.

Later in this guide we will explain how to set up SSH keys to avoid typing your password every time you log into Kabré.


Understanding Kabré’s composition

The following image shows a network diagram of Kabré. Below we discuss the major components in the diagram.

Meta node

This is a very special node that supports many services of the cluster, and its existence should be transparent to all users. If, for some reason, you find yourself on the Meta node, please log out and report the situation, as it could indicate a problem. Running programs on the Meta node is considered bad behavior.


Login nodes

These nodes are a kind of shared working area. When you log into Kabré, you will be assigned to one login node. Some common tasks you execute here are:

  • Creating and editing files
  • Creating directories and moving files
  • Copying files to and from your computer
  • Compiling code
  • Submitting jobs
  • Managing your active jobs

Running parallel code or heavy tasks on login nodes is considered bad behavior.

Machine Learning Nodes (Nukwa)

Each of these 4 nodes features an NVIDIA Tesla K40 GPU. The host has an Intel Xeon with 4 cores @ 3.2 GHz (no hyperthreading) and 16 GB of RAM; therefore, only applications that make intensive use of the GPU will get a relevant speed-up.
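The speed-up claim can be made concrete with Amdahl's law: if a fraction f of an application's run time is accelerated s times on the GPU while the rest keeps running on the host CPU, the overall speed-up is 1 / ((1 − f) + f / s). The sketch below only evaluates that formula; the f and s values in the usage note are illustrative, not measurements on Nukwa.

```c
#include <assert.h>

/* Amdahl's law: overall speed-up when a fraction `f` (0..1) of the
 * original run time is made `s` times faster (e.g. by moving it to a
 * GPU). The remaining (1 - f) still runs at host-CPU speed. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, with s = 10, amdahl_speedup(0.5, 10.0) is about 1.82, while amdahl_speedup(0.95, 10.0) is about 6.9: unless most of the work runs on the GPU, the part left on the 4-core host caps the gain.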

Simulation Nodes (Nu)

The war horse of Kabré: 20 Intel Xeon Phi KNL nodes, each one with 64 cores @ 1.3 GHz and 96 GB of RAM; each core has two AVX units. If your application can be split into many small pieces and uses vectorization, this architecture can provide impressive speed-ups. Node names reflect the fact that nodes are distributed in 5 blades, numbered 0 to 4, each holding nodes a to d: for example, zarate-0c is node c in blade 0, and zarate-3a is node a in blade 3.
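A back-of-the-envelope calculation illustrates why vectorization matters so much on these nodes. Beyond what this guide states, the sketch assumes that each AVX unit is 512 bits wide (8 double-precision lanes) and that a fused multiply-add counts as 2 floating-point operations per lane per cycle.

```c
#include <assert.h>

/* Theoretical peak floating-point throughput of one node, in GFLOP/s:
 *   cores * clock (GHz) * vector units per core
 *         * lanes per vector unit * FLOPs per lane per cycle
 * Real applications reach only a fraction of this figure. */
double peak_gflops(int cores, double ghz, int vector_units,
                   int lanes, int flops_per_lane)
{
    assert(cores > 0 && ghz > 0.0 && vector_units > 0 &&
           lanes > 0 && flops_per_lane > 0);
    return cores * ghz * vector_units * lanes * flops_per_lane;
}
```

Under these assumptions, peak_gflops(64, 1.3, 2, 8, 2) gives about 2662 GFLOP/s per node, whereas a simplified model of non-vectorized code issuing one scalar operation per core per cycle, peak_gflops(64, 1.3, 1, 1, 1), gives about 83 GFLOP/s, 32 times less.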

Big Data Nodes (Andalan)

These nodes are an experimental platform. If you have applications that can be considered big data, please contact

Bioinformatics Nodes (Dribe)

Interacting with Kabré

In this section we will cover ssh connections, ssh keys, and how to copy files.

SSH connections and SSH Keys

To start, open a terminal emulator and start an SSH session by typing

$ ssh

Remember to replace user with your user name. Type your password when prompted, and you will be logged into one of the login nodes. This is a typical Linux terminal, so try out some familiar commands, like ls, cd, mkdir and so on.

Usually you will be using the same computer to interact with Kabré, your laptop or workstation, for example. An SSH key pair keeps your connection secure while sparing you from typing your password every time you log in. To generate one, open a terminal on your local computer (laptop, workstation…), go to your home directory and type

$ ssh-keygen -t rsa -C ""

and follow the instructions. If you chose the default options, the new key pair will be in ~/.ssh/. Now copy the public key to Kabré by typing

$ scp ~/.ssh/

Now, within an SSH session on Kabré, type

$ cat >> .ssh/authorized_keys
$ rm

Alternatively, you may perform this procedure with a single command, if ssh-copy-id is available on your workstation:

$ ssh-copy-id

Now, if you open a new terminal and type ssh, you will be logged in without being prompted for a password. This is because Kabré has the public key of your computer. It is also convenient for your computer to have the public key of Kabré; simply append it to authorized_keys on your local computer:

$ scp . 
$ cat >> ~/.ssh/authorized_keys
$ rm

Copy files between your computer and Kabré

The scp command is similar to cp: it copies files from an origin to a destination over an SSH connection. It has the following syntax:

$ scp [user@][host:][path]origin_file [user@][host:][path]destination_file

Default values are

  • user: your local user
  • host: local host
  • path: current working directory
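To make the syntax concrete, here is a simplified sketch of how one scp argument splits into its user, host and path parts, and how the defaults above fill in the missing pieces. It is an illustration, not scp's actual source: the struct and function names are hypothetical, and the real scp handles cases this ignores (for example IPv6 addresses in brackets). The one rule it does share with scp is that a ':' only marks a remote location when no '/' appears before it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three parts of an scp argument. */
struct scp_location {
    char user[64];
    char host[256];
    char path[1024];
};

/* Split "[user@][host:][path]file" into its parts.
 * Defaults: user -> local_user, host -> "localhost".
 * For a remote location, an empty or relative path is resolved
 * against the remote home directory. */
void parse_scp_arg(const char *arg, const char *local_user,
                   struct scp_location *out)
{
    assert(arg != NULL && local_user != NULL && out != NULL);
    const char *colon = strchr(arg, ':');
    const char *slash = strchr(arg, '/');

    /* A ':' that appears before any '/' marks a remote location. */
    if (colon != NULL && (slash == NULL || colon < slash)) {
        const char *at = memchr(arg, '@', (size_t)(colon - arg));
        if (at != NULL) {
            snprintf(out->user, sizeof out->user, "%.*s",
                     (int)(at - arg), arg);
            snprintf(out->host, sizeof out->host, "%.*s",
                     (int)(colon - at - 1), at + 1);
        } else {
            snprintf(out->user, sizeof out->user, "%s", local_user);
            snprintf(out->host, sizeof out->host, "%.*s",
                     (int)(colon - arg), arg);
        }
        snprintf(out->path, sizeof out->path, "%s", colon + 1);
    } else {
        /* No host part: a file on the local machine. */
        snprintf(out->user, sizeof out->user, "%s", local_user);
        snprintf(out->host, sizeof out->host, "%s", "localhost");
        snprintf(out->path, sizeof out->path, "%s", arg);
    }
}
```

For example, "alice@cluster:data/run1.viz" yields user alice, host cluster and path data/run1.viz, a plain "parameters.dat" yields the local user on localhost, and "./odd:name.txt" stays local because a '/' precedes the ':'. The names alice and cluster are placeholders, not Kabré's real user or host names.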

scp must be executed on your local machine, not on Kabré. Suppose your application generates a lot of visualization files and you want to download them to your computer; remember that ~ means home directory and * matches any sequence of characters:

$ scp*.viz ~/app/results/visualization

Or maybe you want to upload a parameters file to use in a simulation:

$ scp ~/research/app/parameters.dat

Understanding Kabré’s queue system

Login nodes are suitable for light tasks, as mentioned before: editing files, compiling, copying files and similar. Heavy tasks are expected to run on Cadejos, Zarate or Tule nodes. To enforce a fair sharing of resources among users, your task should be submitted to a queue system. It is like standing in line at the bank: once your task makes its way to the head of the line, it will be granted all requested resources and will run until completion or until it uses up its time slot.

Currently, each component of Kabré has its own queues, which means you cannot mix Tule nodes and Zarate nodes in a single job, for example. The following table shows all available queues:

The process of submitting a job in Kabré can be divided into four steps: writing a SLURM file, queueing your job, monitoring jobs and retrieving results.

Writing a SLURM file

This configuration file tells the queue system everything it needs to know about your job, so it can be placed in the right queue and executed. Let’s try it out with a minimal working example. Below is a C program that approximates the value of pi using a Monte Carlo method. Log into Kabré, copy the text to a file and save it as pi_threads.c.

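#include <pthread.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <stdio.h>

/* Work assigned to one thread and its result. */
typedef struct {
    long long num_of_points;  /* points to sample (64 bits for large runs) */
    long long partial_result; /* points that fell inside the quarter circle */
    unsigned int seed;        /* per-thread seed for rand_r */
} pi_params;

void * calc_partial_pi(void * p){
    pi_params * params = p;
    long long count = 0;
    double r1, r2;
    unsigned int seed = params->seed;

    for(long long i = 0; i < params->num_of_points; ++i){
        /* Random point in the unit square */
        r1 = (double)rand_r(&seed)/RAND_MAX;
        r2 = (double)rand_r(&seed)/RAND_MAX;
        /* Count it if it lies inside the quarter circle of radius 1 */
        if(hypot(r1, r2) < 1)
            ++count;
    }
    params->partial_result = count;
    return NULL;
}

int main(int argc, char * argv[]){

    if(argc != 3){
        printf("Usage: $ %s num_threads num_points\n", argv[0]);
        return 1;
    }

    int num_threads = atoi(argv[1]);
    /* long long: 100000000000 points do not fit in an int */
    long long num_points = atoll(argv[2]);
    if(num_threads < 1 || num_points < num_threads){
        printf("num_threads must be >= 1 and num_points >= num_threads\n");
        return 1;
    }
    long long num_points_per_thread = num_points / num_threads;

    pthread_t threads[num_threads];
    pi_params parameters[num_threads];
    unsigned int base_seed = (unsigned int)time(NULL);

    for(int i = 0; i < num_threads; i++){
        parameters[i].num_of_points = num_points_per_thread;
        /* Distinct seeds; with a shared seed every thread draws the same points */
        parameters[i].seed = base_seed + i;
        pthread_create(threads+i, NULL, calc_partial_pi, parameters+i);
    }

    for(int i = 0; i < num_threads; i++)
        pthread_join(threads[i], NULL);

    long long inside = 0;
    for(int i = 0; i < num_threads; i++)
        inside += parameters[i].partial_result;

    /* inside / total approximates pi/4, the area of the quarter circle */
    double approx_pi = 4.0 * inside / (num_threads * num_points_per_thread);

    printf("Result is %f, error %f\n", approx_pi, fabs(M_PI-approx_pi));
    return 0;
}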


You are currently on a login node, so it is fine to compile the code there; do so by typing:

$ gcc -std=gnu99 pi_threads.c -lm -lpthread -o pi_threads

The following is an example SLURM file. All lines starting with #SBATCH are configuration commands for the queue system; the options shown are the most common, and possibly the only ones you will need. The body of a SLURM file is bash code. Copy the example into a file named pi_threads.slurm.

Configuration                    Description
--job-name=<job_name>            Specify the job’s name
--partition=<partition_name>     Queue in which the job should run
--ntasks=<multiply X*Y>          Number of processes to run, where
                                   X = requested number of nodes
                                   Y = processes per node
--time=<HH:MM:SS>                Approximate job duration

#!/bin/bash
#SBATCH --job-name=pi_threads
#SBATCH --partition=phi-n1h72
#SBATCH --ntasks=<multiply X*Y>
#SBATCH --time=00:15:00

time ./pi_threads 64 100000000000
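The script above hard-codes 64 threads, matching the 64 cores of a Xeon Phi node. A program can instead size itself from its allocation: inside a job, SLURM exports environment variables describing it, among them SLURM_CPUS_ON_NODE, the number of CPUs allocated on the node. The helper below is a minimal sketch built on that variable; threads_from_slurm is a hypothetical name, and it falls back to a caller-supplied default when the variable is missing or malformed, for example when the program runs outside a job.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Number of worker threads to start, taken from SLURM_CPUS_ON_NODE
 * when running inside a SLURM job. If the variable is unset, empty,
 * or not a positive integer, `fallback` is returned instead. */
int threads_from_slurm(int fallback)
{
    assert(fallback > 0);
    const char *value = getenv("SLURM_CPUS_ON_NODE");
    if (value == NULL || *value == '\0')
        return fallback;

    char *end;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 1 || n > INT_MAX)
        return fallback;
    return (int)n;
}
```

In pi_threads.c, the thread count could then be computed as threads_from_slurm(atoi(argv[1])), so that a positive command-line value is used only outside SLURM. How many CPUs SLURM reports depends on the cluster configuration and on the options in the SLURM file (such as --ntasks), so check that the value matches what you expect.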

Replace <multiply X*Y> with the actual number of processes, then, from the command line, invoke the queue submitter:

$ sbatch pi_threads.slurm

And that’s all! Your job will be queued and executed, in this case, on a Xeon Phi node.

Monitoring your active jobs

A passive way of monitoring your jobs is to tell SLURM to send you an email when the job is done. This can be configured in the SLURM file using the following options:

Configuration                                Description
--mail-user=<email>                          Where to send email alerts
--mail-type=<BEGIN|END|FAIL|REQUEUE|ALL>     When to send email alerts
--output=<out_file>                          Name of the output file
--error=<error_file>                         Name of a separate error log, if desired
--account=<account_id>                       Which account to charge CPU time to
--job-name=<job_name>                        Job name

#SBATCH --mail-type=ALL

After submitting your job, you can check its status with these commands:

Command                        Description
myqueue                        Check your queued jobs
squeue -u <username>           Check jobs for a specific user
sstat <job_id>                 Display resource usage
sinfo                          Display all nodes (with attributes)
scontrol show job <job_id>     Status of a particular job
squeue -j <job_id> --start     Get the estimated start time of a job
mybalance [-h]                 Check remaining core-hour allocation
scancel <job_id>               Delete a job

If your job lasts only a few minutes, you can actively monitor its progress with the squeue command:

$ watch -n 5 squeue -u $USER

This will run squeue every 5 seconds. Your job will appear with the name specified in the SLURM file. This method requires keeping an SSH session open on Kabré. If you want to check your job’s progress from your cellphone or another device without opening an SSH connection, CeNAT has a webpage that shows the queue status, refreshed every 5 seconds. Just go to

Valid Job States

To understand the job state codes you may encounter, check the following table:

Code State
CA Canceled
CD Completed
CF Configuring
CG Completing
F Failed
NF Node Fail
PD Pending
R Running
TO Timeout
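When polling the queue from a script (these compact codes are what squeue prints in its ST column), it is handy to know which codes are final. In the sketch below, CA, CD, F, NF and TO mean the job will not run any further, so its output files can be collected, while CF, CG, PD and R mean it is still queued or active. It covers only the codes in the table above; SLURM defines others, which this sketch reports as unknown. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compact SLURM job state codes from the table above. */
static const struct {
    const char *code;
    const char *name;
    int is_final;   /* 1 if the job will not run any further */
} job_states[] = {
    {"CA", "Canceled",    1},
    {"CD", "Completed",   1},
    {"CF", "Configuring", 0},
    {"CG", "Completing",  0},
    {"F",  "Failed",      1},
    {"NF", "Node Fail",   1},
    {"PD", "Pending",     0},
    {"R",  "Running",     0},
    {"TO", "Timeout",     1},
};

/* Index of `code` in the table, or -1 if it is not listed. */
static int find_state(const char *code)
{
    assert(code != NULL);
    for (size_t i = 0; i < sizeof job_states / sizeof job_states[0]; ++i)
        if (strcmp(job_states[i].code, code) == 0)
            return (int)i;
    return -1;
}

/* Human-readable name of a state code, or NULL if unknown. */
const char *job_state_name(const char *code)
{
    int i = find_state(code);
    return i < 0 ? NULL : job_states[i].name;
}

/* 1 if the job reached a final state, 0 if it is still queued or
 * active, -1 if the code is not in the table. */
int job_state_is_final(const char *code)
{
    int i = find_state(code);
    return i < 0 ? -1 : job_states[i].is_final;
}
```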

Retrieving results

By default, every job will generate two output files, corresponding to standard output and standard error, following the name convention below:


You can copy those files to your local computer or run another script to post-process the output.

Interactive jobs

Sometimes you want direct access to a node. Using ssh directly is bad practice, because the queue system could send someone else’s job to execute on the node you are currently using. The polite way to ask for direct access is through an interactive job. This time you don’t need a SLURM file; the only required information is which queue you want to use. For example:

$ salloc --partition=phi-debug

starts an interactive job on some Zarate node. Interactive jobs are allowed in debug queues only. The next command opens an interactive job on a Tule node and reserves a GPU for your use:

$ salloc --partition=gpu-debug

Environment modules

Different users have different needs, and sometimes those needs conflict, for example, when they require different versions of the same library. Such situations are solved with environment modules. A typical case is having several versions of Python. To try it out, first start an interactive job on a Zarate node:

$ salloc --partition=phi-debug

Go ahead and type $ python; you should get into the default Python interpreter, whose header should look like this:

Python 2.7.5 (default, Nov 6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Besides the default interpreter, you can use the Intel Distribution for Python, a specially tuned build that includes packages commonly used in scientific computing. To load Intel Python, type

$ module load intelpython/3.5

Now type $ python again, and you will get a different header:

Python 3.5.2 |Intel Corporation| (default, Oct 20 2016, 03:10:33) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out:

To check which modules are already loaded, type

$ module list

To get a list of all available modules, type

$ module avail

Behind the scenes, the module command just configures paths, aliases and other environment variables, so modules are loaded only for the current shell session. You can request specific modules in your jobs: just add «module load module_name» lines to the SLURM file body, below all #SBATCH lines and before running your program.
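The reason module changes stay confined to one session is that a process's environment variables are copied to its child processes when they start, and changes never flow back up. The POSIX sketch below demonstrates both directions with fork(): the child sees a variable the parent set, while a variable the child sets is invisible to the parent afterwards. The variable names are arbitrary placeholders.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if environment variables are inherited by a child process
 * AND a child's own changes stay invisible to its parent, else 0. */
int env_flows_down_not_up(void)
{
    /* Arbitrary, hypothetical variable names used only for the demo. */
    setenv("DEMO_SET_BY_PARENT", "1", 1);
    unsetenv("DEMO_SET_BY_CHILD");

    pid_t pid = fork();
    if (pid < 0)
        return 0;               /* could not create a child */
    if (pid == 0) {
        /* Child: report (via exit status) whether it inherited the
         * parent's variable, then modify only its own environment. */
        int inherited = getenv("DEMO_SET_BY_PARENT") != NULL;
        setenv("DEMO_SET_BY_CHILD", "1", 1);
        _exit(inherited ? 0 : 1);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    int child_inherited = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    /* Parent: the child's setenv() never touched this process. */
    int parent_sees_child_var = getenv("DEMO_SET_BY_CHILD") != NULL;
    return child_inherited && !parent_sees_child_var;
}
```

The same rule explains why module load lines belong in the body of the SLURM file: the commands after them, including your program, are started by the job's shell and inherit the updated environment.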