BBBcluster

This work is part of an ongoing project to build a scalable micro computing cluster for building energy management research. In fact, this is the first version of the system. In this tutorial, I will explain how to set up a simple computing cluster using BeagleBone Black development boards. This is intended to serve as a demonstration version of a micro cluster and to introduce tools that will be utilized in later builds. Specifically, the tutorial will explain how to configure the boards for easy network communication and file sharing, establish a manager/workers cluster implementation, and install a message passing interface system for parallel computing. Later builds will focus on improving the usability (e.g. replacing the manager with a laptop, cluster SSH, torrent file sharing) and scalability (e.g. power rail, case, cooling) of the system. Future posts will dive deeper into parallel computing code (using MPI4PY) and building energy management applications.

Throughout this tutorial, I will use Note to provide additional information/explanations and Optional to indicate steps that may be helpful but are not required for the build. If you find any errors or solve any bugs, please let me know. I would greatly appreciate it.

Last Tested With:

  • BeagleBone Black Rev C
  • Debian 7.5 (2014-05-14)
  • MPICH2 1.4.1

I. Preparation

1.1 Requirements

To set up a cluster, you will need 2 or more BBBs and a router or switch to provide Ethernet access. You will also need a computer with an SSH client to access each of the boards.

1.2 BeagleBone Blacks

At the time of this writing, BeagleBone Blacks (BBB) are being shipped with Debian pre-installed. Throughout this tutorial, I will assume you are using an officially supported Debian image. If you need to install Debian or update to the latest build, pre-built images can be found at: beagleboard.org/latest-images. Follow the tutorial at beagleboard.org/getting-started to update the OS and make sure that you can log into each board. It is up to you whether you flash the eMMC or boot from an SD card, though the eMMC is purportedly faster.

1.3 Troubleshoot

After setting up a 24 node cluster, I have a few troubleshooting tips.

  • When turning on a BBB, if the PWR light turns on but none of the USR lights, disconnect and reconnect the board. 
  • If the Ethernet lights do not turn on, reboot the board.
  • If a BBB hangs on startup (lights are on/blinking but you are never able to access the board over USB/SSH), re-flash the Debian OS to the eMMC. If the problem persists, re-flash the SD card and then re-flash the eMMC.
  • In my experience, it is pretty easy to corrupt an install. You should avoid inserting/removing the SD card with power connected and minimize contact with the board as much as possible.

II. Setup Nodes

Pick the BBB that will serve as the manager (also called the head or master node) and the nodes that will be the workers (also called computational or slave nodes).  Repeat steps 2.1 through 2.6 for each BBB.

2.1 Update

If needed, check that you have Internet access using ping (Ctrl and C to exit ping).

debian@beaglebone:~$ ping google.com

Update and upgrade the boards (this might take a few minutes).

debian@beaglebone:~$ sudo apt-get update
debian@beaglebone:~$ sudo apt-get upgrade
debian@beaglebone:~$ sudo apt-get clean

Note: If you are using the Debian image from 2014-05-14 and the update fails with a GPG error, you may need to comment out one of the entries in the apt sources list.

debian@beaglebone:~$ sudo nano /etc/apt/sources.list

Change:

deb [arch=armhf] http://debian.beagleboard.org/packages wheezy-bbb main

#deb-src [arch=armhf] http://debian.beagleboard.org/packages wheezy-bbb main

To:

#deb [arch=armhf] http://debian.beagleboard.org/packages wheezy-bbb main

#deb-src [arch=armhf] http://debian.beagleboard.org/packages wheezy-bbb main

Note: If you are seeing the error E: Sub-process /usr/bin/dpkg returned an error code (1), I suggest re-flashing the eMMC and starting over.

2.2 Password

SSH into the BBB. Change the debian user password (if not already done).

debian@beaglebone:~$ passwd

2.3 Static IP

Each BBB will need to be assigned a static IP address.

Check the IP configuration:

debian@beaglebone:~$ ifconfig

Find the section labeled eth0 and note the inet addr, Bcast, and Mask. Here is what I see:

eth0     Link encap:Ethernet HWaddr d0:5f:b8:fb:dc:a6

         inet addr:192.168.1.139 Bcast:192.168.1.255 Mask:255.255.255.0

Next, edit the /etc/network/interfaces file using your favorite text editor (I am using nano). Find the lines for the primary network interface (eth0) and comment out the dhcp line. Below this, add lines defining the static IP, as shown below. For each BBB, replace X with a different number (I usually choose a number between 40 and 60 or 200 and 250). The first three numbers in the address, network, broadcast, and gateway should match the inet addr above. With the 255.255.255.0 mask shown above, the network and broadcast end with 0 and 255, respectively, and the gateway is your router's address, which typically ends with 1. For more on static IPs, check out this post at portforward.com. If using nano, press Ctrl and X to exit, press Y to save the changes, and press Enter to keep the same filename.

debian@beaglebone:~$ sudo nano /etc/network/interfaces
# The primary network interface
#iface eth0 inet dhcp
auto eth0
iface eth0 inet static
    address 192.168.1.X
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
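
The values in this stanza can be derived from the inet addr and Mask that ifconfig reports. The following is a minimal sketch rather than part of the build itself: it assumes Python 3 on the computer you SSH from, the function name static_stanza is hypothetical, and placing the gateway at the first host address of the network is an assumption that you should check against your router.

```python
import ipaddress


def static_stanza(inet_addr, mask, host_number):
    """Build an eth0 static stanza from the values reported by ifconfig."""
    # strict=False accepts a host address and returns the network it belongs to.
    net = ipaddress.ip_network(f"{inet_addr}/{mask}", strict=False)
    # The board's address is the network address plus the chosen host number (X).
    address = net.network_address + host_number
    # Assumption: the router is the first host on the network (e.g. x.x.x.1).
    gateway = net.network_address + 1
    reserved = (net.network_address, net.broadcast_address, gateway)
    if address not in net or address in reserved:
        raise ValueError(f"{address} is not a usable host address in {net}")
    lines = [
        "auto eth0",
        "iface eth0 inet static",
        f"    address {address}",
        f"    netmask {net.netmask}",
        f"    network {net.network_address}",
        f"    broadcast {net.broadcast_address}",
        f"    gateway {gateway}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the ifconfig output shown earlier, with X = 40.
    print(static_stanza("192.168.1.139", "255.255.255.0", 40))
```

The sketch rejects host numbers that collide with the network, broadcast, or assumed gateway address, which are the easiest mistakes to make when picking X by hand.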

To verify that the static IP is working, reboot the board, or shut it down (wait for the lights to turn off) and reconnect the power.

debian@beaglebone:~$ sudo shutdown -h now

Note: If you are rebooting or shutting down a board via SSH, it is helpful to add && exit to the end of the command to log out before the connection is closed.

sudo shutdown -h now && exit

If you had been connecting to the BBB with USB, now is a good time to connect an external power supply. Power up the board and SSH in using the static IP address assigned above.

2.4 Hostname

Change the board’s hostname (nodem for node manager and node1, node2, etc. for workers).

debian@beaglebone:~$ sudo hostname nodem

Then edit the /etc/hostname file. You will see a single line in the file showing the hostname, beaglebone. Change the hostname to nodem (or node1, node2, etc.).

debian@beaglebone:~$ sudo nano /etc/hostname
nodem

Note: If you are seeing the message unable to resolve host nodem, this is because we have not yet updated the /etc/hosts file.

2.5 Hosts

To allow each BBB to communicate using hostnames rather than IP addresses, update the /etc/hosts file. Since we have set a static IP, we do not need to define the 127.0.1.1 address (see Debian manual).

debian@beaglebone:~$ sudo nano /etc/hosts

The /etc/hosts file for a 3 node cluster might look like:

127.0.0.1 localhost

192.168.1.40 nodem
192.168.1.41 node1
192.168.1.42 node2

Note: The /etc/hosts file on each node should be identical.
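
Because the file must be identical on every node, it can help to generate it once and copy the result to each board. The following is a minimal sketch under stated assumptions: it uses Python 3 on your own computer, the function name hosts_file is hypothetical, and it assumes the nodes were given consecutive static addresses starting with the manager, as in the example above.

```python
def hosts_file(prefix, first_host, workers):
    """Return /etc/hosts text: nodem first, then node1..nodeN on consecutive IPs."""
    # Hostnames follow the naming scheme from step 2.4.
    names = ["nodem"] + [f"node{i}" for i in range(1, workers + 1)]
    lines = ["127.0.0.1 localhost", ""]
    for offset, name in enumerate(names):
        # Assumption: addresses are consecutive, starting at first_host.
        lines.append(f"{prefix}.{first_host + offset} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the 3 node example: one manager and two workers.
    print(hosts_file("192.168.1", 40, 2), end="")
```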

Now is a good time to reboot.

debian@beaglebone:~$ sudo reboot

2.6 Cluster User

Create a new user called cluster and assign sudo privileges. Set a password but feel free to leave the user information blank.

debian@nodem:~$ sudo adduser cluster
debian@nodem:~$ sudo adduser cluster sudo

Switch users and user directories:

debian@nodem:~$ su - cluster

III. Setup NFS and SSH Keys

The next few steps are specific to either the manager node or the worker nodes.

3.1 For Manager

For this build, files will be shared using a Network File System (NFS) server. Install the server, edit /etc/exports to export the manager node’s cluster directory, and restart the server. Note: The * indicates that any IP address can mount the directory, and tee -a is a quick method for appending text to the end of a file.

cluster@nodem:~$ sudo apt-get install ssh nfs-kernel-server
cluster@nodem:~$ echo '/home/cluster *(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
cluster@nodem:~$ sudo service nfs-kernel-server restart
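
The * in the exports entry can be narrowed to the cluster's own subnet, since /etc/exports also accepts a network in address/prefix form. The following is a minimal sketch of how such a line could be built, not the configuration used in this build: it assumes Python 3 on your own computer, and the function name exports_line is hypothetical.

```python
import ipaddress


def exports_line(directory, inet_addr, mask, options="rw,sync,no_subtree_check"):
    """Build an /etc/exports entry limited to the subnet of the given address."""
    # strict=False turns a host address plus mask into its containing network.
    net = ipaddress.ip_network(f"{inet_addr}/{mask}", strict=False)
    # with_prefixlen renders the network as, for example, 192.168.1.0/24.
    return f"{directory} {net.with_prefixlen}({options})"


if __name__ == "__main__":
    # The manager's static address and mask from step 2.3.
    print(exports_line("/home/cluster", "192.168.1.40", "255.255.255.0"))
```

Restricting the export only limits which machines can mount the directory. It does not change the fact that the keys stored in the shared directory are readable by every node that mounts it.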

Use SSH to generate a key. When asked where to save the file, use the default to store it in the cluster directory (just press Enter).  When asked for a passphrase, leave it blank (just press Enter).

cluster@nodem:~$ ssh-keygen

Note: ssh-keygen generates a public (id_rsa.pub) and a private (id_rsa) key, which are stored in the ~/.ssh directory. Since we have mounted the entire cluster user directory, anyone on the network can access both keys. This makes for easy setup but is a big security problem. The next build will present a better method for generating and sharing keys.

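Lastly, authorize the key for the cluster user by copying it to localhost, then reboot the system. When asked whether to continue connecting, enter yes (this also adds localhost to the list of known hosts), and enter the cluster user's password if prompted. Because every node will share this home directory, every node will accept the key.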

cluster@nodem:~$ ssh-copy-id localhost
cluster@nodem:~$ sudo reboot

3.2 For Workers

Install SSH and the NFS client. Then, mount the manager node’s cluster directory.

cluster@node1:~$ sudo apt-get install ssh nfs-common
cluster@node1:~$ sudo mount nodem:/home/cluster /home/cluster

Check if the directory is mounted (if nothing appears, the directory is not mounted).

cluster@node1:~$ mount -l | grep /home/cluster
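
The same check can be scripted, which is handy once there are many workers. The following is a minimal sketch, assuming Python 3 and a mount table in the whitespace-separated layout used by /proc/mounts (source, mount point, filesystem type, options). The function name nfs_source and the sample table are hypothetical.

```python
def nfs_source(mount_table, mount_point):
    """Return the NFS source mounted at mount_point, or None if there is none."""
    for line in mount_table.splitlines():
        fields = line.split()
        # Fields: source, mount point, filesystem type, options, ...
        if len(fields) < 3:
            continue
        # Match both "nfs" and "nfs4" filesystem types.
        if fields[1] == mount_point and fields[2].startswith("nfs"):
            return fields[0]
    return None


if __name__ == "__main__":
    # Hypothetical sample; on a worker you would pass the text of /proc/mounts.
    sample = (
        "rootfs / rootfs rw 0 0\n"
        "nodem:/home/cluster /home/cluster nfs rw,relatime 0 0\n"
    )
    print(nfs_source(sample, "/home/cluster"))
```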

Edit /etc/fstab to mount the directory on reboots.

cluster@node1:~$ echo 'nodem:/home/cluster /home/cluster nfs defaults 0 0' | sudo tee -a /etc/fstab

A common problem is that directories are mounted before the network connection is established, preventing NFS directories from mounting. This can be fixed by adding mount -a to /etc/rc.local (before exit 0):

cluster@node1:~$ sudo nano /etc/rc.local
mount -a

Reboot the BBB.

cluster@node1:~$ sudo reboot

To verify that the network file sharing is working, create a file in the cluster directory of the manager node and check that it appears in the cluster directory of each worker node.

Also, you should now be able to SSH from the manager node into each of the workers without a password. The SSH keys are stored in the shared directory, so the manager can log into any host that has mounted the manager’s cluster directory.

cluster@nodem:~$ ssh cluster@node1
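
With more than a few workers, checking each login by hand gets tedious. The following is a minimal sketch of an automated check, assuming Python 3 is available on the machine that runs it. The function names are hypothetical, and the 5 second timeout is an arbitrary choice. BatchMode=yes tells ssh to fail instead of prompting, so a worker that still asks for a password, or whose host key has never been accepted, is reported as unreachable.

```python
import subprocess


def ssh_check_command(host, user="cluster"):
    """Build an ssh command that runs `hostname` remotely and never prompts."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # fail rather than ask for a password
        "-o", "ConnectTimeout=5",   # give up quickly if the board is down
        f"{user}@{host}",
        "hostname",
    ]


def unreachable_workers(hosts, run=subprocess.call):
    """Return the hosts whose passwordless SSH check exits with a non-zero status."""
    # `run` is injectable so the logic can be exercised without a real network.
    return [host for host in hosts if run(ssh_check_command(host)) != 0]
```

Calling unreachable_workers(["node1", "node2"]) on the manager returns an empty list when every worker accepts the shared key.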

IV. MPI

A popular way of quickly sending and receiving information between nodes in a cluster is with a message passing interface (MPI). I will be using the MPICH implementation of MPI because it is compatible with the BeagleBone Black. Perform step 4.1 on each BBB and step 4.2 on the manager node only.

Note: At the time of this writing, Debian Wheezy’s repository does not have a recent (3.0+) version of MPICH. However, the repository does have an old version of MPICH (version 1.4.1) that is compatible with BeagleBone Black. The next build will walk through how to build and install the latest version. For the sake of convenience, this build will just use apt-get to install MPI.

4.1 Install

For this step, switch back to the debian user (because the workers do not have rights to the mounted cluster user’s directory).

cluster@nodem:~$ su - debian

Optional: To date, I have been unsuccessful using OpenMPI and so we will begin by making sure it is not installed on the BBB.

debian@nodem:~$ sudo apt-get autoremove openmpi-bin

Now, install MPICH. Note: MPICH2 was renamed simply MPICH in 2012. You will likely find resources that still refer to MPICH2.

debian@nodem:~$ sudo apt-get install mpich2

If needed, you can check for the installed MPICH2 version (I am using 1.4.1). 

debian@nodem:~$ mpich2version

I use Python for most of my work and so I will install the MPI4PY library. Note: Do not install MPI4PY using sudo apt-get install python-mpi4py, because it will install OpenMPI as a dependency. Instead, install the library using pip and pass in the location of the MPI compiler (this will take a few minutes).

debian@nodem:~$ sudo apt-get install python-dev python-pip
debian@nodem:~$ sudo env MPICC=/usr/bin/mpicc pip install mpi4py

4.2 Test

To test the cluster, we will only need the manager node (of course, the workers will need to be on and connected to the network). First, switch back to the cluster user and create a machines.txt file in the user directory.

debian@nodem:~$ su - cluster
cluster@nodem:~$ nano machines.txt

In the file, list the hostnames for each node in the cluster. For example:

nodem
node1
node2

Next, create and save the hello.py example.

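cluster@nodem:~$ nano hello.py
from mpi4py import MPI
from sys import stdout

comm = MPI.COMM_WORLD            # communicator containing every process
rank = comm.Get_rank()           # ID of this process (0 to size - 1)
size = comm.Get_size()           # total number of processes
name = MPI.Get_processor_name()  # name of the node running this process
stdout.write("Helloworld! I am process "
             "%d of %d on %s.\n" % (rank, size, name))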

Now, run the program using the following command syntax:

mpiexec -n <number of processes> -f <hostlist> python <python script>

For example:

cluster@nodem:~$ mpiexec -n 4 -f machines.txt python hello.py

You should see an output similar to:

Helloworld! I am process 1 of 4 on node1.
Helloworld! I am process 2 of 4 on node2.
Helloworld! I am process 3 of 4 on nodem.
Helloworld! I am process 0 of 4 on nodem.
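
In the sample output, four processes are spread over three hosts, and the fourth process wraps around to nodem. The following is a minimal sketch of that placement, assuming one process per entry in machines.txt with wrap-around once the list is exhausted. It only mirrors the sample output above. The real placement is decided by the MPI process manager, and the function name place_ranks is hypothetical.

```python
def place_ranks(hosts, nprocs):
    """Assign ranks to hosts round-robin, one process per host entry."""
    # Rank i goes to the i-th host, wrapping around when the list runs out.
    return {rank: hosts[rank % len(hosts)] for rank in range(nprocs)}


if __name__ == "__main__":
    # The hosts listed in machines.txt, in order, with -n 4.
    for rank, host in place_ranks(["nodem", "node1", "node2"], 4).items():
        print("process %d runs on %s" % (rank, host))
```

The lines in the real output are not sorted by rank because each process writes independently and the messages arrive in whatever order they finish.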

To truly test that MPI is working, we need to try broadcasting and gathering. The next example will do both. 

cluster@nodem:~$ nano bcast_gather.py
from mpi4py import MPI
from sys import stdout

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
name = MPI.Get_processor_name()
if rank == 0:
    data = { 'key1' : [7, 2.72, 2+3j] }  # only process 0 starts with the data
else:
    data = None
data = comm.bcast(data, root=0) # Process 0 will broadcast
data2 = [rank,size]
data2 = comm.gather(data2, root=0) # Process 0 will gather
stdout.write("Process %d of %d on %s:\n"
             " Received: %s\n Gathered: %s\n"
             % (rank,size,name,data,data2))

Run the example:

cluster@nodem:~$ mpiexec -n 4 -f machines.txt python bcast_gather.py
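
When reading the output, it helps to know what each process should end up holding: after the broadcast every process has the dictionary from process 0, while after the gather only process 0 has the combined list and every other process has None. The following is a minimal sketch that imitates this in plain Python 3 without MPI, so it can be run anywhere. It is a simulation for illustration only, and the function name simulate_bcast_gather is hypothetical.

```python
def simulate_bcast_gather(size, root=0):
    """Imitate what each rank holds after bcast and gather (no MPI involved)."""
    payload = {'key1': [7, 2.72, 2+3j]}                      # held by root before bcast
    contributions = [[rank, size] for rank in range(size)]  # what each rank sends to gather
    results = []
    for rank in range(size):
        received = payload                                   # bcast: every rank gets root's object
        gathered = contributions if rank == root else None   # gather: only root gets the list
        results.append((rank, received, gathered))
    return results


if __name__ == "__main__":
    for rank, received, gathered in simulate_bcast_gather(4):
        print("Process %d:\n Received: %s\n Gathered: %s" % (rank, received, gathered))
```

If the real run shows a Gathered list on process 0 and None on the others, and the same Received dictionary everywhere, the broadcast and gather are working.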

V. Resources & Credits

If you have made it this far, CONGRATS! You now have a BeagleBone Black computing cluster.

To learn the basics of MPI, I very highly recommend working through the tutorials at mpitutorial.com. The website is completely free, filled with well written tutorials, and includes numerous detailed examples (written in C).

The development of this tutorial was supported as part of a crowdfunding campaign through Experiment.com. I am incredibly thankful for the generous support I have received!

This tutorial took many days of trial and error, debugging, and web searching to complete. I want to particularly thank the following websites, from which much of this tutorial is derived.

 
