Tag Archive for 'nodes'

Cluster Installation Finished

The hardware of our cluster is finally installed and ready. Almost all of the 180 compute nodes are ready, InfiniBand is working and the Lustre filesystem is mounted.

The first InfiniBand benchmarks gave us results of about 23 GBit/s, which is the expected bandwidth with our QDR network.
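To put that number in context, here is a rough back-of-the-envelope calculation of what a QDR link can carry in theory. It is only a sketch: it assumes a 4x link width (the usual configuration, but not stated above) together with the 8b/10b line encoding that QDR uses, and the names in it are made up for illustration.

```python
# Rough QDR InfiniBand bandwidth estimate.
# Assumption: 4x link width (not stated in the post).
LANES = 4
SIGNAL_RATE_GBIT_PER_LANE = 10.0   # QDR signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10       # 8b/10b line encoding: 8 data bits per 10 line bits


def qdr_data_rate_gbit(lanes=LANES):
    """Theoretical payload rate in Gbit/s after 8b/10b encoding."""
    return lanes * SIGNAL_RATE_GBIT_PER_LANE * ENCODING_EFFICIENCY


raw = LANES * SIGNAL_RATE_GBIT_PER_LANE
data = qdr_data_rate_gbit()
measured = 23.0  # benchmark result quoted above

print(f"raw signalling rate: {raw:.0f} Gbit/s")
print(f"theoretical data rate: {data:.0f} Gbit/s")
print(f"measured / theoretical: {measured / data:.0%}")
```

Under these assumptions the theoretical data rate comes out higher than the measured value; a gap of this kind is usually explained by protocol overhead and the host bus, so a benchmark is not expected to reach the theoretical figure.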

As a mirror admin I am a bit frustrated that I cannot use the big filesystem, which is mounted on every compute node, for my mirror server:

172.31.100.222@o2ib,172.30.100.222@tcp:172.31.100.221@o2ib,172.30.100.221@tcp:/lprod
                       29T  819M   28T   1% /lustre/ws1

Now I still need to install the frontend servers. One is used by the users to log in and submit jobs, and the other will contain the grid software, as this cluster will be part of the bwGRiD.

80 Nodes Up And Running

80 compute nodes of our cluster are up and running. We are now waiting for more switches and the filesystem servers to finally get the complete cluster (with all compute nodes) operational. To get the remaining nodes operational, all I have to do is add their MAC addresses to a file, and with the magic of some scripts everything else is configured automatically. Unfortunately it all depends on the missing Ethernet switches, which should arrive any day now.
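The scripts themselves are not shown here. As a purely hypothetical sketch of the idea, the following turns a list of MAC addresses into ISC-dhcpd-style host entries. The file format (one MAC per line, `#` comments), the node naming scheme (`node001`, `node002`, ...) and the `10.0.0.x` addresses are all assumptions made for illustration, not the cluster's actual setup.

```python
import re

# One MAC address per line, e.g. 00:11:22:33:44:55 (assumed file format).
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def parse_macs(text):
    """Return lower-cased MAC addresses, skipping blank lines and # comments."""
    macs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if not line:
            continue
        if not MAC_RE.match(line):
            raise ValueError(f"not a MAC address: {line!r}")
        macs.append(line)
    return macs


def dhcp_host_entries(macs, prefix="node", subnet="10.0.0.", first_host=1):
    """Render one dhcpd-style host block per MAC (hypothetical names and IPs)."""
    entries = []
    for offset, mac in enumerate(macs):
        number = first_host + offset
        entries.append(
            f"host {prefix}{number:03d} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {subnet}{number};\n"
            f"}}"
        )
    return "\n".join(entries)


# Made-up addresses, standing in for the real MAC file.
example = """
# rack 1
00:11:22:33:44:55
00:11:22:33:44:56
"""

print(dhcp_host_entries(parse_macs(example)))
```

Keeping the MAC list as the single input means that adding a node is a one-line change, and everything derived from it (names, addresses, boot configuration) can be regenerated rather than edited by hand.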

Cluster Installation: First Nodes Up

Since Monday I have been at the High Performance Computing Center Stuttgart (HLRS), where I have started the initial installation of our cluster. The people from the HLRS offered to support us with the initial installation, which we gladly accepted because they know how to build clusters.

On Monday I installed the three infrastructure servers which are used to control the 180 nodes of the cluster. The cluster is running Scientific Linux and my first task was to get it on those three infrastructure servers.

Those servers have two 500 GB disks, which were supposed to run as a software RAID. After the seventh failed attempt to configure the partitions as RAID1 with the Scientific Linux installer, we used a Debian install DVD to partition the disks. Once the partitions were successfully configured as RAID1, we installed Scientific Linux on all three systems. Not knowing how to use anaconda to configure a RAID1 the way we wanted was a bit embarrassing, but in all the Fedora and CentOS installations I have done I have never configured a software RAID1 from the installer: either the system had only one disk, it had a hardware RAID controller, or I configured the RAID manually after the installation. But at the end of the day all three systems were installed and configured for their tasks.

Today (Tuesday) we used that installation to boot the first two nodes of the cluster. All the nodes run diskless and boot over TFTP/NFS from a single read-only image.