Assignment 02

In this assignment, we will conduct a collaborative project that tests certain theoretical hypotheses in Deep Learning. Each of you will build a personal SLURM cluster on Google Compute Engine (GCE) using elasticluster and then run large-scale computational experiments using clusterjob. We will then collect and analyse the results you generate and document our observations. Please follow the steps below to set up your cluster and run your experiments. This document covers only how to set up your cluster and test that it works properly with GPUs. Once these steps are complete, conduct the experiments assigned to you on Canvas. The details of the experiments are available only on the Stanford Canvas website, to students taking this course for credit.

Acknowledgements

FAQ

Please visit the frequently asked questions before you submit a question on our Google group.

Building your cluster on Google Cloud Platform

To create your own cluster on Google Compute Engine, complete the following four steps:

  1. Set up Google Compute Engine
  2. Install Docker
  3. Create your cluster using dockerized ElastiCluster
  4. Test your cluster with ClusterJob

Part-1: Set up Google Compute Engine

For more information on obtaining your Google credentials, see googlegenomics.

Part-2: Install Docker

Docker containers provide an easy way to use elasticluster. We have already dockerized elasticluster for Stats285, so we will use this Docker image, which comes with elasticluster installed. To use this image on your personal computer, follow these steps:
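As a quick sanity check once Docker is installed, the following minimal Python sketch reports whether the `docker` command is on your PATH and whether the Docker daemon responds. It is not part of the course tooling, and the function name `docker_status` is made up for illustration.

```python
import shutil
import subprocess


def docker_status():
    """Return a short string describing whether Docker is usable on this machine.

    Hypothetical helper for illustration only; not part of elasticluster
    or the Stats285 image.
    """
    # Step 1: is the docker command-line client installed at all?
    if shutil.which("docker") is None:
        return "docker CLI not found on PATH"

    # Step 2: ask the daemon for its version; this fails if the daemon is not running.
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"docker CLI found, but the call failed: {exc}"

    if result.returncode != 0:
        return "docker CLI found, but the daemon is not reachable"
    return f"docker daemon reachable, server version {result.stdout.strip()}"


if __name__ == "__main__":
    print(docker_status())
```

If the script reports that the daemon is not reachable, start Docker on your machine and run the check again before pulling the image.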

Part-3: Create your cluster using ElastiCluster

In this part, you will create a container from the image you pulled in Part 2. This container comes with elasticluster installed for easy use. Follow the steps below to launch your own cluster.

Part-4: Test your cluster with ClusterJob

After you have launched your cluster successfully, test it by running a small job on it with ClusterJob. Follow the instructions below to test your cluster:
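One possible small test job, assuming your cluster nodes have the NVIDIA driver (and therefore `nvidia-smi`) installed, is a Python script that lists the GPUs visible on the node where it runs. This is only a sketch: the function name `list_gpus` is hypothetical, and how the script is submitted through ClusterJob is not shown here.

```python
import shutil
import subprocess


def list_gpus():
    """Return one line per GPU reported by nvidia-smi, or [] if none are visible.

    Hypothetical helper for illustration; assumes the NVIDIA driver is
    installed on the node where the job runs.
    """
    # No nvidia-smi on PATH usually means no NVIDIA driver on this node.
    if shutil.which("nvidia-smi") is None:
        return []

    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []

    if result.returncode != 0:
        return []
    # Each non-empty output line describes one GPU (name, total memory).
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print(f"Found {len(gpus)} GPU(s):")
        for gpu in gpus:
            print("  " + gpu)
    else:
        print("No GPUs visible on this node")
```

An empty list on a node that should have a GPU suggests a problem with the driver installation or the machine configuration, not with the job itself.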

If everything makes sense, move on to running your assigned Deep Learning experiments.
