
HLCV SS21 - GPU Tutorial




Summary

Connect to the VPN of UdS

The GPU server is only accessible using the UdS network.

If you access the GPU server from home, first connect to the UdS VPN by following these instructions:

https://www.hiz-saarland.de/dienste/vpn

Download the demo of a PyTorch project

Since we recommend using PyTorch for your final project, we provide a demo PyTorch project.

You may use the following link to download it:

https://gitlab.mpi-klsb.mpg.de/yaoyaoliu/hlcv-ss21-gpu-tutorial/-/archive/main/hlcv-ss21-gpu-tutorial-main.zip

Open the terminal

For macOS, Debian, and Ubuntu, you may use the Terminal application provided by the system.

For Windows, you may use PuTTY.

Upload the demo to the GPU server

Unzip the downloaded file.

For macOS, Debian, and Ubuntu, you may use the following command to upload the demo to the server:

scp -r hlcv-ss21-gpu-tutorial-main hlcv_team000@conduit.cs.uni-saarland.de:~

Please replace hlcv_team000 with your own account.

For Windows, you may use WinSCP or FileZilla.

Log in to the GPU server

In the terminal, use the following command to log in to the GPU server:

ssh hlcv_team000@conduit.cs.uni-saarland.de

Running the PyTorch demo

Open the folder of the PyTorch demo:

cd hlcv-ss21-gpu-tutorial-main

Submit your job:

condor_submit pytorch_docker.sub

Check the state of your job in the condor queue:

condor_q

Analyze how many machines can run your job, or whether there are problems:

condor_q -analyze
condor_q -better

Overview of machines in the cluster:

condor_status

Using the interactive job for debugging

Open the .py file with vim:

vim pytorch_classifier.py

Use the Python debugger (pdb) to set a breakpoint at the line where execution should pause:

import pdb
pdb.set_trace()
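
A breakpoint that stays in the script can stall or crash a regular (non-interactive) job, because there is no terminal to answer the (Pdb) prompt. One way to avoid this is to guard the breakpoint with a switch, as in the minimal sketch below. The function `train_step` and the environment variable `HLCV_DEBUG` are hypothetical names chosen for illustration; they are not part of the demo.

```python
import os
import pdb


def train_step(batch, debug=False):
    """Hypothetical stand-in for one training step; returns the batch mean."""
    total = sum(batch)
    if debug:
        # Execution pauses here and drops into the (Pdb) prompt.
        pdb.set_trace()
    return total / len(batch)


# Hypothetical switch: the breakpoint stays off unless HLCV_DEBUG is set to "1".
DEBUG = os.environ.get("HLCV_DEBUG") == "1"

loss = train_step([0.5, 1.5, 1.0], debug=DEBUG)
print(loss)
```

With this pattern, the same file can be submitted as a regular job unchanged, and the breakpoint is enabled in an interactive job by starting the script with HLCV_DEBUG=1 in front of the python command.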

Submit an interactive job:

condor_submit -i pytorch_docker_interactive.sub

Open the folder of the PyTorch demo:

cd /home/hlcv_team000/hlcv-ss21-gpu-tutorial-main

Please replace hlcv_team000 with your own account.

Run the PyTorch code for debugging:

CUDA_VISIBLE_DEVICES=0 python pytorch_classifier.py
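
Setting CUDA_VISIBLE_DEVICES=0 restricts the process to the first GPU. Before a longer debugging session, it can help to confirm what the process actually sees. The following is a minimal sketch of such a check; `visible_gpu_ids` is a hypothetical helper, not part of the demo, and the PyTorch part is skipped if torch cannot be imported.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # variable unset: no restriction is applied
    # Entries are kept as strings; an empty value means no GPU is visible.
    return [part.strip() for part in raw.split(",") if part.strip()]


print("CUDA_VISIBLE_DEVICES ->", visible_gpu_ids())

try:
    import torch

    print("CUDA available to PyTorch:", torch.cuda.is_available())
    print("Number of visible GPUs:", torch.cuda.device_count())
except ImportError:
    print("PyTorch is not installed in this environment")
```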

Interactive jobs are killed automatically after one hour so that other users can get an interactive slot. Please use them only for debugging, and submit longer runs as regular jobs with condor_submit.

Important tips

Contact

For further questions, you may contact the TAs of HLCV using this mailing list:

hlcv-ss21@lists.mpi-inf.mpg.de.


Copyright © 2021 Max Planck Institute for Informatics