Install GPU driver + CUDA + cuDNN + Tensorflow on Ubuntu 18.04

 

Installing the Nvidia driver, CUDA, cuDNN, and Tensorflow-gpu/Keras is not an easy task. We need to match the driver to the hardware, match the CUDA/cuDNN library versions (notoriously complicated), and also make sure the ML/DL framework version (e.g., tensorflow) is compatible with the installed CUDA version. In this article we will show how to install the Nvidia driver, CUDA, cuDNN, and tensorflow-gpu/keras-gpu on Ubuntu 18.04 LTS. The article covers two ways: one regular way (Method 1) and one simple, easy way (Method 2). Thanks to Anaconda, which makes our life easier!

1. Install Ubuntu (18.04 LTS)

1.1 Create bootable disk

  • Download Ubuntu ISO File: https://ubuntu.com/download/desktop
  • Rufus: use Rufus to create the bootable disk; it can be downloaded from here.
  • Tutorials

1.2 Install Ubuntu system

Note: Ubuntu 18.04 defaults to using a swap file instead of the previous method of a dedicated swap partition, which makes partitioning a new 18.04 installation easier than before. In my case, I didn’t create a swap partition as I used to, and swap is still available, as sudo swapon --show confirms.

$ sudo swapon --show  # swap
$ cat /proc/meminfo   # memory info
$ df -H               # disk info
$ cat /proc/cpuinfo   # cpu info
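Much of the same information is available from Python's stdlib, which can be handy when checking a box over ssh without remembering the `/proc` paths (a small sketch, not a replacement for the commands above):

```python
import os
import shutil

# CPU count, equivalent in spirit to inspecting /proc/cpuinfo
print("CPUs:", os.cpu_count())

# Disk usage of the root filesystem, equivalent in spirit to `df -H /`
total, used, free = shutil.disk_usage("/")
print("Disk: %.1f GB free of %.1f GB" % (free / 1e9, total / 1e9))
```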

2. Network Configuration

2.1 gateway/interface setting

  • Check network status before configuration: In Ubuntu 18.04 LTS, net-tools is not installed by default, which means ifconfig and route cannot be used. Instead, we can use ip -c a to check the IP information, such as interface names and which interfaces are up. You can also use ping to check the connection. In my case, before setting the gateway correctly, I could not access the internet.
$ ifconfig 
$ ip -c a
$ ip link show
$ ping www.google.com
  • Configuration: In Ubuntu 18.04 LTS, we use netplan to manage network setting.
$ cd /etc/netplan/
$ sudo cp 01-network-manager-all.yaml 01-network-manager-all.yaml.factory-set
$ sudo nano 01-network-manager-all.yaml
$ sudo netplan apply   # make the change effective

01-network-manager-all.yaml(before)

# Let NetworkManager manage all devices on this system
network:
  version: 2
  renderer: NetworkManager

01-network-manager-all.yaml(after)

# Let NetworkManager manage all devices on this system
network:
  ethernets:
    enp0s31f6:  # interface name shown in `ip -c a`
      addresses: [your-ip-address/24]  # IP shown in `ip -c a`; use `/24` instead of netmask `255.255.255.0`
      gateway4: your-gateway           # gateway
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]  # DNS
  version: 2
  renderer: NetworkManager
  • Check status after configuration: we can use ip r to check the routing information. Meanwhile, we can also install net-tools so that familiar commands such as ifconfig and route become available.
$ sudo apt-get update
$ sudo apt-get install net-tools
$ ifconfig
$ route
$ netstat -i
$ ip r
$ ping www.google.com
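As noted in the config above, netplan takes the CIDR prefix form (`/24`) rather than a dotted netmask (`255.255.255.0`). Python's stdlib `ipaddress` module can convert between the two, which is handy when all you have is the old-style mask:

```python
import ipaddress

# "address/netmask" is accepted and normalized to prefix notation;
# the example network here is illustrative.
net = ipaddress.ip_network("192.168.1.0/255.255.255.0", strict=False)
print(net.prefixlen)       # 24
print(net.with_prefixlen)  # 192.168.1.0/24
```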

2.2 enable ssh

In Ubuntu 18.04 LTS, openssh-server is not installed by default. To install it:

$ sudo systemctl status ssh.service  # no ssh service 
$ sudo apt-get install openssh-server 
$ sudo systemctl status ssh.service  # activated, running

Enable firewall of ubuntu, and enable ssh rule:

$ sudo ufw status
$ sudo ufw allow ssh
$ sudo ufw enable
$ sudo ufw status
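Once ssh is allowed through the firewall, you can verify that port 22 is reachable from another machine. A minimal reachability check in Python (`port_open` is a hypothetical helper, not part of ufw or openssh):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, e.g.
    port_open("your-server-ip", 22) after enabling ssh."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```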

3. Install GPU driver + CUDA + cuDNN + tensorflow-gpu

3.1 Install GPU driver

$ sudo lshw -c display
$ sudo ubuntu-drivers devices
$ sudo ubuntu-drivers autoinstall
$ sudo reboot   # a reboot is required

Check:

$ nvidia-smi
$ sudo lshw -c display
$ lsmod | grep nvidia
$ lspci | grep -i nvidia
  • CUDA Compatibility: You need to find a driver version compatible with the CUDA toolkit you plan to use; for example, CUDA 10.0 (10.0.130) requires driver version >= 410.48. More details can be found under CUDA Compatibility in the Nvidia official docs.

  • Different CUDA versions shown by nvcc --version and nvidia-smi: CUDA has two primary APIs, the runtime API and the driver API, and each has its own version. In my case, I installed the latest 430 driver, so nvidia-smi shows CUDA version 10.2 (the highest version that driver supports), while nvcc --version shows 10.0 because I installed CUDA toolkit 10.0. More discussion can be found here.
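To make the compatibility check concrete, here is a small sketch; the minimum-driver numbers are a snapshot copied from NVIDIA's CUDA Compatibility table and should be re-checked against the official docs for your toolkit:

```python
# Minimum Linux driver version required by each CUDA toolkit release
# (snapshot of NVIDIA's compatibility table; verify before relying on it).
MIN_DRIVER = {
    "10.2": "440.33",
    "10.1": "418.39",
    "10.0": "410.48",
    "9.2":  "396.26",
    "9.0":  "384.81",
}

def driver_ok(cuda_version, driver_version):
    """Return True if the installed driver meets the toolkit's minimum."""
    to_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return to_tuple(driver_version) >= to_tuple(MIN_DRIVER[cuda_version])

print(driver_ok("10.0", "430.50"))  # True: a 430 driver covers CUDA 10.0
```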

After installing the driver, we can either install CUDA, cuDNN, and tensorflow-gpu one by one the regular way, or install them all together using Anaconda. We will go through the regular way first to get an idea of the entire setup. If you just want to set up the environment and save time, you can skip that part and go directly to the Anaconda part.

Method 1

Covering 3.2 Install CUDA, 3.3 Install cuDNN, and 3.4 Install tensorflow-gpu / keras-gpu.

3.2 Install CUDA (toolkit)

$ cat /etc/lsb-release 
$ gcc --version 

Select the system, architecture, distribution, version, etc., then download the CUDA Toolkit.

  • Install a specific CUDA version as needed: https://developer.nvidia.com/cuda-toolkit-archive

For example, to install cuda_10.0.130_410.48:

$ wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
$ sudo sh cuda_10.0.130_410.48_linux.run

Check:

cd /usr/local/

If everything is ok you should see a cuda folder in /usr/local/.

sudo nano ~/.bashrc

add at the end of the file:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Ctrl+X (then Y to confirm) to save and exit

source ~/.bashrc
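After sourcing ~/.bashrc, you can sanity-check that the CUDA directory actually landed on PATH; `on_path` below is a hypothetical helper, not a standard tool:

```python
import os

def on_path(directory, path_value=None):
    """Check whether a directory appears in a PATH-style string
    (defaults to the current process's PATH)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return directory in path_value.split(os.pathsep)

# In a shell that has sourced the updated ~/.bashrc, this should be True
print(on_path("/usr/local/cuda-10.0/bin"))
```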

Note: after installing the driver, you may hit an error/warning when installing the CUDA toolkit from the local runfile, saying the driver already exists. You can choose Continue, and then skip installing the bundled driver (all items are marked + for install by default; deselect the Driver entry).

  • Install the default (latest) version (as of 11/05/2019)

If you choose the default latest version, you can use either the runfile (local) or the deb (network) method.

runfile (local) Installation Instructions:

$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
$ sudo sh cuda_10.1.243_418.87.00_linux.run 

sudo nano ~/.bashrc

add at the end of the file:

export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Ctrl+X (then Y to confirm) to save and exit

source ~/.bashrc

deb (network) Installation Instructions:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

3.3 Install cuDNN

  1. Register at nvidia developers and download cuDNN. For CUDA 10.0 on Ubuntu 18.04, download both the runtime and developer libraries (files cuDNN v7.6.x Runtime Library for Ubuntu18.04 (Deb) and cuDNN v7.6.x Developer Library for Ubuntu18.04 (Deb)).

  2. Open the files with software manager and install them.

  3. Check:

$ cat /usr/include/x86_64-linux-gnu/cudnn_v*.h | grep CUDNN_MAJOR -A 2
$ whereis cudnn.h
$ nvcc --version
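The grep above prints the version defines from the cuDNN header. As an illustration, the same parsing can be done in Python (sample header text inlined here; on a real system you would read the cudnn_v*.h file instead):

```python
import re

# Sample of the #define lines that `grep CUDNN_MAJOR -A 2` extracts
header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""

# Collect the defines into a dict and assemble a dotted version string
parts = dict(re.findall(r"#define CUDNN_(\w+) (\d+)", header))
version = "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**parts)
print(version)  # 7.6.5
```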

3.4 install tensorflow-gpu / keras-gpu

sudo apt-get install libcupti-dev
pip3 install tensorflow-gpu  

Method 2

3.5 Install CUDA toolkit/cuDNN/tensorflow-gpu using Anaconda

NOTE: Before starting the following steps, you need to install the nvidia driver first; see 3.1 Install GPU driver.

  • Install Anaconda:
$ cd Downloads/
$ sudo apt install curl   # if curl is not installed
$ curl -O https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
$ ls
$ sha256sum Anaconda3-2019.10-Linux-x86_64.sh 
$ bash Anaconda3-2019.10-Linux-x86_64.sh
$ source ~/.bashrc   # otherwise you may see 'conda: command not found'
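The sha256sum step above compares the downloaded installer against the hash published on repo.anaconda.com. An equivalent check can be sketched in pure Python using only the stdlib:

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks
    so large installers don't need to fit in memory. Compare the
    result against the hash published for your installer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()
```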
  • Install CUDA, cuDNN, tensorflow-gpu and keras

Create a conda env as needed and test whether the GPU works. Note that conda installs tensorflow 2.0 by default, under which many 1.x scripts cannot run directly; you need either the compatibility API or a 1.x install. In this case, we will use 1.x-style tensorflow and keras.

$ conda create --name tf-gpu
$ conda activate tf-gpu
$ conda install -c anaconda tensorflow-gpu   # tf default version: 2.0
or:
$ conda install -c anaconda tensorflow-gpu==1.14   # if you choose tf version 1.14
$ conda install keras
$ python test-gpu.py 
$ python test-keras.py
$ conda deactivate

Example: test-gpu.py (using tensorflow)

# Creates a graph.
#import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
c = []
for d in ['/device:GPU:0']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  total = tf.add_n(c)  # renamed from `sum` to avoid shadowing the builtin
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))

Example: test-multi-gpus.py

import tensorflow as tf
tf.debugging.set_log_device_placement(True)

gpus = tf.config.experimental.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)

We can also use the keras-gpu package to install tensorflow-gpu and keras together. The tensorflow version is 2.0 and the keras version is 2.2.4 (as of 11/05/2019).

$ conda create --name keras-gpu
$ conda activate keras-gpu
$ conda install -c anaconda keras-gpu 

Example: test-keras.py

from keras import backend as K
print(K.tensorflow_backend._get_available_gpus())

[References]

Install Nvidia Driver

Install CUDA, cuDNN

Method 2: using Anaconda

Check GPU works:

Network configuration:

Others: