
NVIDIA Tesla P40: BIOS reports PCI OUT OF RESOURCES

The BIOS shows the following error:

!!!!PCI Resource ERROR!!!!

PCI OUT OF RESOURCES CONDITION:

Error: Insufficient PCI Resources Detected!!!

System is running with Insufficient PCI Resources!
In order to display this message some
PCI devices were set to disabled state!
It is strongly recommended to Power Off the system and remove some PCI/PCI Express cards from the system!
To continue booting, proceed to Menu Option and select Boot Device or .

WARNING: If you choose to continue booting some Operating
Systems might not be able to complete boot correctly!

Solution:

  • Power on the machine and enter the BIOS setup
  • Find the setting: BIOS > Advanced > PCIe/PCI/PnP Configuration > Above 4G Decoding
  • Set it to Enabled
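
Once the system boots, it can be useful to confirm that the card's large memory window really was mapped above the 4 GiB boundary, since a window of many gigabytes cannot fit into a 32-bit address range. The sketch below parses `Region` lines in the style printed by `lspci -vv`. The sample text, the addresses in it, and the helper name `parse_bars` are illustrative assumptions, not output captured from a real Tesla P40.

```python
import re

# Matches lines in the style of `lspci -vv`, for example:
#   Region 1: Memory at 3b000000000 (64-bit, prefetchable) [size=32G]
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-fA-F]+) \((32|64)-bit, (?:non-)?prefetchable\)"
    r"(?: \[size=(\d+)([KMG])\])?"
)

UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
FOUR_GIB = 1 << 32


def parse_bars(lspci_text):
    """Return one dict per memory BAR found in the given text."""
    bars = []
    for match in REGION_RE.finditer(lspci_text):
        index, addr_hex, width, size, unit = match.groups()
        address = int(addr_hex, 16)
        bars.append({
            "index": int(index),
            "address": address,
            "width": int(width),
            # The size annotation is optional in the pattern above.
            "size": int(size) * UNIT[unit] if size else None,
            "above_4g": address >= FOUR_GIB,
        })
    return bars


# Hypothetical sample, made up for illustration only.
SAMPLE = """\
Region 0: Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 3b000000000 (64-bit, prefetchable) [size=32G]
Region 3: Memory at 3b800000000 (64-bit, prefetchable) [size=32M]
"""

for bar in parse_bars(SAMPLE):
    print(bar["index"], hex(bar["address"]), bar["size"], bar["above_4g"])
```

On a real machine the text would come from running `lspci -vv` against the GPU; a large 64-bit BAR reported below 4 GiB, or missing entirely, would suggest the BIOS setting did not take effect.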


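TensorFlow in Practice 01: Installing a GPU Development Environment (Ubuntu)

This post describes how to set up a GPU-enabled TensorFlow 1.4.0 development environment on Ubuntu 16.04 LTS.

Do you need GPU support?

That depends on whether you have a CUDA-capable NVIDIA GPU. Without one, the CPU version is the only option. With one, read on.

Installing the NVIDIA dependencies

  1. Install CUDA Toolkit 9.0: on the CUDA Downloads page, select your operating system and version and choose deb (network) as the installer type. The page then gives a download link and a series of commands similar to: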
    sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
    sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda
    
    Update the PATH environment variable
    export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
    
    Update the LD_LIBRARY_PATH environment variable
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    
  2. Install the GPU driver. At the time of writing, the latest driver version was 384
    sudo apt install nvidia-384
    
    Once the driver is installed, the following command shows the GPU status:
    nvidia-smi
    
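  3. Install cuDNN 7: on the cuDNN download page, click Download and fill in the survey, then download and install the packages that match your system. The following example is for a 64-bit system: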
    sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
    sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb
    sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb
    
  4. Install the libcupti-dev library
    sudo apt install libcupti-dev
    
    Update the LD_LIBRARY_PATH environment variable
    export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
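
The `${VAR:+:${VAR}}` form in the CUDA export lines appends a colon plus the old value only when the variable is already set and non-empty, which avoids leaving a dangling colon (an empty entry is treated as the current directory). As a rough illustration, the same logic can be written in Python. The helper names `prepend_path` and `missing_dirs` are hypothetical, and the separator is assumed to be `:` as on Linux.

```python
SEP = ":"  # path-list separator on Linux


def prepend_path(new_dir, current):
    """Mimic the shell form `new_dir${VAR:+:${VAR}}`.

    The separator and the old value are added only when `current`
    is non-empty, so the result never ends with a stray colon.
    """
    if current:
        return new_dir + SEP + current
    return new_dir


def missing_dirs(required, value):
    """Return the directories from `required` that are not listed in `value`."""
    entries = [entry for entry in value.split(SEP) if entry]
    return [d for d in required if d not in entries]


print(prepend_path("/usr/local/cuda-9.0/bin", ""))
print(prepend_path("/usr/local/cuda-9.0/bin", "/usr/bin:/bin"))
print(missing_dirs(
    ["/usr/local/cuda-9.0/lib64", "/usr/local/cuda/extras/CUPTI/lib64"],
    "/usr/local/cuda-9.0/lib64",
))
```

Passing `os.environ.get("LD_LIBRARY_PATH", "")` as `value` gives a quick check that both library directories from the steps above are visible in the current shell session.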
    

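Installing TensorFlow

You can either install a prebuilt package or build from source.

Installing with native pip

Either Python 2.7 or Python 3.n works.

  1. Install and upgrade pip first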
    # for Python 2.7
    sudo apt-get install python-pip python-dev
    sudo pip install -U pip setuptools
    # for Python 3.n
    sudo apt-get install python3-pip python3-dev
    sudo pip3 install -U pip setuptools
    
  2. Install tensorflow; run only the one command that matches your needs
    pip install tensorflow      # Python 2.7; CPU support (no GPU support)
    pip3 install tensorflow     # Python 3.n; CPU support (no GPU support)
    pip install tensorflow-gpu  # Python 2.7; GPU support
    pip3 install tensorflow-gpu # Python 3.n; GPU support
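
The four commands differ only in the pip executable and the package name. A minimal sketch of that choice follows; `pip_install_command` is a hypothetical helper, and the optional `version` argument (using pip's standard `==` pin) is an addition that the commands above do not use.

```python
def pip_install_command(python_major, gpu, version=None):
    """Build the pip command for the given Python major version and GPU choice."""
    if python_major not in (2, 3):
        raise ValueError("python_major must be 2 or 3")
    pip = "pip" if python_major == 2 else "pip3"
    package = "tensorflow-gpu" if gpu else "tensorflow"
    if version is not None:
        # Optional pin, e.g. 1.4.0 to match the version this post targets.
        package = "{}=={}".format(package, version)
    return "{} install {}".format(pip, package)


print(pip_install_command(2, gpu=False))
print(pip_install_command(3, gpu=True))
print(pip_install_command(3, gpu=True, version="1.4.0"))
```

Pinning the version is worth considering because an unpinned install pulls whatever release is current, which may expect different CUDA and cuDNN versions from the ones installed earlier.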
    

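Building from source

  1. Clone the source repository with git and switch to the r1.4 branch
    git clone https://github.com/tensorflow/tensorflow
    git -C tensorflow checkout r1.4  # run the checkout inside the cloned repository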
    
  2. Install bazel
    sudo apt-get install openjdk-8-jdk
    echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
    sudo apt-get update && sudo apt-get install bazel
    
  3. Install TensorFlow's Python dependencies
    sudo apt-get install python-numpy python-dev python-pip python-wheel     # for Python 2.7
    sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel # for Python 3.n
    
  4. Run the configuration script and pay close attention to the choice at each step
    cd tensorflow # enter the root of the repository cloned in step 1
    ./configure
    Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
    Found possible Python library paths:
    /usr/local/lib/python2.7/dist-packages
    /usr/lib/python2.7/dist-packages
    Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]
    Using python library path: /usr/local/lib/python2.7/dist-packages
    Do you wish to build TensorFlow with MKL support? [y/N]
    No MKL support will be enabled for TensorFlow
    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
    Do you wish to use jemalloc as the malloc implementation? [Y/n]
    jemalloc enabled
    Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
    No Google Cloud Platform support will be enabled for TensorFlow
    Do you wish to build TensorFlow with Hadoop File System support? [y/N]
    No Hadoop File System support will be enabled for TensorFlow
    Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
    No XLA support will be enabled for TensorFlow
    Do you wish to build TensorFlow with VERBS support? [y/N]
    No VERBS support will be enabled for TensorFlow
    Do you wish to build TensorFlow with OpenCL support? [y/N]
    No OpenCL support will be enabled for TensorFlow
    Do you wish to build TensorFlow with CUDA support? [y/N] Y
    CUDA support will be enabled for TensorFlow
    Do you want to use clang as CUDA compiler? [y/N]
    nvcc will be used as CUDA compiler
    Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0
    Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
    Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7
    Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
    Please specify a list of comma-separated Cuda compute capabilities you want to build with.
    You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
    Please note that each additional compute capability significantly increases your build time and binary size. [Default is: "3.5,5.2"]: 6.1
    Do you wish to build TensorFlow with MPI support? [y/N] 
    MPI support will not be enabled for TensorFlow
    Configuration finished
    
  5. Build the pip package
    bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    
  6. Install the generated pip package. The whl file is in the /tmp/tensorflow_pkg directory, and its file name may differ slightly
    sudo pip install /tmp/tensorflow_pkg/tensorflow-1.4.0-cp27-cp27mu-linux_x86_64.whl
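
Because the wheel's file name depends on the Python version and platform, it can help to look it up rather than type it by hand. The sketch below lists the wheels in the output directory and splits a file name into the fields defined by the wheel naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper names `find_wheels` and `wheel_tags` are hypothetical.

```python
import glob
import os


def find_wheels(pkg_dir="/tmp/tensorflow_pkg"):
    """Return TensorFlow wheel paths in pkg_dir, newest first."""
    pattern = os.path.join(pkg_dir, "tensorflow-*.whl")
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


def wheel_tags(path):
    """Split a wheel file name into its name, version and compatibility tags."""
    stem = os.path.basename(path)[:-len(".whl")]
    parts = stem.split("-")
    # The last three fields are always python tag, ABI tag and platform tag;
    # an optional build tag may sit between the version and the python tag.
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


print(wheel_tags("/tmp/tensorflow_pkg/tensorflow-1.4.0-cp27-cp27mu-linux_x86_64.whl"))
```

A `python` tag of `cp27` means the wheel was built for CPython 2.7, so it should be installed with the matching `pip`; a build configured against Python 3 would carry a `cp3x` tag and need `pip3` instead.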
    

Verifying the installation

  1. Start Python
    $ python
    
  2. Type in the following code line by line
    # Python
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))
    
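    If you see the following output (under Python 3 it is printed as b'Hello, TensorFlow!')
    Hello, TensorFlow!
    
    then congratulations, the installation succeeded.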
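
The hello-world check passes on a CPU-only install too, so it does not prove that the GPU is being used. One possible extra check, sketched below, lists the devices TensorFlow can see through `device_lib.list_local_devices()` from TensorFlow 1.x. The helper names `gpu_device_names` and `local_gpu_names` are hypothetical.

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """List the GPUs visible to the installed TensorFlow."""
    # Imported lazily so that gpu_device_names() stays usable
    # on machines where TensorFlow is not installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

In the same Python session, `print(local_gpu_names())` should then show at least one entry; an empty list suggests that the CPU-only package was installed or that the driver and CUDA libraries were not found.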

More details

See the official documentation for details:

  1. Installing TensorFlow on Ubuntu
  2. Installing TensorFlow from Sources
  3. NVIDIA CUDA Installation Guide for Linux
  4. NVIDIA cuDNN
  5. CUDA GPUs