Python

Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants, on the Mac, and on PCs under MS-DOS, Windows, Windows NT, and OS/2.

Warning

Starting in May 2019, the CRC Python module defaults to Python 3. If you are using code based on Python 2, we highly recommend moving to the 3.x branch of Python. Python 2 stopped being maintained in January 2020 and has been completely removed from RHEL 8.

Basic Usage

A default version of Python is available on any of our machines that run Red Hat Enterprise Linux. The default version is considered to be extremely stable, but it tends to be several releases behind the most recent version. You can check the version number with the following command:

$ python3 --version
Python 3.6.8
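
The same check can be made from inside a script, which may help when code depends on features newer than the system default. The following is only a minimal sketch: the helper name meets_minimum and the 3.6 threshold are assumptions chosen for illustration, not part of the CRC setup.

```python
import sys


def meets_minimum(required, current=None):
    """Return True if the interpreter version is at least `required`.

    `required` is a (major, minor) tuple. `current` defaults to the
    (major, minor) of the interpreter running this script.
    """
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(required)


# Report the running version and whether it satisfies the assumed minimum.
print("Running Python", ".".join(str(n) for n in sys.version_info[:3]))
print("At least 3.6:", meets_minimum((3, 6)))
```

Tuples compare element by element, so (3, 10) is correctly treated as newer than (3, 6), which a plain string comparison would get wrong.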

To provide the additional functionality of more recent versions, the CRC maintains additional Python modules. The current offerings can be seen with this command:

$ module avail python
---------------------------------- /afs/crc.nd.edu/x86_64_linux/Modules/modules/development_tools_and_libraries -----------------------------------
python/3.7.3

To load the default Python module:

$ module load python

In addition, we often install popular packages to go along with the module versions. For example, the packages NumPy and SciPy are available through any of the above listed modules.
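
To confirm which packages a loaded module actually provides, you can ask Python directly. This is a hedged sketch using only the standard library; the helper name is_available is hypothetical, and the list of names checked is an example.

```python
import importlib.util


def is_available(module_name):
    """Return True if a top-level module named `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


# Check the packages mentioned above without importing them.
for name in ("numpy", "scipy"):
    status = "available" if is_available(name) else "not found"
    print(f"{name}: {status}")
```

Because find_spec only locates the module and does not import it, the check is quick even for large packages.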

Installing Python Packages Locally

If you need a package that is not installed with the CRC version of Python, then you can easily install it locally in your personal AFS space using the following instructions.

pip is a useful tool for installing Python packages, particularly those with many dependencies.

Installing a Python package is as easy as running the pip3 command:

module load python
pip3 install --user package_name

If the package is distributed as a compressed tar file, such as package.tar.gz:

Download the Python package
Unpack it in your CRC space: tar -xzf package.tar.gz
Change to the unpacked directory: cd package
Install the package: python3 setup.py install --user
This will install all of the files in your home directory under ~/.local/
When you load Python, the local package should now be accessible (via import).
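
If an import fails after a --user install, it can help to see where Python expects per-user packages to live. The sketch below uses the standard site module; it assumes a regular (non-virtualenv) interpreter, since some older virtual environment tools replace the site module.

```python
import site
import sys

# Base directory for --user installs (normally ~/.local on Linux).
user_base = site.getuserbase()

# Directory where --user packages themselves are placed.
user_site = site.getusersitepackages()

print("User base:", user_base)
print("User site-packages:", user_site)

# The directory is only added to sys.path if it exists when Python starts.
print("On sys.path:", user_site in sys.path)
```

If the last line reports False right after your first --user install, starting a new Python session is usually enough for the directory to be picked up.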

Adding ${HOME}/.local/bin to your path

When using the --user option above, a Python package may also install helper applications in addition to source code. By default, these programs will be installed into the directory:

${HOME}/.local/bin

You may always give the full path to the application, but it is often easier to add this directory to the variable $PATH, so that only the name of the program is required to run it. If you are using the BASH shell, the following line will add it:

echo 'export PATH=${HOME}/.local/bin:${PATH}' >> ~/.bashrc

Similarly, for TCSH:

echo 'setenv PATH ${HOME}/.local/bin:${PATH}' >> ~/.cshrc

If you’re not sure which shell you are using, input this command:

echo $0

The next time you log in or source your startup script, programs in ${HOME}/.local/bin will be available without the need for specifying the exact location.
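
To verify the change from Python rather than by eye, a short check like the following can be used. It is a sketch only; the helper name dir_on_path is hypothetical.

```python
import os


def dir_on_path(directory, path_value):
    """Return True if `directory` is one of the entries in `path_value`."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as a trailing separator
    ]
    return target in entries


local_bin = os.path.join(os.path.expanduser("~"), ".local", "bin")
print(local_bin, "on PATH:", dir_on_path(local_bin, os.environ.get("PATH", "")))
```

Paths are normalized before comparison, so an entry written with a trailing slash still matches.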

Python Virtual Environment

Having a virtual environment can be useful if you want to install software, but do not want that software to be installed globally. Using virtualenv will allow you to install packages and easily delete them once you are finished.

As an alternative to virtualenv, the CRC supports a Conda module.

To use virtualenv, you must first load a Python module:

module load python

Next, install the virtualenv package into your user space:

pip3 install --user virtualenv

Note

The virtualenv executable will not be automatically added to your path.

Next, you will need to create a folder named whatever you like and move into it:

mkdir virtualProject
cd virtualProject

Next, create the virtual environment, naming it whatever you want:

~/.local/bin/virtualenv NameOfVirtualEnvironment

The virtual environment has been created, but it still needs to be activated:

source NameOfVirtualEnvironment/bin/activate

You will notice that the name of your virtual environment now appears on the left of the prompt, like this:

(NameOfVirtualEnvironment)userName@nameOfMachine:~/virtualProject $

This indicates that your virtual environment is currently active. You are now able to install packages into it without affecting global packages. To deactivate your virtual environment, simply type:

deactivate

If you ever forget which packages you had installed in which virtual environment, type:

pip3 freeze

in an active virtual environment and the terminal will list which packages are installed. Try creating a virtual environment and installing a package in that environment. Then type pip3 freeze once while the virtual environment is active and once after it has been deactivated. You will notice that the packages are different, and that is the whole point of the Python Virtual Environment: to have different packages in different virtual environments without affecting global packages.
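
A script can also check for itself whether it is running inside a virtual environment. The following is a minimal sketch, not an official virtualenv API; the function name in_virtualenv is hypothetical, and the check relies on the interpreter attributes that venv and virtualenv are known to set.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a virtual environment."""
    # venv and recent virtualenv releases leave sys.base_prefix pointing at
    # the original installation; older virtualenv releases set sys.real_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Running this once with the environment activated and once after deactivate should show the interpreter path change along with the result.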

When you are finished with a virtual environment and no longer need it or the packages it has installed, simply delete the folder the virtual environment resides in:

rm -rf NameOfFolder

The virtual environment has now been deleted and global packages have not changed at all.

Job Submission Example

The following is a basic template for creating a UGE job submission script for a Python job:

#!/bin/bash
#$ -M netid@nd.edu      # Email address for job notification
#$ -m abe               # Send mail when job begins, ends and aborts
#$ -q long              # Specify queue
#$ -pe smp 1            # Specify number of cores to use.
#$ -N helloWorld        # Specify job name

module load python

python3 HelloWorld.py

Where HelloWorld.py contains:

#!/usr/bin/env python3

print("Hello World!\n")

Multi-core Jobs

Many Python packages are written to take advantage of multiple processing units (cores) to solve problems more quickly by executing code in parallel. Our batch system supports this feature through a two-step process. First, you must let the batch system know how many cores you are requesting via the -pe smp flag. Second, and most important, you must also tell Python how many cores to use. By default, Python will assume that it is able to use all resources that it is aware of. In the best case, this overtaxes the compute node and slows down every job running on it. In the worst case, it can crash the node. We tell Python how many cores it is allowed to use through an environment variable named OMP_NUM_THREADS. Typically, we set this variable to the value given in the #$ -pe smp flag by way of another variable, $NSLOTS, which the batch system sets to the number of slots granted to the job. For example, the following BASH script will run a Python program that is written to take advantage of multiple cores, using at most 4 cores:

#!/bin/bash
#$ -q long
#$ -pe smp 4
export OMP_NUM_THREADS=${NSLOTS}
module load python
python3 numpy-test.py

The equivalent (T)CSH script would be:

#!/bin/tcsh
#$ -q long
#$ -pe smp 4
setenv OMP_NUM_THREADS ${NSLOTS}
module load python
python3 numpy-test.py
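
OMP_NUM_THREADS is read by OpenMP-based numerical libraries. Code that starts its own worker processes, for example with the standard multiprocessing module, does not consult it and typically sizes itself from the CPU count of the whole machine. In that case the script can read $NSLOTS itself. The following is a sketch under that assumption; the helper name allowed_cores is hypothetical, and it falls back to 1 core when the variable is missing or malformed (for example, when the script is run outside the batch system).

```python
import os


def allowed_cores(environ=None):
    """Return the number of cores granted by the batch system.

    Reads NSLOTS from `environ` (defaults to os.environ) and falls back
    to 1 if the variable is unset, not an integer, or less than 1.
    """
    if environ is None:
        environ = os.environ
    try:
        cores = int(environ.get("NSLOTS", "1"))
    except ValueError:
        cores = 1
    return max(cores, 1)


workers = allowed_cores()
print("Worker processes to start:", workers)
```

The result can then be passed explicitly wherever a worker count is accepted, such as the processes argument of multiprocessing.Pool, so the job never starts more workers than the cores requested with -pe smp.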

Further Information

See the official site: Python.org