Department of Engineering

IT Services

Python 1A Lent booster talk

There are several Python features that you might have copied from one example to the next without knowing quite what they're for, or that you used last year and have forgotten. This document strives to fill in some of these gaps in understanding and provide overviews.

Software Overview

Here's the software you're likely to use -

  • python - a program that given appropriate text (python code) will make a computer do things
  • Anaconda - a suite of programs related to Python (including python itself)
  • Spyder - an editor that's part of the Anaconda suite. You can use any editor to write Python code, but Spyder has some helpful Python-specific features.
  • pip - a program that installs python packages
  • git - a "version control system". It lets you save (and recover) versions. It stores files on your machine by default. It's used from the command line. On systems (like Windows) that don't have much of a command-line facility, git sometimes comes packaged with a unix-like command line and terminal program (e.g. git bash)
  • bitbucket - a cloud facility that works with git, letting you share code and have access to it wherever you are

Workflow overview

I've drawn the diagram on the right several times this term already. The idea is that you

  • write your code using spyder on your laptop
  • use git commit ... whenever you reach a minor milestone to store versions on your laptop
  • store files on bitbucket using git push whenever your code is ready for team-partners to use.
  • use git pull to get your team-partners' new code onto your laptop

You should write tests and run them before pushing your code to bitbucket. The tests won't be run unless you use py.test - see the coursework documentation for details.

Your bitbucket account is set up to use Continuous Integration, which means that when you store your newest version on bitbucket, bitbucket will automatically test the code using the tests you've written. Working this way, you'll always have a working (though perhaps incomplete) version of your program on bitbucket. Wherever you are in the world you can download your current version.

Your team-partners will be putting code on bitbucket too. So that you're in sync with them you should download ("pull") code from bitbucket to integrate their work into your local version. A tidier version of the diagram above is on the right - a repository (or repo) is a place where git stores files.

If you're working on your own without the internet, git is still useful.

The command line

Terminal windows and command lines may all look the same, but they're not (especially on Windows, where some will give you access to git or python, and some won't). Here are some useful commands and short-cuts that work in Mac, Linux and "git bash" terminal windows.

  • pwd
    (Print Working Directory) Display on the screen the name of the current folder/directory
  • cd XYZ
    (Change Directory) move to a folder called XYZ within the current folder/directory (it's a bit like clicking on a folder icon)
  • cd
    move to your home folder/directory
  • cd ..
    move "up" into the enclosing folder/directory
  • ls
    LiSt the files and folders in the current folder/directory
  • rm filename[s]
    (ReMove) Remove the files
  • If you press the TAB key, the system will try to complete a filename/foldername. For example, if the only folder you have in the current folder is called 1AComputing then typing
       cd 1A
    
    and pressing the TAB key will give you the completed command. If there's more than one way to complete the command, press the TAB key twice to get a list of alternatives.
  • You can retrieve previous commands using the up arrow key. When you press the Return key the whole line will be executed even if the cursor isn't at the right end of the line.
  • Typing Ctrl r followed by a few characters then Return will recover the last command line containing those characters.
  • You can run system programs by typing their name - e.g. typing firefox runs the Firefox browser. If you run a program and it seems to be stuck, you can kill it by pressing the Ctrl and C keys down at the same time.
  • Type man command to find out more about a command (man is short for "manual").

Anaconda offers an "Anaconda prompt" option on Windows. This starts a window in which you can run python and test python scripts from the command line, but it's a DOS command-line, so it doesn't understand most of the unix commands in the above list. For example, it doesn't understand ls - you need to use dir to find out which files are in the folder.
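
If you find yourself in a terminal that doesn't understand the unix commands, Python itself can do the same jobs on any operating system. The following is a minimal sketch using only the standard os module; the variable names are just for illustration.

```python
import os

# Like pwd: find out which folder we're in
current_folder = os.getcwd()
print("Current folder:", current_folder)

# Like ls (or dir on DOS): list the files and folders here
names = sorted(os.listdir(current_folder))
print("Contents:", names)

# Like cd .. : move "up" into the enclosing folder
os.chdir("..")
print("Now in:", os.getcwd())

# Like cd XYZ: move back to where we started
os.chdir(current_folder)
```

Changing folder this way only affects the running Python program, not the terminal window you started it from.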

Paths, and running programs from the command line

When you open a terminal window program, it starts in a particular folder/directory. You can find out what that place is by typing pwd. If you type the name of a program (python, say) the terminal window has to work out which of the potentially many programs called python on the computer to run. If you specify a particular file location by giving the full name of the file, it will run that. For example, typing

  /usr/local/apps/anaconda3/bin/python

on the teaching system runs that particular version of Python. If you just type the short name, the system will look in a pre-set list of folders until it finds a matching program. On unix you can see that list by typing

  echo $PATH

The system's applications folder is on the path, so to run a system command you need only type its name and press Return. Often the current folder isn't on the PATH, so even if you're in the folder where a program is, you can't run it by typing its short name. Typing

    ./program_name

will work though, because a full-stop means "the current folder".

The upshot of all this is that when you open a terminal window, don't expect to be able to run a program just by typing its name. On CUED's linux system, you may well be lucky, because we've installed programs centrally on the system so that they're easy to find. On your machines you may have installed programs in various places, and you may have more than one "terminal window program", so use pwd, cd and ls (or dir on DOS) to find out where you are. Note that git will only work the way you want if the terminal window you're running it in is in the folder where your python files are.
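
Python can report the same information in a way that works on Windows, Mac and Linux. This sketch uses the standard library's shutil.which function, which searches the PATH folders much as the terminal does; looking up "python" and "git" is just an example, and either may be reported as None on your machine.

```python
import os
import shutil

# PATH is one long string; os.pathsep is ":" on unix and ";" on Windows
path_folders = os.environ.get("PATH", "").split(os.pathsep)
print("Number of folders on PATH:", len(path_folders))

# shutil.which returns the full name of the program that would be run,
# or None if no program with that name is found on the PATH
locations = {name: shutil.which(name) for name in ("python", "git")}
for name, location in locations.items():
    print(name, "->", location)
```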

Packages and modules

A module is a single Python file. A package is a directory of Python modules containing an additional __init__.py file. That file distinguishes a package from a directory that just happens to contain a bunch of Python scripts. When a package is imported, the __init__.py file is executed.
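
To see this in action without cluttering your own folders, the sketch below builds a tiny package in a temporary folder and imports it. The names demo_pack, shapes, area and greeting are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as top_folder:
    # A package is a folder containing an __init__.py file
    package_folder = Path(top_folder) / "demo_pack"
    package_folder.mkdir()
    (package_folder / "__init__.py").write_text(
        "print('running __init__.py')\ngreeting = 'hello from demo_pack'\n"
    )
    # A module is a single Python file
    (package_folder / "shapes.py").write_text(
        "def area(width, height):\n    return width * height\n"
    )

    # Let Python find the package, import it, then tidy up the search path
    sys.path.insert(0, top_folder)
    importlib.invalidate_caches()
    try:
        import demo_pack                 # __init__.py is executed here
        from demo_pack import shapes     # shapes.py is a module in the package
        result = shapes.area(3, 4)
    finally:
        sys.path.remove(top_folder)

print(demo_pack.greeting)
print(result)
```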

import

Many features you want to use aren't built into Python - you have to import them. You can't import them until they're on the machine you're running Python on. Here I'll assume you've already installed the code.

When you run or import a python file, which of the many files with that name in the system will be used? You can find out where python looks for imported files by running

import sys
print(sys.path)

Python will look in these folders in order, looking for a file with the requested name.

In Unix a variable called PYTHONPATH (a list of folders separated by colons) shows extra folders where python will look for imported files. This variable belongs to the shell (the command-line interpreter). From the command line you can use echo $PYTHONPATH to see if it's set to anything. If you keep all your python modules in a folder called pythonstuff in your home folder, you can make the modules importable from anywhere using

export PYTHONPATH=${PYTHONPATH}:~/pythonstuff

at the start of your session. This will affect everything subsequently done from the window you type the command in. It won't affect what happens in a completely separate window.
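
A similar effect can be had from inside Python, for a single program, by adding the folder to sys.path before importing. This is a sketch rather than a recommendation; pythonstuff is the example folder name used above, and the folder needn't exist for the code to run.

```python
import os
import sys

# Expand ~ to the full name of your home folder
extra_folder = os.path.expanduser(os.path.join("~", "pythonstuff"))

# Add the folder to the end of the search list if it isn't already there
if extra_folder not in sys.path:
    sys.path.append(extra_folder)

print(extra_folder in sys.path)
```

Unlike setting PYTHONPATH, this only affects the program that runs these lines.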

When you have lots of python files in several folders, you need to be careful about importing. An import command that works in one file may well not work in another that's further down the file system because by default the module being imported will not be looked for in children or parent folders.

Modules within a package often need to refer to each other. In Python 3 a plain import statement looks only in the folders on the module search path (which usually include the folder of the script being run), so to import from the containing package you use a relative import. Dots are used with the imported filenames to specify where Python should look. A single dot represents the current package. More dots refer to parent(s) of the current package. So if pack1 is imported by program1 as part of a package, and pack1.py has

from .pack2 import fun2

(note the extra dot) then pack2 will be sought in the same folder as pack1.py.

The files provided for the 1A Lent work are written on the assumption that you'll be running Task1A.py, etc directly but not geo.py directly. If you're having problems with packages not being found, search online for "python relative import" for details, or better still just run files that are in your top folder.
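
The relative import described above can be tried out safely in a temporary folder. In this sketch pack1, pack2 and fun2 are the names used in the text, while the package name demo_tools and the function fun1 are made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as top_folder:
    package_folder = Path(top_folder) / "demo_tools"
    package_folder.mkdir()
    (package_folder / "__init__.py").write_text("")
    (package_folder / "pack2.py").write_text(
        "def fun2():\n    return 'fun2 was called'\n"
    )
    # The leading dot means "look in the package that pack1 belongs to"
    (package_folder / "pack1.py").write_text(
        "from .pack2 import fun2\n\ndef fun1():\n    return fun2()\n"
    )

    sys.path.insert(0, top_folder)
    importlib.invalidate_caches()
    try:
        # This plays the part of program1, a file in the top folder
        from demo_tools import pack1
        message = pack1.fun1()
    finally:
        sys.path.remove(top_folder)

print(message)
```

Running pack1.py directly instead of importing it would fail, because a file that's run directly doesn't belong to a package, so the dot has nothing to refer to.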

A Python Import Tutorial for Beginners is friendly, but doesn't tell you everything.

Running/importing scripts

Scripts often end with something like

   if __name__ == '__main__':
      ...

Python scripts can be imported so that the functions in them can be used by other scripts. Python scripts can also be run directly from the command line. Ideally almost every importable Python module can do something useful if run from the command line. The special line makes this possible. __name__ == '__main__' is only true if the file is run directly rather than imported, so if you want function fun1 to be called when the file it's in is executed but not when the file's imported, use

   def fun1():
       print("hello")

   if __name__ == '__main__':
       fun1()
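
You can watch the two behaviours side by side using the standard runpy module, which runs a file under a name of your choosing. This is only a demonstration sketch: demo_script.py and the variable result are invented for it, and fun1 returns its text rather than printing it so that the outcome can be inspected.

```python
import os
import runpy
import tempfile

script_text = '''
def fun1():
    return "hello"

result = None
if __name__ == '__main__':
    result = fun1()
'''

with tempfile.TemporaryDirectory() as folder:
    script_name = os.path.join(folder, "demo_script.py")
    with open(script_name, "w") as script_file:
        script_file.write(script_text)

    # Run the file as if it had been started from the command line
    as_main = runpy.run_path(script_name, run_name="__main__")
    # Run it under another name, which is what happens on import
    as_import = runpy.run_path(script_name, run_name="demo_script")

print(as_main["result"])    # hello
print(as_import["result"])  # None
```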

Comprehensions

Many tasks involve going through a list and picking out certain items of interest, creating a list of those items. Here's a simplified example, putting odd numbers into a list called answers.

numbers=range(10)
answers=[]
for i in numbers:
  if i%2 == 1:
    answers.append(i)

Comprehensions were designed to deal with this common task. You don't have to use them, but they'll make your code shorter and often a little faster. The following code does the same as the fragment above.

numbers=range(10)
answers=[ i for i in numbers if i%2 == 1 ]
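
A comprehension can also change each item as it's picked out, and the same idea works for dictionaries. Here's a short sketch; the names odd_squares and square_of are just for illustration.

```python
numbers = range(10)

# Filter only: keep the odd numbers
answers = [i for i in numbers if i % 2 == 1]

# Filter and transform: the squares of the odd numbers
odd_squares = [i * i for i in numbers if i % 2 == 1]

# A dictionary comprehension, mapping each odd number to its square
square_of = {i: i * i for i in numbers if i % 2 == 1}

print(answers)       # [1, 3, 5, 7, 9]
print(odd_squares)   # [1, 9, 25, 49, 81]
print(square_of[7])  # 49
```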

git

git is too big a topic to deal with properly today (which is why I did a talk about it last term - see the notes) but I'll mention a few issues.

Why use version control?

Well, it's useful to be able to go back to earlier versions of a program when faults emerge in new ones. And future courses here use git. According to Wikipedia, about 30% of UK permanent software development job ads mention git.

I think most people who write programs or documents use version control even if they do so informally, saving versions as they go along. Maybe they create program_old.py or better still add the date and time to the end of the filename. Their folders might get a bit cluttered, but at least they don't have to learn a new version control system.

With small, linear, one-file programs this method might be ok. Things become more complicated when the program involves multiple files or collaboration. As soon as there's branching too, informal version control becomes error-prone - quite possibly the weak-point of the whole project.

Tips

The "git: Trouble-shooting" document may be useful. Be sure you've set up your editor choice appropriately. Merging might be an issue when your changes overlap those of your colleague. Here's one approach -

  • do "git pull" to download the latest version of the project repository. If git still tells you that you need to merge, you might try "git diff" which should show you differences between the current state of the file and the most recently saved version.
  • If you've made few changes locally, and you just want to get rid of them, you can do "git checkout the_filename" on each file that you want to revert so that your disc version matches the version in your private repository.
  • Alternatively, if you want to merge the changes, you need to use "git merge" then look for lines like
    <<<<<<< HEAD
    ...
    =======
    ...
    >>>>>>> master
    
    or similar. This shows the regions where there are differences (the lines between <<<<<<< HEAD and ======= come from one version, and the code between ======= and >>>>>>> master is the corresponding section from the other version).
  • When you've edited to resolve the differences, you can do "git commit ..." and then "git push" to update the project repository.
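
It's easy to miss a marker line when tidying up a merge by hand. The sketch below is a hypothetical helper (find_conflict_markers isn't part of git or of the coursework) that reports any marker lines left in a piece of text, assuming git's default markers, which are seven characters long.

```python
def find_conflict_markers(text):
    """Return (line number, line) pairs for lines that start with a marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(markers):
            found.append((line_number, line))
    return found


# A made-up file in the middle of a merge
sample = """def greet():
<<<<<<< HEAD
    print("hello")
=======
    print("hi there")
>>>>>>> master
"""

for line_number, line in find_conflict_markers(sample):
    print(line_number, line)
```

If the function returns an empty list, none of the lines in the text begins with a marker.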

Testing

Testing is an integral part of program-writing. In a style of programming called "Test Driven Development" the tests are typically written before the code. Here's a description of a typical program produced that way - "FITNESSE is 64,000 lines of code, of which 28,000 are contained in just over 2,200 individual unit tests. These tests cover at least 90% of the production code and take about 90 seconds to run".

It's up to you how many tests you write. See the coursework documentation for how to run them on your machine using py.test (which will run from any command-line that you can run python from).
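
By default py.test collects functions whose names start with test from files named test_*.py or *_test.py, and a test passes if its assert statements hold. Here's a minimal sketch of what such tests look like; mean_level is a made-up function standing in for your own code, which you'd normally import from another file.

```python
def mean_level(levels):
    """Return the average of a non-empty list of numbers."""
    return sum(levels) / len(levels)


def test_mean_level_of_equal_values():
    assert mean_level([2.0, 2.0, 2.0]) == 2.0


def test_mean_level_of_mixed_values():
    assert mean_level([1.0, 2.0, 3.0]) == 2.0


# py.test would call the test functions for you; they're called by hand
# here so that this sketch also does something when run with plain python
test_mean_level_of_equal_values()
test_mean_level_of_mixed_values()
print("all tests passed")
```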

Note that whether or not you run tests on your own machine, they'll be run automatically by bitbucket when you push your files. The environment on bitbucket in which your code will run is determined by how your pipeline is configured by your bitbucket-pipelines.yml file. Differences between the default bitbucket environment and the environment on your machines are

  • The python environment on bitbucket is installed fresh each time, so you'll have to install extra packages like haversine each time
  • bitbucket has no graphics display. matplotlib expects one.
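
One common way round the missing display is to tell matplotlib to use its non-interactive "Agg" backend, which draws into files or memory instead of a window. The sketch below assumes nothing about your coursework code; it sets the MPLBACKEND environment variable and copes if matplotlib isn't installed.

```python
import io
import os

# Ask matplotlib to draw into memory rather than open a window.
# This has to be set before matplotlib.pyplot is imported.
os.environ["MPLBACKEND"] = "Agg"

try:
    import matplotlib.pyplot as plt
except ImportError:
    plt = None  # matplotlib isn't installed on this machine

png_bytes = b""
if plt is not None:
    figure, axes = plt.subplots()
    axes.plot([0, 1, 2, 3], [0, 1, 4, 9])
    picture = io.BytesIO()
    figure.savefig(picture, format="png")  # save the plot instead of showing it
    plt.close(figure)
    png_bytes = picture.getvalue()

print("bytes of PNG produced:", len(png_bytes))
```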

Versions

Sooner or later you'll come up against problems caused by Python existing in several versions. The main split is between Python 2 and Python 3, but there are also differences between distributions (e.g. which packages and editors are provided). There are also differences in behaviours depending on the operating system used (file-opening for example may differ between Windows and MacOS). So when you report a problem it's worth including information about which Python you're using, and on what type of machine. The following code gives some system information that may help

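import sys
print(sys.executable)    # the python program being run
print(sys.version_info)  # the Python version
print(sys.path)          # the folders searched for imported files
print(sys.platform)      # the type of operating system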
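
The same information can be used by a program to check that it's being run with a suitable Python. This is a small sketch; the wording of the messages and the variable name version_text are only illustrative.

```python
import sys

# Stop straight away with a clear message if this is Python 2
if sys.version_info < (3, 0):
    raise SystemExit("This script needs Python 3")

# Build a short description that's handy to paste into a problem report
version_text = "{}.{}.{}".format(*sys.version_info[:3])
print("Running Python", version_text, "on", sys.platform)
```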

If you're searching for information about (say) the len function in Python 3, it's worth searching for python 3 len rather than python len.

Learning more

There are many good Python and git tutorials online. Choose one that suits the speed you want to work at, and has the type of exercises you want. It's easy to think up exercises - brain-teasers, maths and games are useful sources. The University provides some bookable courses.