Department of Engineering

IT Services

Python 1A Lent booster talk

There are several Python features that you might have copied from one example to the next without knowing quite what they're for, or that you used last year and have forgotten. This document strives to fill in some of these gaps in understanding and provide overviews. It's particularly aimed at 1st year CUED Lent students, whose recent questions have inspired this document.

Software Overview

Here's the software you're likely to use -

  • python - a program that given appropriate text (python code) will make a computer do things. Linux machines and Macs have it installed by default - just open a terminal window and type python
  • Jupyter Notebooks - lets you use documents that contain live code, equations, visualizations and text. You can use them "in the cloud" (Microsoft Azure Notebooks) or install the facility on your own machine.
  • Anaconda - a suite of programs related to Python (including python itself and Jupyter Notebooks)
  • Visual Studio Code (VS Code) - an editor that's part of the Anaconda suite. You can use any editor to write Python code, but VS Code has some helpful Python-specific features.
  • pip - a program that installs python packages (packages are add-ons, providing extra functions)
  • git - a "version control system". It lets you save (and recover) versions of files. It stores files on your machine by default. It's used from the command line. On systems (like Windows) that don't have much of a command-line facility, git sometimes comes packaged with a unix-like command line and terminal program (e.g. git bash)
  • GitLab - a cloud facility that works with git, letting you share code and have access to it wherever you are

Workflow overview

For CUED 1st year Lent students working in pairs the idea is that you

  • write your code using VS Code on your laptop
  • use git commit ... whenever you reach a minor milestone, to store versions on your laptop
  • store files on GitLab using git push whenever your code is ready for team-partners to use.
  • use git pull to get your team-partners' new code onto your laptop

You should write tests and run them locally before pushing your code to GitLab. The tests won't be run unless you use py.test - see the coursework documentation for details.

Your GitLab account is set up to use Continuous Integration, which means that when you store your newest version on GitLab, GitLab will automatically test the code using the tests you've written. Working this way, you'll always have a working (though perhaps incomplete) version of your program on GitLab. Wherever you are in the world you can download your current version.

Your team-partners will be putting code on GitLab too. So that you're in sync with them you should download ("pull") code from GitLab to integrate their work into your local version. The diagram on the right shows the places where git stores files.

If you're working on your own without the internet, git is still useful - you don't have to use GitLab.

The command line

Terminal windows and command lines may all look the same, but they're not (especially on Windows, where some will give you access to git or python, and some won't). Here are some useful commands and short-cuts that work in Mac, Linux and "git bash" terminal windows.

  • pwd
    (Print Working Directory) Display on the screen the name of the current folder/directory
  • cd XYZ
    (Change Directory) move to a folder called XYZ within the current folder/directory (it's a bit like clicking on a folder icon)
  • cd
    move to your home folder/directory
  • cd ..
    move "up" into the enclosing folder/directory
  • ls
    LiSt the files and folders in the current folder/directory
  • rm filename[s]
    (ReMove) Remove the files
  • If you press the TAB key, the system will try to complete a filename/foldername. For example, if the only folder you have in the current folder is called 1AComputing then typing
       cd 1A
    
    and pressing the TAB key will give you the completed command. If there's more than one way to complete the command, press the TAB key twice to get a list of alternatives.
  • You can retrieve previous commands using the up arrow key. When you press the Return key the whole line will be executed even if the cursor isn't at the right end of the line.
  • Typing Ctrl r followed by a few characters then Return will recover the last command line containing those characters.
  • You can run system programs by typing their name - e.g. typing firefox runs the Firefox browser. If you run a program and it seems to be stuck, you can kill it by pressing the Ctrl and C keys down at the same time.
  • Type man command to find out more about a command (man is short for "manual").

Anaconda offers an "Anaconda prompt" option on Windows. This starts a window in which you can run python and test python scripts from the command line, but it's a DOS/Windows command-line, so it doesn't understand most of the unix commands in the above list. For example, it doesn't understand ls - you need to use dir to find out which files are in the folder.
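
If you're not sure which commands your terminal window understands, Python itself can answer the same questions, and it does so the same way on Windows, Macs and Linux. Here's a minimal sketch using the standard pathlib module (the variable names are just for illustration).

```python
from pathlib import Path

# Equivalent of pwd: the folder python is currently working in
current_folder = Path.cwd()
print("Current folder:", current_folder)

# Equivalent of ls (or dir on Windows): names of the files and folders here
names = sorted(item.name for item in current_folder.iterdir())
print("Contents:", names)

# "cd .." would take you to the enclosing folder
print("Enclosing folder:", current_folder.parent)

# "cd" on its own would take you to your home folder
print("Home folder:", Path.home())
```

This only reports where things are. It doesn't change the folder that your terminal window is in.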

Paths, and running programs from the command line

Students frequently have problems accessing programs and packages that they think they've installed. Terminal windows, python and package installers each have different ways of trying to find what you ask for, and if you have more than one version of python on your machine, it's easy to install a python package for one version but not the other. The advantage of using a suite like anaconda is that anaconda's pip installer and editors know where anaconda's python program is, and behave accordingly.

In the next few sections I'll deal with how the system looks for things. You'll only need to know about this if things go wrong. Alas, they sometimes do.

When you open a terminal window program, it starts in a particular folder/directory. You can find out what that place is by typing pwd. If you type the name of a program (python, say) the terminal window has to work out which of the potentially many programs called python on the computer to run. If you specify a particular file location by giving the full name of the file, it will run that. For example, typing

  /usr/local/apps/anaconda3/bin/python

on the teaching system runs that particular version of Python. If you just type the short name, the system will look in a pre-set list of folders until it finds a matching program. On unix you can see that list by typing

  echo $PATH

The system's applications folder is on the path, so to run a system command you need only type its name and press Return. Often the current folder isn't on the PATH, so even if you're in the folder where a program is, you can't run it by typing its short name. Typing

    ./program_name

will work though, because a full-stop means "the current folder".

The upshot of all this is that when you open a terminal window, don't expect to be able to run a program just by typing its name. On CUED's linux system, you may well be lucky, because we've installed programs centrally on the system so that they're easy to find. On your machines you may have installed programs in various places, and you may have more than one "terminal window program", so use pwd, cd and ls (or dir on DOS/Windows) to find out where you are. Note that git will only work the way you want if the terminal window you're running it in is in the top folder where your python files are.
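
If you can get any python to start, you can ask it what the terminal window would do with a short name. The following is a rough sketch using only standard modules. It lists the folders on the PATH, then reports which program typing python would run, which may not be the python that's running the code.

```python
import os
import shutil
import sys

# The folders the terminal window searches, in order, for a short name
path_folders = os.environ.get("PATH", "").split(os.pathsep)
for folder in path_folders:
    print(folder)

# The program that typing "python" would run (None if none is on the PATH)
found_python = shutil.which("python")
print("python on the PATH:", found_python)

# The python program that is running this code right now
print("This python:", sys.executable)
```

If the last two lines differ, you have more than one python, and packages installed for one may not be visible to the other.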

If you have spaces in the name of the folder where you've installed anaconda, or the name uses unicode (i.e. non-Western-European characters) you may find that parts of the anaconda suite fail - see Windows installation unstable ...

Further details are on Correcting Install Paths.

Packages and modules

A module is a single Python file. A package is a directory of Python modules containing an additional __init__.py file. That file distinguishes a package from a directory that just happens to contain a bunch of Python scripts. When a package is imported, the __init__.py file is executed.
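
To see the __init__.py file being executed, here's a small sketch that builds a throwaway package in a temporary folder and imports a module from it. The names demo_package and greetings are made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package called demo_package in a temporary folder
top_folder = Path(tempfile.mkdtemp())
package_folder = top_folder / "demo_package"
package_folder.mkdir()
(package_folder / "__init__.py").write_text(
    "print('__init__.py has run')\nLOADED = True\n"
)
(package_folder / "greetings.py").write_text(
    "def hello():\n    return 'hello'\n"
)

# Make the temporary folder one of the places python looks for imports
sys.path.insert(0, str(top_folder))
importlib.invalidate_caches()

# Importing a module from the package runs the package's __init__.py first
greetings = importlib.import_module("demo_package.greetings")
print(greetings.hello())
print(sys.modules["demo_package"].LOADED)
```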

Functions, Signatures and Methods

A function is a callable bit of code. A method is a function that belongs to a class (i.e. it's part of something else). A function signature describes the inputs a function expects (and maybe the return value and type, and exceptions that might be thrown, etc).
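
Python can show you a function's signature. In this short sketch the function area is invented for illustration, and the standard inspect module does the rest.

```python
import inspect

def area(width, height=1.0):
    """Return the area of a rectangle."""
    return width * height

signature = inspect.signature(area)
print(signature)                    # (width, height=1.0)
print(list(signature.parameters))   # ['width', 'height']
```

The signature tells you that width must be supplied, whereas height can be left out because it has a default value.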

What is self?

Let's suppose you wanted to create a new type of thing in Python - a person. We'll give each person a name. Here's some code to create the class, then create a variable of that type.

class person:
   def __init__(self,thename):
      self.name=thename

p1=person("ali")

Suppose now that we wanted to write a function to print a particular person's name. There are 2 ways we can go about this

  • Write a function that is given the person as an input parameter -
    def printname(theperson):
       print(theperson.name)
    
    printname(p1)
    
  • Write a method - a function that belongs to the class. Note that the method goes inside the class definition and has in the definition an input parameter self, which refers to the person whose printname method is being called. So in the example below the class has been modified. When p1.printname() runs, self will refer to p1
    class person:
       def __init__(self,thename):
          self.name=thename
       def printname(self):
          print(self.name)
    
    p1=person("ali")
    
    p1.printname()
    

Which is preferable? It's not always clear. In the Lent exercise you're told when to write a method and when to write a function.
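
If self still seems mysterious, the sketch below may help. It uses a getname method (invented for this example, so that there's a value to compare) and shows that calling a method on p1 is the same as calling the class's function with p1 passed in as self.

```python
class person:
    def __init__(self, thename):
        self.name = thename

    def getname(self):
        # self is whichever person the method was called on
        return self.name

p1 = person("ali")
p2 = person("sam")

print(p1.getname())          # ali
print(person.getname(p1))    # ali - the same call, with self passed explicitly
print(p2.getname())          # sam
```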

Installing with pip

Note that when you install packages with pip you can try to install the code in amongst the standard installed packages or (if you use the --user option) in your own filespace. On the standard CUED machines that are centrally managed you can't add system files, so you'll have to use the --user option. The files are installed under your home folder, and python should automatically find them (for me at the moment they're installed in my home folder in .local/lib/python3.6/site-packages). If you start to use a different version of python on a machine you might have to re-install the extra packages.
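
To find out where the --user option puts files for the python you're actually running, you can ask the standard site module. This is a sketch rather than a recipe. In particular package_name is a placeholder, and some set-ups (virtual environments, for example) may not use the user folder at all.

```python
import site
import sys

# Folder used by "pip install --user" for the python running this code
user_site = site.getusersitepackages()
print("User packages folder:", user_site)

# Whether that folder is currently on the import search path
print("On sys.path:", user_site in sys.path)

# Running pip through a particular python ties the install to that python
print("Command to try:", sys.executable, "-m pip install --user package_name")
```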

import

Many features you want to use aren't built into Python - you have to import them. You can't import them until they're on the machine you're running Python on. Depending on how you import, you sometimes have to mention the module/package when you use a function. An example should clarify the issue. The supplied Task2D.py file begins with

import datetime
from floodsystem.stationdata import build_station_list

The first line lets you use the things in datetime, but only if you say where they came from - e.g. to use datetime's timedelta class you have to write datetime.timedelta. By contrast you just write build_station_list to use that function. You don't write floodsystem.stationdata.build_station_list because the "import" line has already told python where to get the function from.
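
The two styles can be compared using only the standard datetime module (floodsystem is part of the coursework, so it isn't used in this sketch).

```python
import datetime
from datetime import timedelta

# Style 1: the module name is needed as a prefix
one_week = datetime.timedelta(days=7)

# Style 2: the name was imported directly, so no prefix is needed
also_one_week = timedelta(days=7)

print(one_week == also_one_week)   # True
print(one_week.total_seconds())    # 604800.0
```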

There's another way to use import. Suppose you had a file called data.py containing

x=13
y=7

You could then write a file containing

from data import *
print(x,y)

This works because the * gives you access to everything in data.py. It's not recommended though because in general you don't know what's in the file you're importing - its variables might interfere with yours.

"import" not finding packages

Sometimes import won't find a package even though you know it's on your machine. The most likely reasons are -

  • You installed in a folder where your version of python doesn't look (perhaps because you used a version of pip that didn't come with your python)
  • You're importing packages from various folders without being careful enough about which files you are running (e.g. in the 1st year Lent exercise, don't run geo.py directly - run code in the folder above.)

If problems persist, you'll need to investigate your set-up. When you run or import a python file, which of the many files with that name in the system will be used? You can find out where python looks for imported files by running

import sys
print(sys.path)

Python will look in these folders in order, looking for a file with the requested name.
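
Rather than reading through the list yourself, you can ask python which file it would use for a particular import. Here's a minimal sketch. The helper where_is is made up for this example, and it relies on the standard importlib.util.find_spec function.

```python
import importlib.util

def where_is(module_name):
    """Return the file python would load for module_name, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

print(where_is("json"))                # a file inside the standard library
print(where_is("no_such_module_1a"))   # None
```

If the file reported isn't the one you expected, there's probably another file with the same name earlier in sys.path.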

In Unix a variable called PYTHONPATH (a list of folders separated by colons) shows extra folders where python will look for imported files. This variable belongs to the shell (the command-line interpreter). From the command line you can use echo $PYTHONPATH to see if it's set to anything. If you keep all your python modules in a folder called pythonstuff in your home folder, you can make the modules importable from anywhere using

export PYTHONPATH=${PYTHONPATH}:~/pythonstuff

at the start of your session. This will affect everything subsequently done from the window you type the command in. It won't affect what happens in a completely separate window.

When you have lots of python files in several folders, you need to be careful about importing. An import command that works in one file may well not work in another that's further down the file system because by default the module being imported will not be looked for in children or parent folders.

Modules in the same package often need to refer to each other. In Python 3 the import statement doesn't automatically look in the importing module's own package (Python 2 did), so dots are used with the imported names to say where Python should look. A single dot represents the current package. More dots refer to parent(s) of the current package. So if pack1 is a module in a package that is imported by program1, and pack1.py has

from .pack2 import fun2

(note the extra dot) then pack2 will be sought in the same folder as pack1.py.

The files provided for the 1A Lent work are written on the assumption that you'll be running Task1A.py, etc directly but not geo.py directly. If you're having problems with packages not being found, search online for "python relative import" for details, or better still just run files that are in your top folder.
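
Here's a sketch of a dotted import working when the code is run from the top folder. It builds a throwaway package (relative_demo, a made-up name) in a temporary folder, and adds that folder to sys.path to stand in for running a script that lives there.

```python
import importlib
import sys
import tempfile
from pathlib import Path

top_folder = Path(tempfile.mkdtemp())
package_folder = top_folder / "relative_demo"
package_folder.mkdir()
(package_folder / "__init__.py").write_text("")
(package_folder / "pack2.py").write_text(
    "def fun2():\n    return 'fun2 called'\n"
)
# The leading dot means "look in the package that pack1 belongs to"
(package_folder / "pack1.py").write_text("from .pack2 import fun2\n")

# Stand in for running a script that lives in the top folder
sys.path.insert(0, str(top_folder))
importlib.invalidate_caches()

pack1 = importlib.import_module("relative_demo.pack1")
print(pack1.fun2())   # fun2 called
```

Running pack1.py directly would fail at the dotted import, because a file that's run directly isn't treated as part of a package.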

A Python Import Tutorial for Beginners is friendly, but doesn't tell you everything.

Running/importing scripts

Scripts often end with something like

   if __name__ == '__main__':
      ...

Python scripts can be imported so that the functions in them can be used by other scripts. Python scripts can also be run directly from the command line. Ideally almost every importable Python module can do something useful if run from the command line. The special line makes this possible. __name__ == '__main__' is only true if the file is run directly rather than imported, so if you wanted function fun1 to be called when the file it's in is executed but not when the file's imported, use

   def fun1():
      print("hello")

   if __name__ == '__main__':
      fun1()
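
You can watch the difference using the standard runpy module, which runs a file under whatever name you give it. In this sketch demo_script.py and ran_main_block are invented for the demonstration.

```python
import runpy
import tempfile
from pathlib import Path

script_text = '''
ran_main_block = False

def fun1():
    return "hello"

if __name__ == '__main__':
    ran_main_block = True
'''

script_path = Path(tempfile.mkdtemp()) / "demo_script.py"
script_path.write_text(script_text)

# Run the file as if from the command line: __name__ is '__main__'
as_script = runpy.run_path(str(script_path), run_name="__main__")

# Run it under its module name, imitating what an import does
as_import = runpy.run_path(str(script_path), run_name="demo_script")

print(as_script["ran_main_block"])   # True
print(as_import["ran_main_block"])   # False
```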

Comprehensions

Many tasks involve going through a list and picking out certain items of interest, creating a list of those items. Here's a simplified example, putting odd numbers into a list called answers

numbers=range(10)
answers=[]
for i in numbers:
  if i%2 == 1:
    answers.append(i)

Comprehensions were designed to deal with this common task. You don't have to use them, but they'll make your code shorter and often faster. The following code does the same as the fragment above.

numbers=range(10)
answers=[ i for i in numbers if i%2 == 1 ]
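
Comprehensions can transform items as well as select them, and they can build dictionaries and sets too. A few further examples in the same spirit (the variable names are mine):

```python
numbers = range(10)

# List comprehension: transform as well as filter
odd_squares = [i * i for i in numbers if i % 2 == 1]
print(odd_squares)    # [1, 9, 25, 49, 81]

# Dictionary comprehension: map each even number to its square
even_squares = {i: i * i for i in numbers if i % 2 == 0}
print(even_squares)   # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Set comprehension: the distinct remainders when dividing by 3
remainders = {i % 3 for i in numbers}
print(remainders)
```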

Understanding python error messages

The error messages that Python prints aren't always easy to understand. At least you'll be told which line in which file the error was detected. You may also be able to deduce what the problem is, but you'll need to understand the jargon. Here's an example of a common bug. Suppose this is in ex1.py

def trysorting():
  numbers=[1,3,2]
  sortednumbers=numbers.sort()
  print(sortednumbers[1])

trysorting()

This defines a function trysorting then calls it. Running this file produces the following error message

Traceback (most recent call last):
  File "ex1.py", line 7, in <module>
    trysorting()
  File "ex1.py", line 4, in trysorting
    print(sortednumbers[1])
TypeError: 'NoneType' object is not subscriptable

Note that this is a traceback - it shows you not just the line where the error was detected, but any function calls that led to the error. In this case trysorting() was run which eventually caused line 4 to be run. Even though line 4 is singled out, the real cause might be earlier. The error message 'NoneType' object is not subscriptable sounds rather cryptic. A "subscriptable" object is an object that can be subscripted (i.e. something like [1] can be mentioned after its name). sortednumbers isn't a list (which is subscriptable). Its value is None, which is of type NoneType.

The programmer probably thought that the numbers.sort() method would put a sorted list of numbers into sortednumbers. It doesn't. It sorts the values in numbers "in place" - i.e. the order of the values in numbers is changed - and returns None. Replacing numbers.sort() by sorted(numbers) solves the problem - i.e. there was nothing wrong with line 4 after all.
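
The difference between the two is easy to check for yourself:

```python
numbers = [1, 3, 2]

# sorted() leaves the original alone and returns a new sorted list
sortednumbers = sorted(numbers)
print(numbers)          # [1, 3, 2]
print(sortednumbers)    # [1, 2, 3]

# sort() changes the list in place and returns None
result = numbers.sort()
print(numbers)          # [1, 2, 3]
print(result)           # None
```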

git

git is too big a topic to deal with properly today (which is why I did a talk about it last term - see the notes) but I'll mention a few issues.

Why use version control?

Well, it's useful to be able to go back to earlier versions of a program when faults emerge in new ones. And future courses here use git. According to Wikipedia, about 30% of UK permanent software development job ads mention git.

I think most people who write programs or documents use version control even if they do so informally, saving versions as they go along. Maybe they create program_old.py or programApr19.py. Their folders might get a bit cluttered, but at least they don't have to learn a new version control system.

With small, linear, one-file programs this method might be ok. Things become more complicated when the program involves multiple files or collaboration. As soon as there's branching too, informal version control becomes error prone - quite possibly the weak-point of the whole project.

Tips

If your partner created your GitLab site, you may find that you can't push updates, but your partner can. The problem might be that the original site included a protected branch, and your copy has one too. Under the "Settings/Repository" submenu there's a section entitled "Protected Branches" that lists such branches so their permissions can be changed.

In general, if something's not working and you don't understand why, try searching for a fragment of the error message on the web. The "git: Trouble-shooting" document may be useful. Be sure you've set up your editor choice appropriately. Merging might be an issue when your changes overlap those of your colleague. Here's one approach -

  • do "git pull" to download the latest version of the project repository. If git still tells you that you need to merge, you might try "git diff" which should show you differences between the current state of the file and the most recently saved version.
  • If you've made few changes locally, and you just want to get rid of them, you can do "git checkout the_filename" on each file that you want to revert so that your disc version matches the version in your private repository.
  • Alternatively, if you want to merge the changes, you need to use "git merge" then look in your files for lines like
    <<<<<<< HEAD
    ...
    =======
    ...
    >>>>>>> master
    
    or similar. This shows the regions where there are differences (the lines between <<<<<<< HEAD and ======= come from one version, and the code between ======= and >>>>>>> master is the corresponding section from the other version).
  • When you've edited to resolve the differences, you can do "git commit ..." and then "git push" to update the project repository.
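
It's easy to miss a conflict region in a long file, and a file that still contains markers won't run. Here's a small sketch of a checker. The function conflict_lines is my own invention, not part of git, and it relies on git's markers being 7 characters long by default.

```python
def conflict_lines(text):
    """Return the 1-based line numbers where a conflict region starts."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith("<<<<<<<")
    ]

example = """x = 1
<<<<<<< HEAD
y = 2
=======
y = 3
>>>>>>> master
print(x, y)
"""

print(conflict_lines(example))   # [2]
```

To check a real file you'd pass in its contents, e.g. the result of reading the file with open(...).read().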

If you use VS Code there's an icon (the 3rd one down on the left) that gives access to git if you have it installed. It will show you which files have been changed since you last committed, and offer a menu of commonly used git commands - push, etc

Testing

Testing is an integral part of program-writing. In a style of programming called "Test Driven Development" the tests are typically written before the code. Here's a description of a typical program produced that way - "FITNESSE is 64,000 lines of code, of which 28,000 are contained in just over 2,200 individual unit tests. These tests cover at least 90% of the production code and take about 90 seconds to run".

It's up to you how many tests you write. See the coursework documentation for how to run them on your machine using py.test (which will run from any command-line that you can run python from).
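
In case you've not seen one, here's a sketch of what a test looks like. The function is_odd is invented for the example. By default py.test looks in files whose names begin with test_ for functions whose names begin with test_, and treats a failed assert as a failed test.

```python
def is_odd(number):
    """Return True if number is odd."""
    return number % 2 == 1

def test_is_odd():
    # py.test collects functions whose names start with test_
    assert is_odd(3)
    assert not is_odd(4)
    assert is_odd(-1)   # in Python, -1 % 2 is 1

# py.test would call this for you; calling it by hand also works
test_is_odd()
print("all tests passed")
```

In real code the function being tested would live in its own file and the test file would import it.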

Note that whether or not you run tests on your own machine, they'll be run automatically by GitLab when you push your files. The environment on GitLab in which your code will run is determined by how your pipeline is configured by your .gitlab-ci.yml file. Differences between the default GitLab environment and the environment on your machines are

  • The python environment on GitLab is installed fresh each time, so you'll have to install extra packages like haversine each time
  • GitLab has no graphics display. matplotlib expects one.

Versions

Sooner or later you'll come up against problems caused by Python existing in several versions. The main split is between Python 2 and Python 3, but there are also differences between distributions (e.g. which packages and editors are provided).

If you install anaconda on Macs or Linux you'll have at least 2 python programs on your machine, which can become rather confusing. In particular you might find that installing a package for one python program doesn't make it available for other python programs.

There are also differences in behaviours depending on the operating system used (file-opening for example may differ between Windows and MacOS). So when you report a problem it's worth including information about which Python you're using, and on what type of machine. The following code gives some system information that may help

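import sys
print(sys.executable)    # which python program is running
print(sys.version_info)  # which version of Python it is
print(sys.path)          # where it looks for imported files
print(sys.platform)      # the type of operating system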
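
If you want something tidier to paste into a problem report, the standard platform module gives friendlier values. This is only a sketch, and the labels are my own.

```python
import platform
import sys

# Stop early with a clear message if run by Python 2
if sys.version_info < (3,):
    raise SystemExit("This code needs Python 3")

report = {
    "python version": platform.python_version(),
    "operating system": platform.system(),
    "python program": sys.executable,
}
for label, value in report.items():
    print(label + ":", value)
```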

If you're searching for information about (say) the len function in Python 3, it's worth searching for python 3 len rather than python len.

Learning more

There are many good Python and git tutorials online. Choose one that suits the speed you want to work at, and has the type of exercises you want. It's easy to think up exercises - brain-teasers, maths and games are useful sources. The University provides some bookable courses.