Department of Engineering

IT Services

Python 1A Lent booster talk

There are several Python features that you might have copied from one example to the next without knowing quite what they're for, or that you used last term and have forgotten. This document strives to fill in some of these gaps in understanding and provide overviews. It's particularly aimed at 1st year CUED Lent students, whose recent questions have inspired this document (which is a FAQ list, so some details are repeated).

Software Overview

Here's the software you're likely to use -

  • python - a program that, given appropriate text (python code), will make a computer do things. Most Linux machines have it installed by default, and many Macs do too - just open a terminal window and type python (or python3 if python isn't found)
  • Jupyter Notebooks - let you use documents that contain live code, equations, visualizations and text. You can use them "in the cloud" (Microsoft Azure Notebooks) or install the facility on your own machine.
  • Anaconda - a suite of programs related to Python (including python itself and Jupyter Notebooks)
  • Visual Studio Code (VS Code) - an editor that works alongside the Anaconda suite (it can be launched from Anaconda Navigator). You can use any editor to write Python code, but VS Code has some helpful Python-specific features.
  • pip - a program that installs python packages (packages are add-ons, providing extra functions)
  • pytest - a program to run tests. It's also available as a python module, so "python -m pytest" might work.
  • git - a "version control system". It lets you save (and recover) versions of files. It stores files on your machine by default. It's used from the command line. On systems (like Windows) that don't have much of a command-line facility, git sometimes comes packaged with a unix-like command line and terminal program (e.g. git bash)
  • GitLab, GitHub - cloud facilities that work with git, letting you share code and have access to it wherever you are

Workflow overview

For CUED 1st year Lent students working in pairs the idea is that you

  • write your code using VS Code on your laptop
  • use git commit ... whenever you reach a minor milestone, to store versions on your laptop
  • store files on GitLab or GitHub using git push whenever your code is ready for team-partners to use.
  • use git pull to get your team-partners' new code onto your laptop

You should write tests and run them locally using pytest or python -m pytest.

Your GitLab/GitHub account is set up to use Continuous Integration, which means that when you store your newest version on GitLab/GitHub, GitLab/GitHub will automatically test the code using the tests you've written. Working this way, you'll always have a working (though perhaps incomplete) version of your program on GitLab/GitHub. Wherever you are in the world you can download your current version.

When you do "git push" (VS Code calls it "sync") your files go to the GitHub site. We've set things up so that more than just file copying happens when you push. The extra behaviour is defined by the .github/workflows/pythonapp.yml we've given you. You don't need to understand all of that file, but you will need to change a line or so.

When you push, a [virtual] computer is started up for you on GitHub, running an Ubuntu operating system (Linux). Then python is installed, then pip is run to install some python modules, then pytest is run. The outcome of this is recorded under the "workflows" menu of GitHub. If you're getting red crosses, then something has gone wrong. Click until you see the detailed error messages.

Your team-partners will be putting code on GitLab/GitHub too. So that you're in sync with them you should download ("pull") code from GitLab/GitHub to integrate their work into your local version.

If in future projects you're working on your own without the internet, git is still useful - you don't have to use GitLab/GitHub.

The command line

Terminal windows and command lines may all look the same, but they're not (especially on Windows, where some will give you access to git or python, and some won't). Here are some useful commands and short-cuts that work in Mac, Linux and "git bash" terminal windows.

  • pwd
    (Print Working Directory) Display on the screen the name of the current folder/directory
  • cd XYZ
    (Change Directory) move to a folder called XYZ within the current folder/directory (it's a bit like clicking on a folder icon)
  • cd
    move to your home folder/directory
  • cd ..
    move "up" into the enclosing folder/directory
  • ls
    LiSt the files and folders in the current folder/directory
  • rm filename[s]
    (ReMove) Remove the files
  • If you press the TAB key, the system will try to complete a filename/foldername. For example, if the only folder you have in the current folder is called 1AComputing then typing
       cd 1A
    
    and pressing the TAB key will give you the completed command. If there's more than one way to complete the command, press the TAB key twice to get a list of alternatives.
  • You can retrieve previous commands using the up arrow key. When you press the Return key the whole line will be executed even if the cursor isn't at the right end of the line.
  • Typing Ctrl r followed by a few characters will recover the last command line containing those characters. Press Return to run it.
  • You can run system programs by typing their name - e.g. typing firefox runs the Firefox browser. If you run a program and it seems to be stuck, you can kill it by pressing the Ctrl and C keys down at the same time.
  • Type man command to find out more about a command (man is short for "manual").

Anaconda offers an "Anaconda prompt" option on Windows. This starts a window in which you can run python and test python scripts from the command line, but it's a DOS/Windows command-line, so it doesn't understand most of the unix commands in the above list. For example, it doesn't understand ls - you need to use dir to find out which files are in the folder.
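If you're unsure which commands your terminal window understands, you can ask the same questions from inside Python, which behaves the same way on Windows, Mac and Linux. Here's a minimal sketch using only the standard library. What it prints depends on the folder you start python in.

```python
import os
import pathlib

# Like pwd: the folder that python is currently working in
here = pathlib.Path.cwd()
print("Current folder:", here)

# Like ls (or dir on Windows): the names of the files and folders here
names = sorted(os.listdir(here))
print("Contents:", names)

# Like "cd ..": the enclosing folder. This only works out its name,
# it doesn't move there (os.chdir would do that)
print("Enclosing folder:", here.parent)
```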

Integrating VScode, git, pip and python

Several students have trouble integrating VScode, git, pip (the package-installer for Python) and python. Here are some tips

  • If setting up VScode is causing a lot of trouble, remember that it's not a necessity. Spyder (usually available as part of anaconda) is an alternative. Actually any simple text editor (or even Word) can be used instead. You can install, test, and run the programs from the same command line where you run git.
  • You may have more than one version of Python on your system. Type python on the command line to find out what version that is. To find out how VScode or Spyder is running your programs, run this python program -
    import sys
    print("Executable:",sys.executable)
    print("Version:",sys.version_info)
    print("Path:",sys.path)
    print("Platform:",sys.platform)
    
    The output will help others diagnose your problems.
  • You may get into a situation where you're installing packages for one python version, but trying to run programs using a different python. If you're running programs via VScode you may need to tell VScode to run a different Python version - search online for terms like "correcting vscode anaconda path".
  • If you have VSCode installed and want to run it from anaconda navigator (the front page of the anaconda suite) look at VS Code missing from Home Page
  • If pip isn't behaving but python is, try "python -m ensurepip"

Paths, and running programs from the command line

Students frequently have problems accessing programs and packages that they think they've installed. Terminal windows, python and package installers each have different ways of trying to find what you ask for, and if you have more than one version of python on your machine, it's easy to install a python package for one version but not the other. The advantage of using a suite like anaconda is that anaconda's pip installer and editors know where anaconda's python program is, and behave accordingly.

In the next few sections I'll deal with how the system looks for things. You'll only need to know about this if things go wrong. Alas, they sometimes do.

When you open a terminal window program, it starts in a particular folder/directory. You can find out what that place is by typing pwd. If you type the name of a program (python, say) the terminal window has to work out which of the potentially many programs called python on the computer to run. If you specify a particular file location by giving the full name of the file, it will run that. For example, typing

  /usr/local/apps/anaconda3/bin/python

on the teaching system runs that particular version of Python. If you just type the short name, the system will look in a pre-set list of folders until it finds a matching program. On unix you can see that list by typing

  echo $PATH

The system's applications folder is on the path, so to run a system command you need only type its name and press Return. Often the current folder isn't on the PATH, so even if you're in the folder where a program is, you can't run it by typing its short name. Typing

    ./program_name

will work though, because a full-stop means "the current folder".
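The search through the PATH folders can be imitated in a few lines of Python. This is a simplified sketch of the idea rather than what the terminal actually runs (on Windows, for instance, the system also tries endings like .exe). The function name find_on_path is made up for this example. shutil.which is the standard-library function that does the job properly.

```python
import os
import shutil

def find_on_path(program):
    """Return the first runnable file called `program` in the PATH folders, or None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(folder, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate   # stop at the first match, as the terminal does
    return None

# Which python would be run if you just typed "python"?
print("Hand-written search:", find_on_path("python"))
print("shutil.which:       ", shutil.which("python"))
```

If both lines print None, typing python in that terminal window won't work either, and you'll need to give the full name of the file or fix the PATH.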

The upshot of all this is that when you open a terminal window, don't expect to be able to run a program just by typing its name. On CUED's linux system, all will be ok if you use Anaconda, or even just the "Anaconda Shell" in the Applications/Programming menu (the paths are all correctly set up for you). On your machines you may have installed programs in various places, and you may have more than one "terminal window program", so use pwd, cd and ls (or dir on DOS/Windows) to find out where you are. Note that git will only work the way you want if the terminal window you're running it in is in the top folder where your python files are.

If you have spaces in the name of the folder where you've installed anaconda, or the name uses non-ASCII characters (e.g. accented or non-Latin letters) you may find that parts of the anaconda suite fail - see Windows installation unstable ...

For further details, search for terms like "correcting vscode anaconda path".

If vscode can't find git, search for terms like "vscode can't find git", or use git from a terminal window.

Packages and modules

A module is a single Python file. A package is a directory of Python modules containing an additional __init__.py file. That file distinguishes a package from a directory that just happens to contain a bunch of Python scripts. When a package is imported, the __init__.py file is executed.
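Here's a small experiment that shows these ideas in action. It's only a sketch: the names demo_pkg and shapes are invented, and the package is built in a temporary folder so that nothing is left on your machine afterwards.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a package: a folder containing __init__.py and one module
    pkg = pathlib.Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "print('running demo_pkg/__init__.py')\nGREETING = 'hello'\n")
    (pkg / "shapes.py").write_text("def area(w, h):\n    return w * h\n")

    sys.path.insert(0, tmp)            # let python look in the temporary folder
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")         # runs __init__.py
        shapes = importlib.import_module("demo_pkg.shapes")    # a module in the package
        greeting = demo_pkg.GREETING
        result = shapes.area(3, 4)
    finally:
        sys.path.remove(tmp)

print(greeting, result)
```

The message from __init__.py appears once, when the package is first imported.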

Functions, Signatures and Methods

A function is a callable bit of code. A method is a function that belongs to a class (i.e. it's part of something else). A function signature describes the inputs a function expects (and maybe the return value and type, and exceptions that might be thrown, etc).
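A short sketch may make the three terms clearer. The names scale and Station are invented for the illustration. inspect.signature is a standard-library function that reports a function's signature.

```python
import inspect

def scale(value, factor=2.0):
    """A function: called as scale(...)"""
    return value * factor

class Station:
    def __init__(self, name, level):
        self.name = name
        self.level = level

    def is_above(self, threshold):
        """A method: called on an object, as some_station.is_above(...)"""
        return self.level > threshold

print(scale(3))                    # calling a function
s = Station("Example", 1.5)
print(s.is_above(1.0))             # calling a method
print(inspect.signature(scale))    # the inputs that scale expects
```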

Installing with pip

Note that when you install packages with pip you can try to install the code in amongst the standard installed packages or (if you use the --user option) in your own filespace. On the standard CUED machines that are centrally managed you can't add system files, so you'll have to use the --user option. The files are installed under your home folder, and python should automatically find them (for me at the moment they're installed in my home folder in .local/lib/python3.6/site-packages). If you start to use a different version of python on a machine you might have to re-install the extra packages.
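One way to be sure that pip installs for the python you're actually running is to run pip as a module of that python. The sketch below only builds and prints such a command (it doesn't install anything). The helper name pip_command is made up, and haversine is just an example package name.

```python
import site
import sys

def pip_command(*args):
    """Build a command line that runs pip using the python running this code."""
    return [sys.executable, "-m", "pip", *args]

# The command you could type (or pass to subprocess.run) to install a package
print(" ".join(pip_command("install", "--user", "haversine")))

# The folder where packages installed with --user end up for this python
print(site.getusersitepackages())
```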

import

Many features you want to use aren't built into Python - you have to import them. You can't import them until they're on the machine you're running Python on. Depending on how you import, you sometimes have to mention the module/package when you use a function. An example should clarify the issue. The supplied Task2D.py file begins with

import datetime
from floodsystem.stationdata import build_station_list

The first line lets you use the things in datetime, but only if you say where they came from - e.g. to use datetime's timedelta class you have to write datetime.timedelta. In contrast, you just write build_station_list to use that function. You don't write floodsystem.stationdata.build_station_list because the "import" line has already told python where to get the function from.
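The two styles can be tried side by side using only datetime, which comes with Python (floodsystem is part of the coursework files, so it's left out of this sketch).

```python
import datetime                  # style 1: import the whole module
from datetime import timedelta   # style 2: import one name from the module

a = datetime.timedelta(days=2)   # style 1 needs the module name in front
b = timedelta(days=2)            # style 2 doesn't

print(a == b)                    # both refer to the same thing
print(datetime.date(2020, 1, 30) + b)
```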

There's another way to use import. Suppose you had a file called data.py containing

x=13
y=7

You could then write a file containing

from data import *
print(x,y)

This works because the * gives you access to everything in data.py. It's not recommended though because in general you don't know what's in the file you're importing - its variables might interfere with yours.

"import" not finding packages

Some of you are getting ModuleNotFoundError or ImportError: attempted relative import ... complaints when you run your code in geo.py. The documentation about Task1B says "In the submodule geo implement a function ... Provide a program file Task1B.py that uses geo.stations_by_distance" which means put your stations_by_distance function in geo.py but don't run geo.py directly to test it - run Task1B.py. The files provided for the 1A Lent work are written on the assumption that you'll be running Task1A.py, etc directly but not geo.py directly.

If you're still having problems with packages not being found, search online for "python relative import", or read on ...

Sometimes import won't find a package even though you know it's on your machine. The most likely reasons are -

  • You installed in a folder where your version of python doesn't look (perhaps because you used a version of pip that didn't come with your python)
  • You're importing packages from various folders without being careful enough about which files you are running (e.g. in the 1st year Lent exercise, don't run geo.py directly - run code in the folder above.)

If problems persist, you'll need to investigate your set-up. When you run or import a python file, which of the many files with that name in the system will be used? You can find out where python looks for imported files by running

import sys
print(sys.path)

Python will look in these folders in order, looking for a file with the requested name.
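You can also ask python which file it would use for a particular import, without importing it. This is a minimal sketch. The function name where_is is invented, and importlib.util.find_spec is the standard-library call doing the work.

```python
import importlib.util

def where_is(module_name):
    """Return the file python would use for `import module_name`, or None if it can't find one."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

print(where_is("json"))                      # a module that comes with python
print(where_is("no_such_module_xyz_12345"))  # None - python can't find it
```

If the file reported isn't the one you expected, compare its folder with the list printed by sys.path.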

In Unix a variable called PYTHONPATH (a list of folders separated by colons) shows extra folders where python will look for imported files. This variable belongs to the shell (the command-line interpreter). From the command line you can use echo $PYTHONPATH to see if it's set to anything. If you keep all your python modules in a folder called pythonstuff in your home folder, you can make the modules importable from anywhere using

export PYTHONPATH=${PYTHONPATH}:~/pythonstuff

at the start of your session. This will affect everything subsequently done from the window you type the command in. It won't affect what happens in a completely separate window.
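To check from inside python whether the setting has reached the program you're running, you can look at the variable directly. This sketch just reports what it finds. On Windows the separator is a semi-colon rather than a colon, which is why it uses os.pathsep.

```python
import os

# Split PYTHONPATH into folders, ignoring empty entries
extra_folders = [f for f in os.environ.get("PYTHONPATH", "").split(os.pathsep) if f]

if not extra_folders:
    print("PYTHONPATH is not set (or is empty)")
for folder in extra_folders:
    print(folder, "- exists" if os.path.isdir(folder) else "- no such folder")
```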

When you have lots of python files in several folders, you need to be careful about importing. An import command that works in one file may well not work in another that's further down the file system because by default the module being imported will not be looked for in children or parent folders.

Modules often need to refer to each other. In Python 3 a plain import statement searches the folders listed in sys.path (which usually starts with the folder containing the script being run), not the folder of the file doing the importing. To refer to a module in the same package, put dots in front of the imported name to specify where Python should look. A single dot represents the current package. More dots refer to parent(s) of the current package. So if pack1.py is part of a package that program1 imports, and pack1.py has

from .pack2 import fun2

(note the extra dot) then pack2 will be sought in the same folder as pack1.py

A Python Import Tutorial for Beginners is friendly, but doesn't tell you everything. It's easy to experiment. Here's an example -

Create a main.py file containing

import folder.sub1

folder.sub1.function_in_sub1()

Create a folder called folder. Inside that put a file called sub1.py containing

from .sub2 import function_in_sub2

def function_in_sub1():
   function_in_sub2()
   print("function_in_sub1")

and a file called sub2.py containing

def function_in_sub2():
  print("function_in_sub2")

Think about what should happen when main.py is run, then run it. Change the code, making deliberate mistakes with the importing, and run again.
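If you'd rather not create the files by hand, the experiment can be automated. The sketch below (the name run_experiment is invented) writes the three files into a temporary folder, runs one of them with the same python that's running the sketch, and returns what happened. Running folder/sub1.py directly instead of main.py should reproduce the "attempted relative import" complaint mentioned earlier.

```python
import pathlib
import subprocess
import sys
import tempfile

FILES = {
    "main.py": "import folder.sub1\n\nfolder.sub1.function_in_sub1()\n",
    "folder/sub1.py": ("from .sub2 import function_in_sub2\n\n"
                       "def function_in_sub1():\n"
                       "    function_in_sub2()\n"
                       "    print('function_in_sub1')\n"),
    "folder/sub2.py": "def function_in_sub2():\n    print('function_in_sub2')\n",
}

def run_experiment(script="main.py"):
    """Create the files, run `script`, and return (exit code, output, error text)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        for name, text in FILES.items():
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text)
        done = subprocess.run([sys.executable, script], cwd=str(root),
                              capture_output=True, text=True)
        return done.returncode, done.stdout, done.stderr

code, out, err = run_experiment()
print(out)
```

Try editing the text in FILES (removing the dot in front of sub2, say) and see how the error messages change.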

Running/importing scripts

Scripts often end with something like

   if __name__ == '__main__':
      ...

Python scripts can be imported so that the functions in them can be used by other scripts. Python scripts can also be run directly from the command line. Ideally almost every importable Python module can do something useful if run from the command line. The special line makes this possible. __name__ == '__main__' is only true if the file is run directly rather than imported, so if you wanted function fun1 to be called when the file it's in is executed but not when the file's imported, use

   def fun1():
       print("hello")

   if __name__ == '__main__':
       fun1()
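To see where the special name comes from, you can print it. Every module has a __name__ variable. For an imported module it's the module's own name. This small sketch uses datetime only because it's always available.

```python
import datetime

# An imported module's __name__ is the name it was imported under
print(datetime.__name__)

# In the file you run directly, python sets __name__ to "__main__"
print(__name__)
```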

Comprehensions

Many tasks involve going through a list and picking out certain items of interest, creating a list of those items. Here's a simplified example, putting odd numbers into a list called answers

numbers=range(10)
answers=[]
for i in numbers:
  if i%2 == 1:
    answers.append(i)

Comprehensions were designed to deal with this common task. You don't have to use them, but they'll make your code shorter and often a little faster. The following code does the same as the fragment above.

numbers=range(10)
answers=[ i for i in numbers if i%2 == 1 ]
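You can check that the two versions agree by running them together. The sketch below also shows that the same idea builds dictionaries and sets (the names squares and last_digits are just for illustration).

```python
numbers = range(10)

# The loop version
loop_answers = []
for i in numbers:
    if i % 2 == 1:
        loop_answers.append(i)

# The list comprehension gives the same list
answers = [i for i in numbers if i % 2 == 1]
print(answers == loop_answers)

# A dictionary comprehension: each odd number paired with its square
squares = {i: i * i for i in answers}
print(squares)

# A set comprehension: the distinct final digits of the squares of 0 to 9
last_digits = {(i * i) % 10 for i in numbers}
print(last_digits)
```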

git

git is too big a topic to deal with properly today (which is why there's a separate talk about it - see the notes) but I'll mention a few issues. If you get yourself in a tangle, don't panic - there's always a way out.

Many problems that people have with git are a consequence of being unfamiliar with command line usage. If you try git add Task1A.py and git says that it can't find Task1A.py, the odds are that your command line isn't in the folder where Task1A.py is. Use cd.

Why use version control?

Well, it's useful to be able to go back to earlier versions of a program when faults emerge in new ones. And future courses here use git. According to Wikipedia, about 30% of UK permanent software development job ads mention git.

I think most people who write programs or documents use version control even if they do so informally, saving versions as they go along. Maybe they create program_old.py or programApr19.py. Their folders might get a bit cluttered, but at least they don't have to learn a new version control system.

With small, linear, one-file programs this method might be ok. Things become more complicated when the program involves multiple files or collaboration. As soon as there's branching too, informal version control becomes error prone - quite possibly the weak-point of the whole project.

Tips

If your partner created your GitLab/GitHub site, you may find that you can't push updates, but your partner can. The problem might be that the original site included a protected branch, and your copy has one too. Under the "Settings/Repository" submenu there's a section entitled "Protected Branches" that lists such branches so their permissions can be changed.

In general, if something's not working and you don't understand why, try searching for a fragment of the error message on the web. The "git: Trouble-shooting" document may be useful. Be sure you've set up your editor choice appropriately. Merging might be an issue when your changes overlap those of your colleague. Here's one approach -

  • do "git pull" to download the latest version of the project repository. If git still tells you that you need to merge, you might try "git diff" which should show you differences between the current state of the file and the most recently saved version.
  • If you've made few changes locally, and you just want to get rid of them, you can do "git checkout the_filename" on each file that you want to revert so that your disc version matches the version in your private repository.
  • Alternatively, if you want to merge the changes, you need to use "git merge" then look in your files for lines like
    <<<<<<< HEAD
    ...
    =======
    ...
    >>>>>>> master
    
    or similar. This shows the regions where there are differences (the lines between <<<<<<< HEAD and ======= come from one version, and the lines between ======= and >>>>>>> master are the corresponding section from the other version).
  • When you've edited to resolve the differences, you can do "git commit ..." and then "git push" to update the project repository.
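To make the layout of the conflict markers concrete, here's a sketch that pulls the two competing versions out of a piece of text. It's only an illustration of how the markers are arranged, not a merging tool: the function name split_conflict and the sample text are invented, and it assumes there's just one conflict region.

```python
def split_conflict(text):
    """Return (lines from one version, lines from the other) found between the markers."""
    ours, theirs = [], []
    side = None                      # which region we're in: None, "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<"):
            side = "ours"
        elif line.startswith("===") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>"):
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return ours, theirs

sample = """print("start")
<<<<<<< HEAD
x = 13
=======
x = 14
>>>>>>> master
print(x)
"""

ours, theirs = split_conflict(sample)
print("One version:  ", ours)
print("Other version:", theirs)
```

Resolving the conflict means editing the file so that only the lines you want remain, and all three marker lines are gone.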

A gentle, very introductory talk about git is online.

If you use VS Code there's an icon (the 3rd one down on the left) that gives access to git if you have it installed. It will show you which files have been changed since you last committed, and offer a menu of commonly used git commands - push, etc

Testing

Testing is an integral part of program-writing (and games writing, as the credits on the right show). In a style of programming called "Test Driven Development" the tests are typically written before the code. Here's a description of a typical program produced that way - "FITNESSE is 64,000 lines of code, of which 28,000 are contained in just over 2,200 individual unit tests. These tests cover at least 90% of the production code and take about 90 seconds to run".

It's up to you how many tests you write, though you should have at least one per function you write. See the coursework documentation for how to run them on your machine. Note that your tests should be run using pytest, not by running the test files like normal python files. There are various ways to run pytest on your local machine (VS Code may offer the option), but you may need to install more things. Typing "pytest" or "python -m pytest" might work. pytest will run all your tests, as long as you name them correctly.
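By default pytest looks for files whose names start with test_ (or end with _test.py) and, inside them, for functions whose names start with test. Here's a minimal sketch of such a file. The function odd_numbers is invented for the illustration, and in real work it would be imported from the module being tested rather than defined in the test file.

```python
# Contents of a hypothetical file called test_odd.py

def odd_numbers(numbers):
    """Return a list of the odd values in numbers."""
    return [i for i in numbers if i % 2 == 1]

def test_odd_numbers_found():
    # pytest reports a failure if an assert is false
    assert odd_numbers(range(10)) == [1, 3, 5, 7, 9]

def test_no_odd_numbers():
    assert odd_numbers([2, 4, 6]) == []
```

Nothing happens if you run this file like a normal python file, because the test functions are never called. pytest finds them and calls them for you.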

Note that whether or not you run tests on your own machine, they'll be run automatically by GitLab/GitHub when you push your files. The environment on GitHub in which your code will run is determined by how your pipeline/workflow is configured by your .github/workflows/pythonapp.yml file. Differences between the default GitLab/GitHub environment and the environment on your machines are

  • The python environment on GitHub is installed fresh each time, so you'll have to install extra packages like haversine each time
  • GitHub has no graphics display. matplotlib expects one.
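One common way round the lack of a display is to ask matplotlib to use a "backend" that only writes files, and to save plots rather than show them. The following is a sketch under that assumption. The file name and the numbers plotted are made up, and the code does nothing if matplotlib isn't installed.

```python
import os
import tempfile

try:
    import matplotlib
    matplotlib.use("Agg")            # a backend that draws to files, not windows
    import matplotlib.pyplot as plt
except ImportError:
    plt = None                       # matplotlib isn't installed for this python

saved_ok = False
if plt is not None:
    with tempfile.TemporaryDirectory() as tmp:
        filename = os.path.join(tmp, "levels.png")   # hypothetical file name
        fig, ax = plt.subplots()
        ax.plot([0, 1, 2, 3], [0.2, 0.5, 0.4, 0.9])
        fig.savefig(filename)        # save the plot instead of calling plt.show()
        plt.close(fig)
        saved_ok = os.path.exists(filename)

print("Plot saved:", saved_ok)
```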

Versions

Sooner or later you'll come up against problems caused by Python existing in several versions. The main split is between Python 2 and Python 3, but there are also differences between distributions (e.g. which packages and editors are provided).

If you install anaconda on Macs or Linux you'll have at least 2 python programs on your machine, which can become rather confusing. In particular you might find that installing a package for one python program doesn't make it available for other python programs.

From the command line you want to use, typing

pip --version
pip3 --version
python --version
python3 --version

will tell you whether these programs are available, and which versions you have. The output from pip will tell you which version of Python it installs packages for.

There are also differences in behaviour depending on the operating system used (file-opening for example may differ between Windows and macOS). So when you report a problem it's worth including information about which Python you're using, and on what type of machine. The following code run from inside python gives some system information that may help if there are problems.

import sys
sys.executable
sys.version_info
sys.path
sys.platform
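When you report a problem it can be handy to collect this information into a few lines of text that you can paste into an e-mail. Here's a sketch (the function name system_report is invented) that uses the standard platform module alongside sys.

```python
import platform
import sys

def system_report():
    """Return a short description of the python and machine in use."""
    lines = [
        "Python " + platform.python_version(),
        "Executable: " + str(sys.executable),
        "Platform: " + platform.platform(),
        "Python 3 or later: " + str(sys.version_info >= (3,)),
    ]
    return "\n".join(lines)

print(system_report())
```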

If you're searching for information about (say) the len function in Python 3, it's worth searching for python 3 len rather than python len.

Learning more

See the Python 1A Mich booster talk or our Advanced Python page.