Further Python Features

Sets

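Sets are built in. A value can appear in a set only once, so A below is {1, 2, 3}, and & gives the intersection of two sets, e.g.

A = {1, 2, 3, 3}
B = {3, 4, 5, 6, 7}

print(A & B)   # {3}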
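The other set operators work the same way. A minimal sketch, with A and B redefined so that it runs on its own:

```python
A = {1, 2, 3, 3}          # the duplicate 3 is stored only once
B = {3, 4, 5, 6, 7}

intersection = A & B      # elements in both sets
union = A | B             # elements in either set
difference = A - B        # elements in A but not in B
symmetric = A ^ B         # elements in exactly one of the two sets

print(intersection, union, difference, symmetric)
print(2 in A)             # membership test: True
```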

Comprehension

A list comprehension builds a list from an existing sequence in a single expression. It can include a test to filter values and an expression (a map) to transform them. This example prints [2, 4, 6]:

list1 = [1, 2, 3, 4, 5]
print([x*2 for x in list1 if x < 4])
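The comprehension above is shorthand for an explicit loop. This sketch builds the same list both ways:

```python
list1 = [1, 2, 3, 4, 5]

# Comprehension: keep x only if x < 4, then double it
doubled = [x * 2 for x in list1 if x < 4]

# The equivalent explicit loop
doubled_loop = []
for x in list1:
    if x < 4:                       # the filter
        doubled_loop.append(x * 2)  # the map

print(doubled)       # [2, 4, 6]
print(doubled_loop)  # [2, 4, 6]
```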

Here's a longer example -

def quicksort(aList):
    if aList:  # i.e., not an empty list
        pivot = aList[len(aList) // 2]  # In Python 3, // is integer divide.
        return (quicksort([x for x in aList if x < pivot]) +
                [x for x in aList if x == pivot] +
                quicksort([x for x in aList if x > pivot]))
    else:
        return []

p = [3, 5, 6, 7, 1, 12, 9]
q = quicksort(p)
print(q)

Be careful with the brackets, since they decide what kind of object a comprehension produces.

my_list = [i * i for i in range(10)] 
print(my_list)

produces a list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

my_dict = {i: i * i for i in range(10)}
print(my_dict)

produces a dictionary

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

and

my_set = {i * i for i in range(10)} 
print(my_set)

produces a set (sets are unordered, so the elements may be printed in any order)

{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}
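Dictionary comprehensions can also read from an existing dictionary via .items(). A small sketch; the stock and codes data are made up for illustration:

```python
stock = {"apple": 3, "melon": 0, "pear": 7, "mango": 0}  # hypothetical data

# keep only the items that are in stock
in_stock = {name: n for name, n in stock.items() if n > 0}
print(in_stock)  # {'apple': 3, 'pear': 7}

# swap keys and values (only safe when the values are unique)
codes = {"GB": "United Kingdom", "IT": "Italy"}
names_to_codes = {name: code for code, name in codes.items()}
print(names_to_codes)  # {'United Kingdom': 'GB', 'Italy': 'IT'}
```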

Using defaultdict

Suppose you have this dictionary:

citycountries = {"Milano": "Italy", "London": "England", "Manchester": "England",
                 "Brighton": "England", "Roma": "Italy"}

To find all the cities in each country, you could do this, creating a new dictionary that maps each country to a list of its cities:

country_to_stationsdict = {}  # create empty dictionary
for city in citycountries:
    country = citycountries[city]
    if country not in country_to_stationsdict:
        # create a new entry containing a list with one city in it
        country_to_stationsdict[country] = [city]
    else:
        country_to_stationsdict[country].append(city)

The following, using defaultdict from the collections module, is tidier:

from collections import defaultdict
# defaultdict(list) calls list() to create an empty list whenever a missing key is used,
# which avoids having to treat the first mention of a country as a special case
country_to_stations = defaultdict(list)

for city in citycountries:
    country = citycountries[city]
    country_to_stations[country].append(city)

Either resulting dictionary can be printed like this (replace country_to_stationsdict with country_to_stations for the defaultdict version):

for country in country_to_stationsdict:
    print(country + "- ", end="")
    for city in country_to_stationsdict[country]:
        print(city + " ", end="")
    print()
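A third option, using only the built-in dict, is its setdefault method, which returns the existing value for a key, or inserts the given default and returns that. A sketch with the same data:

```python
citycountries = {"Milano": "Italy", "London": "England", "Manchester": "England",
                 "Brighton": "England", "Roma": "Italy"}

country_to_cities = {}
for city, country in citycountries.items():
    # setdefault inserts [] the first time a country is seen, then returns the list
    country_to_cities.setdefault(country, []).append(city)

for country, cities in country_to_cities.items():
    print(country + "- " + " ".join(cities))
```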

Default values when accessing dictionaries

The following creates a dictionary defining the 3 entries of a sparse 2D array:

matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}

so that matrix would look something like this (entries that aren't stored are shown as 0):

0 0 0 1
0 0 0 0
0 2 0 0
0 0 0 0
0 0 0 3

You'd like matrix[(1, 3)] (an entry that hasn't been defined) to have the value 0. Alas, it raises a KeyError instead. Fortunately, the dictionary's get method takes a default value, so the following returns 0 because the entry doesn't exist:

matrix.get((1, 3), 0)
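If many lookups need a default, defaultdict(int) is an alternative, since int() returns 0. There is a difference, though: reading a missing key from a defaultdict inserts it, while get leaves the dictionary unchanged. A sketch:

```python
from collections import defaultdict

matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}
print(matrix.get((1, 3), 0))     # 0, and matrix still has 3 entries

sparse = defaultdict(int, matrix)  # same entries; missing keys default to int(), i.e. 0
print(sparse[(1, 3)])            # 0, but (1, 3) is now stored in sparse
print(len(matrix), len(sparse))  # 3 4
```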

Collections

The collections module offers container types rather like those in the C++ Standard Template Library. Here's how to create and use a double-ended queue with a maximum length of 3; once it's full, each new item pushes the oldest one out:

import collections
last_three = collections.deque(maxlen=3)
for i in range(10):
    last_three.append(i)
    print(', '.join(str(x) for x in last_three))
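A deque can be added to and removed from at either end in constant time, which is what makes it suitable as a queue. A short sketch of the other common operations:

```python
from collections import deque

d = deque([1, 2, 3], maxlen=3)
d.append(4)        # full, so 1 drops off the left end
d.appendleft(0)    # full, so 4 drops off the right end
print(d)           # deque([0, 2, 3], maxlen=3)

first = d.popleft()  # remove from the left: 0
last = d.pop()       # remove from the right: 3
print(first, last, list(d))  # 0 3 [2]
```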

And here's how to find the most common number in a list:

import collections
A = collections.Counter([1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7])
print(A.most_common(1))

outputs

[(3, 4)]

i.e. 3 is the most common number; it appears 4 times.
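Counter works on any iterable, not just lists of numbers. For example, counting the letters in a string (an illustrative example):

```python
from collections import Counter

letters = Counter("mississippi")
print(letters.most_common(2))  # [('i', 4), ('s', 4)]
print(letters["p"])            # 2
print(letters["z"])            # 0: missing items count as zero, no KeyError
```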

Iterators

An iterator is an object that returns the values of a sequence one at a time; each call to next() gives the next value. A simple example is

x = [0, 1, 2, 3, 4]
i = iter(x)
print(next(i))  # 0
print(next(i))  # 1
print(next(i))  # 2
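When an iterator runs out, next() raises StopIteration, which is how a for loop knows to stop. A sketch of that, plus a hypothetical Countdown class showing that any object with __iter__ and __next__ methods is an iterator:

```python
x = [0, 1, 2]
i = iter(x)
print(next(i), next(i), next(i))  # 0 1 2
try:
    next(i)
except StopIteration:
    print("no more values")

print(next(i, "done"))  # a second argument is returned instead of raising

class Countdown:
    """Hypothetical example: counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

print(list(Countdown(3)))  # [3, 2, 1]
```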

itertools

Some examples -

import itertools
from itertools import islice

# 0 to 15 in binary (all 4-digit combinations of 0 and 1)
for p in itertools.product([0, 1], repeat=4):
    print(''.join(str(x) for x in p))

# permutations
for p in itertools.permutations([1, 2, 3, 4]):
    print(''.join(str(x) for x in p))

# Sliding windows - from
# http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html
def n_grams(a, n):
    # n views of a, each starting one element further along; zip stops at the shortest
    z = (islice(a, i, None) for i in range(n))
    return list(zip(*z))

a = [1, 2, 3, 4]
print(n_grams(a, 3))  # [(1, 2, 3), (2, 3, 4)]
print(n_grams(a, 2))  # [(1, 2), (2, 3), (3, 4)]
print(n_grams(a, 4))  # [(1, 2, 3, 4)]

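An example of using iter and a lambda function. Because [iter(a)] * k holds k references to the same iterator, zip takes k consecutive items for each tuple:

a = [1, 2, 3, 4]
# non-overlapping groups of length k; leftover items that don't fill a group are dropped
group_adjacent = lambda a, k: list(zip(*([iter(a)] * k)))
print(group_adjacent(a, 3))  # [(1, 2, 3)]
print(group_adjacent(a, 2))  # [(1, 2), (3, 4)]
print(group_adjacent(a, 1))  # [(1,), (2,), (3,), (4,)]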
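group_adjacent silently drops leftovers: with k = 3, the 4 is lost. If leftovers matter, itertools.zip_longest pads the last group instead. A sketch; the function name group_padded is made up:

```python
from itertools import zip_longest

def group_padded(a, k, fill=None):
    # same trick as group_adjacent: k references to one shared iterator
    it = iter(a)
    return list(zip_longest(*([it] * k), fillvalue=fill))

print(group_padded([1, 2, 3, 4], 3))     # [(1, 2, 3), (4, None, None)]
print(group_padded([1, 2, 3, 4], 3, 0))  # [(1, 2, 3), (4, 0, 0)]
```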

In Python 2, range(0,3) directly produced a list, [0, 1, 2]. In Python 3 it produces a range object, which generates its values on demand; to get a list you need list(range(0,3)). The reason for this is efficiency. Consider this rather artificial piece of code:

target=3
for i in list(range(0,1000)):
   if i==target:
      print("target found!")
      break

It creates a list of 1000 numbers and then looks through them until it finds what it's looking for. Compare that with

target=3
for i in range(0,1000):
   if i==target:
      print("target found!")
      break

This finds the target too. The difference is that a long list isn't created first - each i is given a value only when the comparison with target is about to happen.
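A range object stores only its start, stop and step, so its size doesn't grow with its length, yet it still supports len(), indexing and fast in tests. A quick check with sys.getsizeof (exact byte counts vary between Python versions):

```python
import sys

r = range(0, 1000)
as_list = list(r)

print(sys.getsizeof(r), sys.getsizeof(as_list))  # the list is far larger
print(len(r), r[10], 500 in r)                   # 1000 10 True
```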

For the same reason generators are useful too: they generate the values in a sequence only when they're required.

Generators

Doing

mygenerator1 = (x*2 for x in range(3))

creates a generator, not a list (note the round brackets). So does the following, because the function uses yield rather than return:

def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i*2

mygenerator1 = createGenerator()

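Generators are iterators, but they calculate the values on the fly, one at a time.

for i in mygenerator1:
    print(i)

works the first time it's tried, but a generator can only be used once: a second loop over mygenerator1 prints nothing, because all its values have already been produced.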
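A sketch showing the second pass producing nothing, and an infinite generator that is still safe to use because values are only produced on request (the name naturals is illustrative):

```python
from itertools import islice

gen = (x * 2 for x in range(3))
first_pass = list(gen)   # [0, 2, 4]
second_pass = list(gen)  # []: the generator is already exhausted
print(first_pass, second_pass)

def naturals():
    """Yield 0, 1, 2, ... forever."""
    n = 0
    while True:
        yield n
        n += 1

print(list(islice(naturals(), 5)))  # [0, 1, 2, 3, 4]
```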

Closures

A closure is a function that remembers values from the scope it was created in, typically an inner function returned by an outer function. In C++ you might use templates or lambdas instead. Suppose you want to create a function add10 that returns 10 more than its argument, and add5 that returns 5 more than its argument. These 2 functions would be very similar, so rather than duplicate code, a closure can be used.

def add_number(num):
    def adder(number):
        return num + number  # num is remembered from the call to add_number
    return adder

add10 = add_number(10)
add5 = add_number(5)
print(add10(37))  # 47
print(add5(37))   # 42
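A closure can also keep state between calls. The inner function below updates a variable from the enclosing scope, which needs the nonlocal keyword; make_counter is a made-up name:

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count   # refer to make_counter's count, not a new local variable
        count += 1
        return count
    return counter

c1 = make_counter()
c2 = make_counter()
print(c1(), c1(), c1())  # 1 2 3
print(c2())              # 1: each closure has its own count
```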

Maps, filters, vectorising etc

Many of these examples are from http://www.eg.bucknell.edu/~hyde/Python3/IntroductionToPython.pdf

The following finds the absolute value of each element by mapping abs over the list (note that abs([-5, -42, 20, -1]) doesn't work: abs() raises a TypeError when given a list).

print(list(map(abs, [-5,-42, 20, -1])))

filter is also useful; it keeps only the elements for which a function returns True:

def even(x):
    return x % 2 == 0

a = [1, 2, 3, 4, 5]
print(list(filter(even, a)))
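The same results can be had with list comprehensions, which many Python programmers find more readable than map and filter:

```python
def even(x):
    return x % 2 == 0

a = [1, 2, 3, 4, 5]
abs_values = [abs(x) for x in [-5, -42, 20, -1]]  # same as list(map(abs, ...))
evens = [x for x in a if even(x)]                  # same as list(filter(even, a))
print(abs_values, evens)  # [5, 42, 20, 1] [2, 4]
```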

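Here's another way to run a function on each element, this time of a NumPy array. np.vectorize turns H, which works on single numbers, into a function that works element by element on arrays: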

import numpy as np

def H(x): return (0 if x<0 else 1)

x=np.linspace(-10, 10, 5)
H_vec=np.vectorize(H)
print(H_vec(x))
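np.vectorize is a convenience rather than a speed-up (NumPy's documentation describes it as essentially a for loop). For speed, express the function with whole-array operations; np.where is one way to write H:

```python
import numpy as np

x = np.linspace(-10, 10, 5)     # [-10., -5., 0., 5., 10.]
H_fast = np.where(x < 0, 0, 1)  # 0 where x < 0, else 1, computed on the whole array
print(H_fast)                   # [0 0 1 1 1]
```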

functools.reduce repeatedly applies a function of 2 arguments, using the result of each call as the first argument of the next: here ((1 + 2) + 3) + 4. Both uses of reduce below produce 10, the accumulated total.

import functools

def add2(x, y):
    return x + y

a = [1, 2, 3, 4]
print(functools.reduce(add2, a))
print(functools.reduce(lambda x, y: x + y, a))
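reduce also takes an optional third argument, the starting value, which is returned if the list is empty. A sketch computing a product with operator.mul (for plain addition, the built-in sum is the usual choice):

```python
import functools
import operator

a = [1, 2, 3, 4]
product = functools.reduce(operator.mul, a, 1)         # (((1*1)*2)*3)*4
empty_product = functools.reduce(operator.mul, [], 1)  # no items, so the start value
print(product, empty_product, sum(a))  # 24 1 10
```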

functools

The functools package has some useful features -

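cache

The @cache decorator (Python 3.9 and later) stores the result for each argument value it has seen, so a repeated call returns the stored result without running the function body. @lru_cache(maxsize=n) keeps only the n most recently used results.

from functools import cache, lru_cache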

@cache
def myfun(num):
    print("calculating")
    return num*2

@lru_cache(maxsize=5)
def myfun2(num):
    print("calculating")
    return num*2

range1 = range(10)
range2 = list(reversed(range1))  # 9, 8, ..., 0
print("Call myfun with a list of numbers using a cache")
for i in range1:
    print(myfun(i))

print("Call myfun with the same list of numbers, reversed")
print("Note that because of caching, myfun's body isn't run again")
print("(no 'calculating' lines), which might save a lot of time if myfun is slow")
for i in range2:
    print(myfun(i))


print("Call myfun2 with a list of numbers using an lru_cache, size 5")
for i in range1:
    print(myfun2(i))

print("Call myfun2 with the same list of numbers, reversed.")
print("Only the 5 most recent results are cached, so 9 to 5 come from the cache")
print("but 4 to 0 have to be recalculated")
for i in range2:
    print(myfun2(i))

print(myfun2.cache_info())
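Caching pays off most for recursive functions that repeat work. A sketch using the classic Fibonacci example (not from the original page); without the cache, fib(30) would make millions of calls:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache, equivalent to @cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))            # 832040
print(fib.cache_info())   # each of fib(0) to fib(30) is computed only once
```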

Eval

How do you make Python process a stored string as if you'd typed it in? Like this

s="3*7-2"
eval(s)

outputs

19
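eval also accepts a dictionary of names that the expression may use. A sketch; note that restricting the names this way does not make eval safe to run on untrusted input:

```python
s = "3*x - y"
result = eval(s, {"x": 7, "y": 2})  # the dict supplies the global names
print(result)  # 19
```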

Else

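The use of else isn't restricted to if statements. In

n = 5
while n != 0:
    print(n)
    n -= 1
else:
    print("what the...")

the else clause runs once the loop ends normally, here when the condition n != 0 becomes false. In general, the else clause of a loop or try block is executed

when a while loop's condition becomes false, or a for loop runs out of items
when a try block finishes without raising an exception.

It is not executed if

you break or return out of the block
an exception escapes from the loop body or the try block.

It works not only with while and for loops, but also with try blocks. See https://docs.python.org/3/reference/compound_stmts.html#the-while-statement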
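A common use is a search loop, where else handles the not-found case without needing a flag variable. A sketch; find_index is a made-up name:

```python
def find_index(items, target):
    for i, item in enumerate(items):
        if item == target:
            print("found at", i)
            break
    else:
        # runs only if the loop finished without hitting break
        print("not found")
        return -1
    return i

print(find_index([4, 8, 15], 8))   # found at 1, then 1
print(find_index([4, 8, 15], 99))  # not found, then -1
```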

Finding out about an object

test = [1, 3, 5, 7]
print(dir(test))

will display the attributes of test, including its methods; e.g. it shows that test.reverse() is possible.
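The list from dir includes many special names that begin with underscores. A sketch that keeps just the ordinary ones, plus type and hasattr for quick checks:

```python
test = [1, 3, 5, 7]
public = [name for name in dir(test) if not name.startswith("_")]
print(public)  # ['append', 'clear', 'copy', 'count', ...]
print(type(test), hasattr(test, "reverse"))  # <class 'list'> True
```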

How much memory does an object use?

import sys
x=1
print(sys.getsizeof(x))

will display 28 on a typical 64-bit CPython, meaning that the integer object x refers to uses 28 bytes.
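sys.getsizeof reports only the object itself, not the objects it refers to. For a container such as a list, that means the list's own storage, not the elements held in it. A quick check:

```python
import sys

big_string = "x" * 10_000
holder = [big_string]

print(sys.getsizeof(big_string))  # about 10 KB
print(sys.getsizeof(holder))      # much smaller: only the reference to the string is counted
```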

File handling

# 'with' closes the file automatically at the end of the block
with open('notes') as fp:
    for line in fp:
        print(line, end='')  # each line already ends with a newline

# json - JavaScript Object Notation

import json
x = [1, 2, 3]
with open('test.json', 'w') as fp:
    json.dump(x, fp)
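Reading the data back uses json.load, and json.dumps and json.loads do the same with strings instead of files. A sketch that writes to a temporary directory so it doesn't leave files behind:

```python
import json
import os
import tempfile

data = {"name": "test", "values": [1, 2, 3]}

text = json.dumps(data)    # to a JSON string
restored = json.loads(text)  # and back again

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.json")
    with open(path, "w") as fp:
        json.dump(data, fp)
    with open(path) as fp:
        loaded = json.load(fp)

print(loaded)  # {'name': 'test', 'values': [1, 2, 3]}
```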

Performance

There are often several ways to perform a task. Especially with large data sets, there may be great differences in speed. Here's an example adapted from "Python for Finance".

import time
from math import log, cos
import numpy as np
import numexpr 

abignumber=30000000
a=range(1,abignumber)
aa=np.arange(1,abignumber)
numexpr.set_num_threads(8)

def f(x):
  return 3*log(x)+cos(x)**2

#Method 1
start_time = time.time()
r=[f(x) for x in a]
end_time = time.time()
print("Time taken=",end_time - start_time, " seconds")

#Method 2
start_time = time.time()
rr= 3*np.log(aa)+np.cos(aa)**2
end_time = time.time()
print("Time taken=",end_time - start_time, " seconds")

#Method 3
start_time = time.time()
expr = '3*log(aa)+cos(aa)**2'  # numexpr parses this string itself; f stays a function
rrr = numexpr.evaluate(expr)
end_time = time.time()
print("Time taken=",end_time - start_time, " seconds")

When I tried this on one of our terminals (in the "DPO") and on a multi-CPU machine ("ts-access"), I got these results (times in seconds):

Method                      DPO time (s)   ts-access time (s)
1 (list comprehension)      51             34
2 (NumPy)                   25             3
3 (numexpr, 8 threads)      4              0.5
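Before trusting a faster method, it's worth checking that it gives the same answers, and timing it with the timeit module, which repeats the measurement. A small-scale sketch comparing methods 1 and 2 (the array size is chosen arbitrarily, and timings will vary by machine):

```python
import timeit
from math import log, cos
import numpy as np

n = 100_000
aa = np.arange(1, n)

def with_loop():
    return [3 * log(x) + cos(x) ** 2 for x in range(1, n)]

def with_numpy():
    return 3 * np.log(aa) + np.cos(aa) ** 2

# the two methods should agree to within floating-point rounding
same = np.allclose(with_loop(), with_numpy())
print("results agree:", same)

for fn in (with_loop, with_numpy):
    seconds = timeit.timeit(fn, number=5) / 5  # average over 5 runs
    print(fn.__name__, round(seconds, 4), "seconds")
```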

Pandas

The pandas package is installed. It's used for data manipulation, analysis and visualization (of numerical tables and time series). Here's an example adapted from the "10 minutes to pandas" guide in the pandas documentation:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create some data
s = pd.Series([1,3,5,np.nan,6,8])
dates = pd.date_range('20130101', periods=6)
# Create a dataframe
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
# View the data that's in the dataframe
print(df)
# Get some stats and do some sorting
print(df.describe())
print(df.sort_index(axis=1, ascending=False))
print(df.sort_values(by='B'))
# Create more data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()
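A few other everyday operations, shown on a small hand-made table that reuses the cities from the defaultdict example (the population figures are invented):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Milano", "London", "Manchester", "Brighton", "Roma"],
    "country": ["Italy", "England", "England", "England", "Italy"],
    "population": [14, 90, 5, 3, 28],  # made-up numbers
})

big = df[df["population"] > 10]                     # boolean filtering of rows
totals = df.groupby("country")["population"].sum()  # one total per country
print(big)
print(totals)
```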

Department of Engineering

IT Services