Further Python Features

Sets

Sets hold unique values and support the usual set operations, e.g.

```python
A = {1, 2, 3, 3}
B = {3, 4, 5, 6, 7}

A & B  # intersection: {3}
```
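As a brief sketch of the other common set operators (reusing `A` and `B` from above; the result names are just illustrative):

```python
A = {1, 2, 3, 3}   # the duplicate 3 is dropped, so A is {1, 2, 3}
B = {3, 4, 5, 6, 7}

intersection = A & B   # elements in both sets
union = A | B          # elements in either set
difference = A - B     # elements in A but not in B
symmetric = A ^ B      # elements in exactly one of the sets

print(intersection, union, difference, symmetric)
print(3 in A, len(A))  # True 3
```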

Comprehension

A list comprehension creates a list from another sequence in a single expression, combining an optional test that filters values with a map that transforms them.

```python
list1 = [1, 2, 3, 4, 5]
print([x*2 for x in list1 if x < 4])
```
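To make the filter and map steps explicit, here is a sketch of the same computation written as an ordinary loop (the names `doubled` and `doubled_loop` are illustrative):

```python
list1 = [1, 2, 3, 4, 5]

# Comprehension: "if x < 4" filters, "x*2" transforms
doubled = [x*2 for x in list1 if x < 4]

# Equivalent explicit loop
doubled_loop = []
for x in list1:
    if x < 4:
        doubled_loop.append(x*2)

print(doubled)       # [2, 4, 6]
print(doubled_loop)  # [2, 4, 6]
```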

Here's a longer example -

```python
def quicksort(aList):
    if aList:  # i.e., not an empty list
        pivot = aList[len(aList)//2]  # In Python 3, // is integer divide.
        return (quicksort([x for x in aList if x < pivot]) +
                [x for x in aList if x == pivot] +
                quicksort([x for x in aList if x > pivot]))
    else:
        return []

p = [3, 5, 6, 7, 1, 12, 9]
q = quicksort(p)
print(q)
```

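You need to be careful with the notation: the type of bracket, and whether there is a colon, determine whether you get a list, a dictionary or a set.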

```python
my_list = [i * i for i in range(10)]
print(my_list)
```

produces a list

```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

```python
my_dict = {i: i * i for i in range(10)}
print(my_dict)
```

produces a dictionary

```
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
```

and

```python
my_set = {i * i for i in range(10)}
print(my_set)
```

produces a set

```
{0, 1, 64, 4, 36, 9, 16, 49, 81, 25}
```
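Two related notational traps, sketched briefly (the variable names are illustrative): empty braces create a dictionary rather than a set, and parentheses around a comprehension create a generator rather than a tuple.

```python
empty_dict = {}        # empty braces give a dictionary, not a set
empty_set = set()      # an empty set needs set()
squares_gen = (i * i for i in range(10))         # a generator, not a tuple
squares_tuple = tuple(i * i for i in range(10))  # an actual tuple

print(type(empty_dict).__name__)   # dict
print(type(empty_set).__name__)    # set
print(type(squares_gen).__name__)  # generator
print(squares_tuple[:3])           # (0, 1, 4)
```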

Using defaultdict

Suppose you have this dictionary

```python
citycountries = {"Milano": "Italy", "London": "England", "Manchester": "England",
                 "Brighton": "England", "Roma": "Italy"}
```

To find all the cities in each country you could do this, creating a new dictionary containing a list of cities for each country -

```python
country_to_stationsdict = {}  # create empty dictionary
for city in citycountries:
    country = citycountries[city]
    if country not in country_to_stationsdict:
        # create a new entry containing a list with one city in it
        country_to_stationsdict[country] = [city]
    else:
        country_to_stationsdict[country].append(city)
```

The following is tidier

```python
from collections import defaultdict
# The next line says that the value of each dictionary entry is a list,
# which avoids having to treat the first mention of a country as a special case
country_to_stations = defaultdict(list)

for city in citycountries:
    country = citycountries[city]
    country_to_stations[country].append(city)
```

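In each case, the resulting dictionary can be printed using the following (for the second version, substitute `country_to_stations` for `country_to_stationsdict`)

```python
for country in country_to_stationsdict:
    print(country + "- ", end="")
    for city in country_to_stationsdict[country]:
        print(city + " ", end="")
    print()
```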
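`defaultdict` accepts other factory functions too. As a small sketch (the name `cities_per_country` is made up for this example), `defaultdict(int)` starts every missing entry at 0, which suits counting:

```python
from collections import defaultdict

citycountries = {"Milano": "Italy", "London": "England", "Manchester": "England",
                 "Brighton": "England", "Roma": "Italy"}

# int() returns 0, so a country seen for the first time starts with a count of 0
cities_per_country = defaultdict(int)
for city, country in citycountries.items():
    cities_per_country[country] += 1

print(dict(cities_per_country))  # {'Italy': 2, 'England': 3}
```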

Default values when accessing dictionaries

The following creates a dictionary defining the 3 entries of a sparse 2D array

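```python
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}
```

so that `matrix` would look something like this (assuming a 5 x 4 array with the first element of each key as the row; undefined entries are shown as `.`) -

    .  .  .  1
    .  .  .  .
    .  2  .  .
    .  .  .  .
    .  .  .  3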

You'd like `matrix[(1, 3)]` (accessing an entry that hasn't been defined) to have the value 0. Alas, it raises a `KeyError`. Fortunately the `get` method takes a default value as its second argument, returning 0 here if the entry doesn't exist

```python
matrix.get((1, 3), 0)
```
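As a sketch of how `get` lets the sparse dictionary be treated as a full array, the following expands it into rows. The dimensions `n_rows` and `n_cols` are assumptions, chosen to be just large enough for the keys above.

```python
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}

n_rows, n_cols = 5, 4  # assumed dimensions, not stored in the dictionary

rows = []
for r in range(n_rows):
    # get() supplies 0 for every (r, c) that was never defined
    rows.append([matrix.get((r, c), 0) for c in range(n_cols)])

for row in rows:
    print(row)
```

Note that `get` does not add the missing keys, so the dictionary stays sparse.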

Collections

These offer some facilities rather like those of the C++ Standard Template Library. Here's how to create and use a double-ended queue with a maximum length of 3; once it is full, each `append` discards the oldest item

```python
import collections
last_three = collections.deque(maxlen=3)
for i in range(10):
    last_three.append(i)
    print(', '.join(str(x) for x in last_three))
```
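One use of a bounded deque, sketched here with made-up data, is a moving average over the last three values:

```python
from collections import deque

window = deque(maxlen=3)
averages = []
for value in [10, 20, 30, 40, 50]:
    window.append(value)  # the oldest value is discarded once 3 are held
    averages.append(sum(window) / len(window))

print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```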

And here's how to find the most common number in a list

```python
import collections
A = collections.Counter([1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7])
A.most_common(1)
```

outputs

```
[(3, 4)]
```

i.e. 3 is the most common number, appearing 4 times.
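`Counter` works on any hashable items, not just numbers. A short sketch with an invented sentence:

```python
from collections import Counter

words = "the cat and the dog and the bird".split()
counts = Counter(words)

print(counts.most_common(2))  # [('the', 3), ('and', 2)]
print(counts["fish"])         # 0 - a missing item counts as zero, no KeyError
```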

Iterators

Iterators are objects that hand out the values of a sequence one at a time. A simple example is

```python
x = [0, 1, 2, 3, 4]
i = iter(x)
next(i)  # 0
next(i)  # 1
next(i)  # 2
```
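What happens when the values run out is worth a sketch: `next` raises `StopIteration`, unless a default is passed as a second argument. The names `exhausted` and `fallback` are illustrative.

```python
x = [0, 1, 2]
i = iter(x)

values = [next(i), next(i), next(i)]  # 0, 1, 2

# The iterator is now exhausted, so next() raises StopIteration ...
try:
    next(i)
    exhausted = False
except StopIteration:
    exhausted = True

# ... unless a default value is supplied as the second argument
fallback = next(i, None)

print(values, exhausted, fallback)  # [0, 1, 2] True None
```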

itertools

Some examples -

```python
import itertools
from itertools import islice

# 0 to 15 in binary
for p in itertools.product([0, 1], repeat=4):
    print(''.join(str(x) for x in p))

# permutations
for p in itertools.permutations([1, 2, 3, 4]):
    print(''.join(str(x) for x in p))

# Sliding windows - from
# http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html
a = [1, 2, 3, 4]

def n_grams(a, n):
    z = (islice(a, i, None) for i in range(n))
    return list(zip(*z))

n_grams(a, 3)  # [(1, 2, 3), (2, 3, 4)]
n_grams(a, 2)  # [(1, 2), (2, 3), (3, 4)]
n_grams(a, 4)  # [(1, 2, 3, 4)]
```

An example of using `iter` and lambda functions

```python
a = [1, 2, 3, 4]
# group the list into non-overlapping sequences of length k
group_adjacent = lambda a, k: list(zip(*([iter(a)] * k)))
print(group_adjacent(a, 2))  # [(1, 2), (3, 4)]
```
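The trick relies on `[iter(a)] * k` being a list that holds the same iterator `k` times. A sketch that spells this out for `k = 3` (the names `it`, `groups` and `leftover_dropped` are illustrative):

```python
a = [1, 2, 3, 4, 5, 6]

it = iter(a)
# zip takes one item from each argument in turn; since all three arguments
# are the SAME iterator, each output tuple holds 3 consecutive items
groups = list(zip(it, it, it))
print(groups)  # [(1, 2, 3), (4, 5, 6)]

# Items left over when the length is not a multiple of k are dropped
leftover_dropped = list(zip(*([iter([1, 2, 3, 4, 5])] * 2)))
print(leftover_dropped)  # [(1, 2), (3, 4)]
```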

Generators

Doing

```python
mygenerator1 = (x*2 for x in range(3))
```

creates a generator, not a list. So does the following, because it uses `yield` rather than `return`

```python
def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i*2

mygenerator1 = createGenerator()
```

Generators are iterators, but they calculate the values on the fly.

```python
for i in mygenerator1:
    print(i)
```

works the first time it's tried, but a generator can only be iterated over once.
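A short sketch of the "only once" behaviour (the function and variable names here are illustrative): a second pass over the same generator yields nothing, while calling the generator function again gives a fresh one.

```python
def create_generator():
    for i in range(3):
        yield i * 2

gen = create_generator()
first_pass = list(gen)    # [0, 2, 4]
second_pass = list(gen)   # [] because gen is already exhausted

# Calling the function again creates a new, unused generator
third_pass = list(create_generator())

print(first_pass, second_pass, third_pass)
```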

Closures

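A closure is a function that remembers variables from the scope in which it was created; a common use is a function that builds and returns other functions. In C++ you might use templates instead. Suppose you want to create a function `add10` to return 10 more than its argument, and `add5` to return 5 more than its argument. These 2 functions will be very similar, so rather than duplicate code, a closure can be used.

```python
def add_number(num):
    # the inner function remembers num from the enclosing call
    def adder(number):
        return num + number
    return adder
```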
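A usage sketch, assuming `add_number` returns an inner function (called `adder` here, a name chosen for illustration) that adds the remembered `num` to its argument:

```python
def add_number(num):
    def adder(number):
        return num + number  # num comes from the enclosing call
    return adder

add10 = add_number(10)
add5 = add_number(5)

# Each returned function keeps its own copy of num
print(add10(1), add5(1))  # 11 6
```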

Maps, filters, vectorising etc

Many of these examples are from http://www.eg.bucknell.edu/~hyde/Python3/IntroductionToPython.pdf

The following finds the absolute value of the elements (note that `abs([-5,-42, 20, -1])` doesn't work).

```python
print(list(map(abs, [-5, -42, 20, -1])))
```

The `filter` function is also useful -

```python
def even(x):
    return x % 2 == 0

a = [1, 2, 3, 4, 5]
print(list(filter(even, a)))
```
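For comparison, a sketch showing that `map` and `filter` each have a list-comprehension equivalent (the result names are illustrative); which to use is largely a matter of taste.

```python
values = [-5, -42, 20, -1]
a = [1, 2, 3, 4, 5]

abs_map = list(map(abs, values))
abs_comp = [abs(v) for v in values]

evens_filter = list(filter(lambda x: x % 2 == 0, a))
evens_comp = [x for x in a if x % 2 == 0]

print(abs_map, evens_filter)  # [5, 42, 20, 1] [2, 4]
```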

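Here's another way to run a function on each of the elements in a list: `np.vectorize` wraps a function written for single values so that it can be called on a whole NumPy array.

```python
import numpy as np

def H(x):
    return 0 if x < 0 else 1

x = np.linspace(-10, 10, 5)
H_vec = np.vectorize(H)
print(H_vec(x))
```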

`reduce` repeatedly applies the provided function to 2 arguments, producing a result that is used as the first argument of the next call. Both the uses of `reduce` below produce 10 - the accumulated total.

```python
import functools

def add2(x, y):
    return x + y

a = [1, 2, 3, 4]
print(functools.reduce(add2, a))
print(functools.reduce(lambda x, y: x + y, a))
```
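As a further sketch, the standard `operator` module supplies ready-made two-argument functions, and an optional third argument to `reduce` gives a starting value (which is also what is returned for an empty list):

```python
import functools
import operator

a = [1, 2, 3, 4]

total = functools.reduce(operator.add, a)    # ((1 + 2) + 3) + 4
product = functools.reduce(operator.mul, a)  # ((1 * 2) * 3) * 4

# Without the starting value 0, reducing an empty list raises TypeError
empty_total = functools.reduce(operator.add, [], 0)

print(total, product, empty_total)  # 10 24 0
```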

Eval

How do you make Python process a stored string as if you'd typed it in? Like this

```python
s = "3*7-2"
eval(s)
```

outputs

```
19
```
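When the string holds only a literal value (numbers, strings, lists, dictionaries and so on) rather than an expression to compute, the standard library's `ast.literal_eval` is a different, more restrictive tool that can be used instead. A sketch, with an invented input string:

```python
import ast

s = "[1, 2, {'a': 3}]"
value = ast.literal_eval(s)  # builds the list without running arbitrary code
print(value[2]["a"])         # 3

# Arithmetic such as 3*7-2 is not a literal, so it is rejected
try:
    ast.literal_eval("3*7-2")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
print(rejected)  # True
```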

Else

The use of `else` isn't restricted to `if`. In

```python
n = 5
while n != 0:
    print(n)
    n -= 1
else:
    print("what the...")
```

the else clause is executed when

• the loop condition becomes false
• control falls off the bottom of a try block without an exception.

It is not executed if

• you break or return out of the block
• you raise an exception.

`else` works not only with `while` and `for` loops, but also with `try` blocks. See https://docs.python.org/3/reference/compound_stmts.html#the-while-statement
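The rules above can be sketched with a `for` loop and a `try` block. Both functions are hypothetical examples written for this illustration.

```python
def find_first_negative(values):
    for v in values:
        if v < 0:
            result = v
            break          # leaving via break skips the else clause
    else:
        result = None      # runs only if the loop finished without break
    return result

def double_if_int(text):
    try:
        number = int(text)
    except ValueError:
        return "not a number"
    else:
        # runs only if the try block raised no exception
        return number * 2

print(find_first_negative([1, -2, 3]), find_first_negative([1, 2]))  # -2 None
print(double_if_int("21"), double_if_int("x"))  # 42 not a number
```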

```python
test = [1, 3, 5, 7]
print(dir(test))
```

will display the names of the attributes and methods available for `test`. E.g. it shows you that `test.reverse()` is possible.
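The full `dir` output includes many names beginning with an underscore. A sketch of filtering those out to leave the everyday methods (the name `public_names` is illustrative):

```python
test = [1, 3, 5, 7]

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(test) if not name.startswith("_")]
print(public_names)

print(hasattr(test, "reverse"))  # True
```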

How much memory does an object use?

```python
import sys
x = 1
print(sys.getsizeof(x))
```

will display `28` on a typical 64-bit CPython 3 installation, meaning that the integer object that `x` refers to uses 28 bytes.
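One caveat, sketched below on the assumption that CPython is being used: `sys.getsizeof` reports only the object itself, not the objects it refers to, so a list that contains a large list can look smaller than its contents.

```python
import sys

big = list(range(1000))
nested = [big]  # one element: a reference to the big list

outer_size = sys.getsizeof(nested)  # counts the outer list only
inner_size = sys.getsizeof(big)

print(outer_size < inner_size)  # True on CPython
```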

File handling

```python
# this closes the file at the end of the 'with' scope
with open('notes') as fp:
    for line in fp:
        print(line, end="")  # each line already ends with a newline
```

```python
# json - JavaScript Object Notation
import json

x = [1, 2, 3]
with open('test.json', 'w') as fp:
    json.dump(x, fp)
```
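Reading the data back uses `json.load`. The sketch below does a round trip; writing to a temporary directory is a choice made for this example so that it doesn't leave files behind.

```python
import json
import os
import tempfile

x = [1, 2, 3]
path = os.path.join(tempfile.mkdtemp(), "test.json")  # throwaway location

with open(path, "w") as fp:
    json.dump(x, fp)

with open(path) as fp:
    y = json.load(fp)

print(y == x)                # True
print(json.dumps({"a": x}))  # {"a": [1, 2, 3]}
```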

Performance

There are often several ways to perform a task. Especially with large data sets, there may be great differences in speed. Here's an example adapted from "Python for Finance".

```python
import time
from math import log, cos
import numpy as np
import numexpr

abignumber = 30000000
a = range(1, abignumber)
aa = np.arange(1, abignumber)

def f(x):
    return 3*log(x) + cos(x)**2

# Method 1: pure Python list comprehension
start_time = time.time()
r = [f(x) for x in a]
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")

# Method 2: NumPy operations on the whole array
start_time = time.time()
rr = 3*np.log(aa) + np.cos(aa)**2
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")

# Method 3: numexpr evaluates the expression string, looking up aa by name
start_time = time.time()
expr = '3*log(aa)+cos(aa)**2'  # named expr so that it doesn't replace the function f
rrr = numexpr.evaluate(expr)
end_time = time.time()
print("Time taken=", end_time - start_time, " seconds")
```

When I tried this on one of our terminals (in the "DPO") and on a multi-CPU machine ("ts-access") I got these results (times in seconds)

| Method | DPO time | ts-access time |
|--------|----------|----------------|
| 1      | 51       | 34             |
| 2      | 25       | 3              |
| 3      | 4        | 0.5            |
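For quick comparisons of small snippets, the standard library's `timeit` module is an alternative to calling `time.time()` by hand. The sketch below uses a much smaller data set than the example above and compares two pure-Python approaches; the timings it prints will vary from machine to machine.

```python
import timeit
from math import log, cos

def f(x):
    return 3*log(x) + cos(x)**2

data = range(1, 10001)

# Each statement is run 20 times; timeit returns the total time in seconds
t_comp = timeit.timeit(lambda: [f(x) for x in data], number=20)
t_map = timeit.timeit(lambda: list(map(f, data)), number=20)

print(f"comprehension: {t_comp:.3f} s, map: {t_map:.3f} s")
```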

Pandas

The pandas package is installed for data manipulation, analysis and data visualization (of numerical tables and time series). Here's an example adapted from "10 Minutes to pandas"

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create some data
s = pd.Series([1,3,5,np.nan,6,8])
dates = pd.date_range('20130101', periods=6)
# Create a dataframe
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
# View the data that's in the dataframe
df
# Get some stats and do some sorting
df.describe()
df.sort_index(axis=1, ascending=False)
df.sort_values(by='B')
# Create more data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()
```

© Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK