Programming in Python
Michael Schroeder Sven Schreiber
sven.schreiber@tu-dresden.de
Updates by Andreas Henschel
Lecture 2: Sequences
Slides derived from Ian Holmes, Department of Statistics, University of Oxford
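Construction (syntax)
l = [1, 2, 3, 4]
l2 = ['Apple', 'Banana', 'Orange']
t = ('sebastian', 'm', 28)
t2 = ('motif', 'ATTCG', 'E44')

Accessing elements
l[0]    # 1
t[0]    # 'sebastian'

Adding/modifying elements
l.append(3)    # fine: lists are mutable
l[1] = 5       # fine
t.append(3)    # AttributeError: tuples are immutable!
t[1] = 5       # TypeError: tuples are immutable!

Concatenating
l3 = l + [3, 2]
t3 = t + ('phd', 'biotec')    # builds a new tuple; t itself is unchanged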
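The difference in mutability can be checked directly. A minimal sketch: it tries to modify the tuple t from above and catches the resulting errors; the try/except blocks are only there so the script keeps running after each failed attempt.

```python
l = [1, 2, 3, 4]
t = ('sebastian', 'm', 28)

# Lists can be changed in place
l.append(3)
l[1] = 5
print(l)  # [1, 5, 3, 4, 3]

# Tuples cannot: item assignment raises TypeError ...
try:
    t[1] = 5
except TypeError as err:
    print("TypeError:", err)

# ... and tuples have no append() method at all
try:
    t.append(3)
except AttributeError as err:
    print("AttributeError:", err)

# Concatenation works for both, because it builds a NEW object
t3 = t + ('phd', 'biotec')
print(t3)  # ('sebastian', 'm', 28, 'phd', 'biotec')
print(t)   # ('sebastian', 'm', 28), the original tuple is unchanged
```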
for x in range(10000):
    print(x)
Output:
0
1
2
...
9998
9999
Starts at 0 and excludes the last number (10000)!
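range() also accepts a start value and a step; the stop value is excluded in every form. A short sketch of the three call forms (the variable names are illustrative only):

```python
# range(stop): 0, 1, ..., stop-1
up_to_5 = list(range(5))
print(up_to_5)      # [0, 1, 2, 3, 4]

# range(start, stop): start, ..., stop-1
one_to_5 = list(range(1, 6))
print(one_to_5)     # [1, 2, 3, 4, 5]

# range(start, stop, step): every step-th number, still excluding stop
stepped = list(range(2, 10, 3))
print(stepped)      # [2, 5, 8]

# a negative step counts down
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]
```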
nucleotides = ['a', 'c', 'g', 't']
print("Nucleotides: ", nucleotides)
Output:
Nucleotides:  ['a', 'c', 'g', 't']
'a' is element 0, 'c' is element 1, 'g' is element 2, 't' is element 3; the list is the whole collection.
Note that the element indices start at zero!
a = [1, 2, 3, 4, 5]
print("a = ", a)
b = ['a', 'c', 'g', 't']
print("b = ", b)
c = list(range(1, 6))
print("c = ", c)
d = "a c g t".split()
print("d = ", d)
Output:
a =  [1, 2, 3, 4, 5]
b =  ['a', 'c', 'g', 't']
c =  [1, 2, 3, 4, 5]
d =  ['a', 'c', 'g', 't']
The first form is the most common: a comma-separated list, delimited by square brackets.
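Two more ways to get a list, shown as a small sketch: list() splits any iterable into its elements (a string yields its characters), and an empty list [] can be filled step by step with append(). The names e and bases are illustrative.

```python
# list() on a string gives one element per character
e = list("acgt")
print("e = ", e)          # e =  ['a', 'c', 'g', 't']

# start from an empty list and fill it later
bases = []
bases.append('a')
print(len(e), len(bases)) # 4 1
```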
x = ['a', 'c', 'g', 't']
i = 2
print(x[0], x[i], x[-1])    # a negative index counts from the end
Output:
a g t
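Because indices start at zero, the highest valid index is len(x) - 1, and x[-1] is shorthand for exactly that element. A small sketch (the name last is illustrative):

```python
x = ['a', 'c', 'g', 't']

last = len(x) - 1          # 3, the highest valid index
print(x[last], x[-1])      # t t
print(x[-2])               # g  (second element from the end)

# one past the end is an error, not an empty value
try:
    x[4]
except IndexError as err:
    print("IndexError:", err)
```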
x = ['a', 't', 'g', 'c']
print("x =", x)
x.sort()
print("x =", x)
x.reverse()
print("x =", x)
Output:
x = ['a', 't', 'g', 'c']
x = ['a', 'c', 'g', 't']
x = ['t', 'g', 'c', 'a']

nums = [2, 2, 5, 2, 6]
nums.append(8)
print(nums)
print(nums.count(2))
nums.remove(5)
print(nums)
Output:
[2, 2, 5, 2, 6, 8]
3
[2, 2, 2, 6, 8]
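sort() and reverse() change the list in place and return None. When the original order must be kept, the built-in sorted() returns a new list instead. The sketch also shows that remove() deletes only the first matching element.

```python
x = ['a', 't', 'g', 'c']

# sorted() returns a NEW sorted list and leaves x untouched
y = sorted(x)
print(x, y)                      # ['a', 't', 'g', 'c'] ['a', 'c', 'g', 't']

# x.sort() sorts in place and returns None, so never write x = x.sort()
result = x.sort()
print(result, x)                 # None ['a', 'c', 'g', 't']

# reverse=True sorts in descending order in one step
print(sorted(x, reverse=True))   # ['t', 'g', 'c', 'a']

# remove() deletes only the FIRST matching element
nums = [2, 2, 5, 2, 6]
nums.remove(2)
print(nums)                      # [2, 5, 2, 6]
```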
>>> x = [1, 0] * 2
>>> x
[1, 0, 1, 0]
>>> x.pop()
0
>>> x
[1, 0, 1]
>>> x += x
>>> x
[1, 0, 1, 1, 0, 1]
>>> x.index(0)
1

Multiplying a list with * repeats it; += concatenates lists (like +, but extends x in place)
pop() returns and removes the last element of a list
index(..) searches for the first occurrence of an element and returns its position
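pop() and index() take optional arguments as well. A sketch: pop(i) removes the element at position i, index(value, start) begins the search at start, and because index() raises ValueError for a missing element, testing with in first avoids the error.

```python
x = [1, 0, 1, 1, 0, 1]

first = x.pop(0)       # remove and return the element at index 0
print(first, x)        # 1 [0, 1, 1, 0, 1]

print(x.index(0, 1))   # 3  (search for 0 starting at index 1)

# index() raises ValueError if the element is absent; check with "in" first
if 7 in x:
    print(x.index(7))
else:
    print("7 not in list")
```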
dna = "accACgttAGgtct".lower()
replaced = dna.replace("a", "_a") \
    .replace("t", "a").replace("_a", "t") \
    .replace("g", "_g").replace("c", "g") \
    .replace("_g", "c")
replacedList = list(replaced)
replacedList.reverse()
print("".join(replacedList))
Output:
agacctaacgtggt

1. Start by making the string lower case.
2. Replace 'a' with 't', 'c' with 'g', 'g' with 'c' and 't' with 'a'. The temporary markers "_a" and "_g" keep already-swapped letters from being swapped back by the next replace().
3. Convert to a list and reverse it.
4. Convert back to a string using join.
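The same reverse complement can be computed more compactly with a different technique: str.maketrans and str.translate swap all bases in a single pass (so no temporary markers are needed), and the reversed slice [::-1] replaces the list/reverse/join steps. A sketch; the function name reverse_complement is hypothetical, and like the code above it assumes the input contains only a, c, g, t (in any case).

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string, in lower case."""
    # translation table: a<->t and c<->g, applied to every character at once
    table = str.maketrans("acgt", "tgca")
    # lower-case, complement every base, then reverse with a [::-1] slice
    return dna.lower().translate(table)[::-1]

print(reverse_complement("accACgttAGgtct"))   # agacctaacgtggt
```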
nucleotides = ['a', 'g', 'c', 't']
print(nucleotides)
print(nucleotides[0:2])    # nucleotides[:2] also works
print(nucleotides[2:4])    # nucleotides[2:] also works
print(nucleotides[-2:])    # takes last two elements
print(nucleotides[::2])    # takes every second element
print(nucleotides[::-1])   # obtains reversed list
Output:
['a', 'g', 'c', 't']
['a', 'g']
['c', 't']
['c', 't']
['a', 'c']
['t', 'c', 'g', 'a']
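Some further properties of slices, as a short sketch: a slice is a new list (a copy), slicing past the end is not an error, and assigning to a slice replaces that part of the original list in place.

```python
nucleotides = ['a', 'g', 'c', 't']

# a slice is a copy: changing it does not change the original
part = nucleotides[:2]
part[0] = 'x'
print(part, nucleotides)     # ['x', 'g'] ['a', 'g', 'c', 't']

# slicing past the end just returns fewer elements
print(nucleotides[2:100])    # ['c', 't']

# slice assignment modifies the list in place
nucleotides[1:3] = ['c', 'g']
print(nucleotides)           # ['a', 'c', 'g', 't']
```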
sentence = 'This is a complete sentence.'
print(sentence.split())
Output:
['This', 'is', 'a', 'complete', 'sentence.']

datarow = 'Apples,Bananas,Oranges'
print(datarow.split(','))
Output:
['Apples', 'Bananas', 'Oranges']

cities = ['Dresden', 'Munich', 'Hamburg', 'Cologne']
print(' -> '.join(cities))
Output:
Dresden -> Munich -> Hamburg -> Cologne
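Two details worth knowing, shown as a sketch with made-up example data: join() accepts only strings, so numbers must be converted first, and split() takes an optional maxsplit argument that limits the number of splits. Joining the pieces of a split() again with the same separator gives back the original string.

```python
# join() only accepts strings, so convert the numbers first
line = ','.join(str(n) for n in [1, 2, 3])
print(line)                    # 1,2,3

# split and re-join with the same separator: back to the original
print(line.split(','))         # ['1', '2', '3']
print(','.join(line.split(',')) == line)   # True

# maxsplit = 1: split only at the first comma
record = 'Dresden,Saxony,Germany'
print(record.split(',', 1))    # ['Dresden', 'Saxony,Germany']
```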
newlist = []
for x in range(1, 11):
    if x % 2:
        newlist.append(x**2)
Verbose construction of the list: [1, 9, 25, 49, 81]

newlist = [x**2 for x in range(1, 11) if x % 2]
Construction with a list comprehension (same result)

Squares of all odd numbers between 1 and 10 (x % 2 is 1, i.e. true, exactly for odd x)
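The position of the if matters in a comprehension. A sketch: an if at the end filters elements out, while a conditional expression (value if condition else other) before the for keeps every element and only changes its value.

```python
# "if" at the END filters: only odd numbers are kept
odd_squares = [x**2 for x in range(1, 6) if x % 2]
print(odd_squares)    # [1, 9, 25]

# "if ... else" BEFORE "for" transforms every element instead
marked = [x**2 if x % 2 else 0 for x in range(1, 6)]
print(marked)         # [1, 0, 9, 0, 25]
```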
sentence = 'I like MySQL but not Python'
print([(w.lower(), len(w)) for w in sentence.split()])
Output:
[('i', 1), ('like', 4), ('mysql', 5), ('but', 3), ('not', 3), ('python', 6)]
numbers = (1, 0, -1, 6, 3, -2, 3, 4)
total = sum([x for x in numbers if x > 0])   # don't name the variable "sum": it would hide the built-in sum()
print(total)
Output:
17
Sum up all positive integers in a tuple
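The square brackets can be dropped when the result is passed straight to a function such as sum(): that turns the list comprehension into a generator expression, which yields the values one at a time instead of building an intermediate list. A sketch, including the common pattern of counting matches by summing 1s:

```python
numbers = (1, 0, -1, 6, 3, -2, 3, 4)

# generator expression: no intermediate list is built
total = sum(x for x in numbers if x > 0)
print(total)    # 17

# count the positive numbers by adding 1 for each of them
count = sum(1 for x in numbers if x > 0)
print(count)    # 5
```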
f = open('myfile.txt', 'r')       # returns a file handle; 'r' is the file mode (r, w, a, ...)
for line in f:                    # line-wise iteration over the file; line is the loop variable
    if not line.startswith('#'):
        print(line, end='')       # line already ends with '\n'
f.close()

myfile.txt:
#Old number
1234
# New number
5555
# Test
1

Output:
1234
5555
1

Shorter and better form (the file is closed automatically after the block, even if an error occurs):
with open('myfile.txt', 'r') as f:
    for line in f:
        if not line.startswith('#'):
            print(line, end='')
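The other file modes work the same way. A minimal sketch that recreates the file content shown above: mode 'w' creates (or overwrites) a file, 'a' appends to it, and 'r' reads it back. The file lives in a temporary directory (via the standard tempfile module) so the example runs anywhere; that location is an assumption of the sketch, not part of the original example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # hypothetical location for myfile.txt inside the temporary directory
    path = os.path.join(tmpdir, 'myfile.txt')

    # mode 'w' creates (or overwrites) the file; write() adds no newlines itself
    with open(path, 'w') as f:
        f.write('#Old number\n1234\n')

    # mode 'a' appends to the end of the existing file
    with open(path, 'a') as f:
        f.write('# New number\n5555\n# Test\n1\n')

    # mode 'r' reads; keep non-comment lines without their trailing newline
    with open(path, 'r') as f:
        values = [line.rstrip('\n') for line in f if not line.startswith('#')]

print(values)   # ['1234', '5555', '1']
```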
fly3utr.txt:
>CG11604
TAGTTATAGCGTGAGTTAGT
TGTAAAGGAACGTGAAAGAT
AAATACATTTTCAATACC
>CG11455
TAGACGGAGACCCGTTTTTC
TTGGTTAGTTTCACATTGTA
AAACTGCAAATTGTGTAAAA
ATAAAATGAGAAACAATTCT
GGT
>CG11488
TAGAAGTCAAAAAAGTCAAG
TTTGTTATATAACAAGAAAT
CAAAAATTATATAATTGTTT
TTCACTCT

The name of each sequence is preceded by the > symbol. NB: sequences can span multiple lines.
with open('fly3utr.txt', 'r') as f:
    for line in f:
        if line.startswith('>'):
            print(line[1:].rstrip())   # drop the '>' and the trailing newline
Output:
CG11604
CG11455
CG11488
name = None
length = None
with open('fly3utr.txt', 'r') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            if name:   # None -> False, so nothing is printed before the first header
                print(name, length)
            name = line[1:]
            length = 0
        else:
            length += len(line)
print(name, length)   # print the last record after the loop
Output:
CG11604 58
CG11455 83
CG11488 68
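The length-counting loop can be extended to keep the sequences themselves. A minimal sketch, assuming the file is small enough to hold in memory: the hypothetical helper read_fasta collects (name, sequence) tuples and joins the lines of multi-line sequences. It reads from an io.StringIO holding the records of fly3utr.txt so it runs without the file; passing an open file handle such as open('fly3utr.txt') should work the same way, since both are iterated line by line.

```python
import io

# the records of fly3utr.txt, inlined so the sketch is self-contained
fasta_text = """\
>CG11604
TAGTTATAGCGTGAGTTAGT
TGTAAAGGAACGTGAAAGAT
AAATACATTTTCAATACC
>CG11455
TAGACGGAGACCCGTTTTTC
TTGGTTAGTTTCACATTGTA
AAACTGCAAATTGTGTAAAA
ATAAAATGAGAAACAATTCT
GGT
>CG11488
TAGAAGTCAAAAAAGTCAAG
TTTGTTATATAACAAGAAAT
CAAAAATTATATAATTGTTT
TTCACTCT
"""

def read_fasta(handle):
    """Return a list of (name, sequence) tuples from FASTA-formatted lines."""
    records = []
    name = None
    parts = []                      # pieces of the current, possibly multi-line, sequence
    for line in handle:
        line = line.rstrip()
        if line.startswith('>'):
            if name is not None:    # finish the previous record
                records.append((name, ''.join(parts)))
            name = line[1:]
            parts = []
        elif name is not None:      # ignore anything before the first header
            parts.append(line)
    if name is not None:            # finish the last record
        records.append((name, ''.join(parts)))
    return records

records = read_fasta(io.StringIO(fasta_text))
for name, seq in records:
    print(name, len(seq))
```

Collecting the pieces in a list and joining them once at the end avoids building a new string for every line.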
– More flexible, more memory consumption
– Immutable, less memory consumption
– Least memory consumption
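Assuming the first two points compare lists and tuples, the memory difference can be observed with sys.getsizeof, which reports the size of the container object itself (not of the elements it refers to). The exact byte counts depend on the Python version and platform, so this sketch prints them rather than stating them.

```python
import sys

items_list = [1, 2, 3, 4, 5]
items_tuple = (1, 2, 3, 4, 5)

# size in bytes of the container objects only (the elements are not included)
print("list: ", sys.getsizeof(items_list))
print("tuple:", sys.getsizeof(items_tuple))

# in CPython, a list grown with append() reserves spare slots for future appends,
# part of the price for its flexibility
grown = []
for i in range(5):
    grown.append(i)
print("grown list:", sys.getsizeof(grown))
```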