
Programming in Python Lecture 3: Patterns and Functions



  1. Programming in Python Lecture 3: Patterns and Functions. Michael Schroeder, Sven Schreiber (sven.schreiber@tu-dresden.de). Slides derived from Ian Holmes, Department of Statistics, University of Oxford; updates by Andreas Henschel

  2. Overview • Patterns (Regular Expressions) • Functions and Lambda Functions

  3. Patterns

  4. What is a pattern? (image: https://commons.wikimedia.org/wiki/Tree)

  5. Pattern-matching • A logical test that asks whether a string contains a pattern • e.g. does a yeast promoter sequence contain the MCB binding site, ACGCGT?
       name = 'YBR007C'
       dna = 'TAATAAAAAACGCGTTGTCG'   # 20 bases upstream of the yeast gene YBR007C
       if 'ACGCGT' in dna:            # 'ACGCGT' is the pattern for the MCB binding site; in is the membership operator
           print('%s has MCB!' % name)
     Output: YBR007C has MCB!

  6. Regular expressions • We already defined a simple pattern: ACGCGT • What if we don't care about the 3rd position? => ACGCGT, ACCCGT, ACACGT, and ACTCGT should all match • Python provides a pattern-matching engine • Patterns are called regular expressions • They are extremely powerful • Often called "regex" for short • Provided by the module re
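A minimal sketch of the "don't care" idea, using only the standard re module. It assumes the 3rd position should accept exactly the four bases A, C, G, T, so it uses the character class [ACGT]; a plain . would also accept any other character, such as N.

```python
import re

# ACGCGT with "any base" at the 3rd position.
# Assumption: "any" means one of the four DNA bases A, C, G, T.
mcb_like = re.compile("AC[ACGT]CGT")

variants = ["ACGCGT", "ACCCGT", "ACACGT", "ACTCGT", "ACNCGT"]
results = {v: mcb_like.search(v) is not None for v in variants}
print(results)
# {'ACGCGT': True, 'ACCCGT': True, 'ACACGT': True, 'ACTCGT': True, 'ACNCGT': False}
```

One pattern now covers all four variants; only the sequence with N at the 3rd position is rejected.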

  7. Motivation: N-glycosylation motif • Common post-translational modification • Attachment of a sugar group • Occurs at asparagine residues with the consensus sequence N-X1-X2, where – X1 can be anything (but proline inhibits) – X2 is serine or threonine • Can we detect potential N-glycosylation sites in a protein sequence?

  8. Building regexes I: Character Classes • Square brackets define a set of alternative characters (character class) • E.g. [abc] matches a, b, or c • Use - to match a range of characters: [A-Z] • Negation: [^X] matches anything but X • [^A-Z] matches anything but A-Z • . matches any single character except a newline • [a] is equivalent to a
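The character-class rules above can be checked quickly with re.findall, which returns every non-overlapping match as a list. The test strings are made up purely for illustration.

```python
import re

dna = "acgtNACGT-x"   # illustrative mixed-case string

upper_bases = re.findall("[ACGT]", dna)   # one of A, C, G, T
upper_any = re.findall("[A-Z]", dna)      # range: any upper-case letter
not_upper = re.findall("[^A-Z]", dna)     # negation: anything but A-Z
dot = re.findall("A.G", "ACG ATG A\nG")    # . does not match the newline
same = re.findall("[a]", "banana") == re.findall("a", "banana")

print(upper_bases)  # ['A', 'C', 'G', 'T']
print(upper_any)    # ['N', 'A', 'C', 'G', 'T']
print(not_upper)    # ['a', 'c', 'g', 't', '-', 'x']
print(dot)          # ['ACG', 'ATG']
print(same)         # True
```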

  9. Building regexes II: Abbreviations • \d matches any decimal digit, like [0-9] • \D matches any non-digit, like [^0-9] • Equivalent syntax for: – whitespace (\s and \S) – word characters, i.e. letters, digits, and underscore (\w and \W)
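A short sketch of the abbreviations on an illustrative string. The patterns are written as raw strings (r"...") so the backslash reaches the re module unchanged; recent Python versions warn about escapes like "\d" inside normal strings. Note that \w also accepts the underscore.

```python
import re

line = "Score:\t165 bits_x 2e-5"   # illustrative text with a tab and an underscore

digits = re.findall(r"\d", "a1b22")       # single digits
non_digits = re.findall(r"\D+", "a1b22")  # runs of non-digits
spaces = re.findall(r"\s", line)          # tab and spaces
words = re.findall(r"\w+", line)          # runs of letters, digits, underscore

print(digits)      # ['1', '2', '2']
print(non_digits)  # ['a', 'b']
print(spaces)      # ['\t', ' ', ' ']
print(words)       # ['Score', '165', 'bits_x', '2e', '5']
```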

  10. Building regexes III: Quantifiers • Each quantifier applies to the preceding character, class, or group • * matches zero or more times – E.g. ca*t matches: ct, cat, caat, caaat, caaaat, ... • + matches one or more times – E.g. ca+t matches cat, caat, caaat, caaaat, ... • ? matches zero times or once – E.g. bio-?info matches bioinfo and bio-info • {n} matches exactly n times • {n,m} matches from n (min) to m (max) times – E.g. ab{1,3}c matches abc, abbc, abbbc
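The quantifier examples can be verified with a small helper. The helper name matches is hypothetical; it uses re.fullmatch so that each word is judged as a whole (with search, "ca+t" would also be found inside a longer word such as "scatter").

```python
import re

def matches(pattern, word):
    # fullmatch(): the whole word must fit the pattern, not just part of it
    return re.fullmatch(pattern, word) is not None

star = [w for w in ["ct", "cat", "caaat"] if matches("ca*t", w)]
plus = [w for w in ["ct", "cat", "caaat"] if matches("ca+t", w)]
optional = [w for w in ["bioinfo", "bio-info", "bio--info"] if matches("bio-?info", w)]
exact = [w for w in ["abc", "abbc", "abbbc"] if matches("ab{2}c", w)]
ranged = [w for w in ["ac", "abc", "abbbc", "abbbbc"] if matches("ab{1,3}c", w)]

print(star)      # ['ct', 'cat', 'caaat']
print(plus)      # ['cat', 'caaat']
print(optional)  # ['bioinfo', 'bio-info']
print(exact)     # ['abbc']
print(ranged)    # ['abc', 'abbbc']
```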

  11. Using Regular Expressions • Compile a regular expression object (pattern) using re.compile • A pattern has a number of methods: – match (returns a Match object on success, otherwise None; matches only at the beginning of the string!) – search (scans through the whole string looking for a match) – findall (returns a list of all matches)
       >>> import re
       >>> pattern = re.compile('[ACGT]')
       >>> if pattern.match("A"):            # successful match
       ...     print("A matches")
       ...
       A matches
       >>> if pattern.match("a"):            # unsuccessful: match returns None (case sensitive by default)
       ...     print("a matches")
       ...
       >>>
     • Without compiling: shorter, but can be slightly slower when the same pattern is used many times
       >>> import re
       >>> if re.match('[ACGT]', "A"):
       ...     print("Matched")
       ...
       Matched
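The difference between match, search, and findall is easiest to see on a string where the pattern does not start at position 0. The sequence below is illustrative.

```python
import re

pattern = re.compile("CG")
seq = "ACGTCGA"   # illustrative sequence; "CG" occurs at positions 1 and 4

at_start = pattern.match(seq)   # None: match() only tries position 0
first = pattern.search(seq)     # first occurrence anywhere in the string
every = pattern.findall(seq)    # all non-overlapping occurrences

print(at_start)                     # None
print(first.span(), first.group())  # (1, 3) CG
print(every)                        # ['CG', 'CG']
```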

  12. Matching alternative strings • (this|that) matches "this" or "that" • ...and is equivalent to th(is|at)
       >>> import re
       >>> pattern = re.compile("(this|that|other)", re.IGNORECASE)   # case-insensitive search pattern
       >>> pattern.search("Will match THIS")                 # success
       <re.Match object; span=(11, 15), match='THIS'>
       >>> pattern.search("Also THat will be matched")       # success
       <re.Match object; span=(5, 9), match='THat'>
       >>> pattern.search("Will not match ot-her")           # no match: returns None, so nothing is printed
       >>>
     • On success, Python displays a description of the Match object (its span and the matched text)
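A sketch showing that this|that and th(is|at) find the same text, plus one pitfall: parentheses also create a capturing group, and findall returns only the group when one is present. The (?:...) form is Python's standard non-capturing group; it does not appear on the slides.

```python
import re

text = "Is this better than that?"   # illustrative sentence

# Both patterns find the same places in the text ...
plain = [m.group(0) for m in re.finditer("this|that", text)]
grouped = [m.group(0) for m in re.finditer("th(is|at)", text)]

# ... but findall() returns only the captured group when there is one
only_group = re.findall("th(is|at)", text)
non_capturing = re.findall("th(?:is|at)", text)

print(plain)          # ['this', 'that']
print(grouped)        # ['this', 'that']
print(only_group)     # ['is', 'at']
print(non_capturing)  # ['this', 'that']
```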

  13. Word and string boundaries • ^ matches the start of a string • $ matches the end of a string • \b matches a word boundary
     "Escaping" special characters • Characters with special meaning: . ^ $ * + ? { [ ] \ | ( ) • \ is used to free or "escape" those characters from their special meaning • so \[ just matches the character "[" – if not escaped, "[" signifies the start of a character class, as in [ACGT]
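A minimal sketch of boundaries and escaping on illustrative strings. re.escape is a standard-library helper that backslash-escapes the special characters in a literal string, which is handy when the text to search for comes from user input.

```python
import re

text = "cat concatenate cat"   # illustrative text

whole_words = re.findall(r"\bcat\b", text)   # skips the "cat" inside "concatenate"
starts = re.search(r"^cat", text) is not None
ends = re.search(r"cat$", text) is not None

brackets = re.findall(r"\[\d+\]", "see [12] and [3]")  # \[ and \] are literal brackets
escaped = re.escape("1.5+2")                           # escapes the special . and +

print(whole_words, starts, ends)  # ['cat', 'cat'] True True
print(brackets)                   # ['[12]', '[3]']
print(escaped)                    # 1\.5\+2
```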

  14. Substitutions/Match Retrieval • Regexes can also be used to substitute patterns, using re.sub • Regexes can be used without compiling, as in these examples
       >>> import re
       >>> re.sub("(red|blue|green)", "colored", "blue socks and red shoes")
       'colored socks and colored shoes'
       >>> e, raw, frm, to = re.findall(r"\d+", \
       ...     "E-value: 4, Raw Bit Score: 165, Match position: 362-419")
       >>> print(e, raw, frm, to)
       4 165 362 419
     • \d+ matches one or more digits; findall returns a list of 4 strings, which is assigned to 4 variables
     • A trailing \ allows a command to continue on the next line; alternatively, construct multi-line strings using triple quotes """ … """
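re.sub can also reuse what a group matched: \1, \2 in the replacement string refer to groups 1 and 2, and the count argument limits how many replacements are made. A short sketch on the slide's example sentence:

```python
import re

text = "blue socks and red shoes"

# \1 and \2 in the replacement refer to what groups 1 and 2 matched
swapped = re.sub(r"(red|blue|green) (\w+)", r"\2 colored \1", text)
# count=1 limits the number of replacements
first_only = re.sub("(red|blue|green)", "colored", text, count=1)

print(swapped)     # socks colored blue and shoes colored red
print(first_only)  # colored socks and red shoes
```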

  15. N-glycosylation site detector
       >>> import re
       >>> protein = ("MGMFFNLRSNIKKKAMDNGLSLPISRNGSSNNIKDKRSEHNSNSLKGKYRYQPRSTPSKFQLTVSITSLI"
       ...            "IIAVLSLYLFISFLSGMGIGVSTQNGRSLLGSSKSSENYKTIDLEDEEYYDYDFEDIDPEVISKFDDGVQ"
       ...            "HYLISQFGSEVLTPKDDEKYQRELNMLFDSTVEEYDLSNFEGAPNGLETRDHILLCIPLRNAADVLPLMF"
       ...            "KHLMNLTYPHELIDLAFLVSDCSEGDTTLDALIAYSRHLQNGTLSQIFQEIDAVIDSQTKGTDKLYLKYM"
       ...            "DEGYINRVHQAFSPPFHENYDKPFRSVQIFQKDFGQVIGQGFSDRHAVKVQGIRRKLMGRARNWLTANAL"
       ...            "KPYHSWVYWRDADVELCPGSVIQDLMSKNYDVI")
       >>> regex = "N[^P][ST]"          # the main regular expression
       >>> for match in re.finditer(regex, protein):
       ...     print(match.group(), match.span())
       ...
       NGS (26, 29)
       NLT (214, 217)
       NGT (250, 253)
     • re.finditer provides an iterator over Match objects
     • match.group() and match.span() return the matched string and its (start, end) position tuple (0-based)
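One caveat when scanning for motifs: re.finditer reports non-overlapping matches only, so two candidate sites that share residues can be missed. A common workaround, sketched here on a hypothetical toy sequence, is a lookahead (?=...), which tests the motif at every position without consuming characters.

```python
import re

seq = "MNNSTK"   # hypothetical toy sequence, not from the slide

plain = [m.start() for m in re.finditer("N[^P][ST]", seq)]
# (?=...) is a lookahead: it checks the motif without consuming characters,
# so overlapping candidate sites are reported as well
overlapping = [m.start() for m in re.finditer("(?=N[^P][ST])", seq)]
one_based = [pos + 1 for pos in overlapping]   # residue numbering starting at 1

print(plain)        # [1]
print(overlapping)  # [1, 2]
print(one_based)    # [2, 3]
```

Whether overlapping sites should be reported depends on the biological question; the lookahead simply makes both options available.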

  16. Another Example: [KHDAS]DEL (courtesy of Chris Bystroff)

  17. Another Example: Zinc finger motif (image by Thomas Splettstoesser (www.scistyle.com), self-made, based on PDB structure 1A1L, the open source molecular visualization tool PyMol and Cinema 4D, GFDL, https://commons.wikimedia.org/w/index.php?curid=3106866)

  18. C\w{2,4}C\w{3}[LIVMFYWC]\w{8}H\w{3,5}H, where [LIVMFYWC] matches a hydrophobic residue (courtesy of Chris Bystroff)
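A sketch that applies the zinc finger pattern to a hypothetical toy sequence, assembled by hand from pieces that satisfy each part of the regex (it is not a real protein). In protein text \w simply means "any residue letter", though strictly it would also accept digits and underscores.

```python
import re

ZINC_FINGER = re.compile(r"C\w{2,4}C\w{3}[LIVMFYWC]\w{8}H\w{3,5}H")

# Hypothetical toy sequence: flank + C..C + 3 + hydrophobic + 8 + H...H + flank
toy = "MK" + "CAAC" + "AAA" + "L" + "AAAAAAAA" + "HAAAH" + "GG"

m = ZINC_FINGER.search(toy)
print(m.span())   # (2, 23)
print(m.group())  # CAACAAALAAAAAAAAHAAAH
```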

  19. Test your Regular Expressions • www.pythex.org • Develop regular expressions • Test them on examples of your choice • Example: the PDB IDs 2REG, 9ins, 1VSN, 1osn, and 1a1b all match ^[1-9]\w{3}$
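The same check can be scripted in Python. This sketch uses re.fullmatch instead of ^...$, because with match() the $ also matches just before a trailing newline, which can let a line read from a file slip through. The invalid candidates are made up for illustration.

```python
import re

pdb_id = re.compile(r"[1-9]\w{3}")
candidates = ["2REG", "9ins", "1VSN", "1osn", "1a1b", "0ABC", "1AB", "1ABCD", "1abc\n"]

valid = [c for c in candidates if pdb_id.fullmatch(c)]
print(valid)   # ['2REG', '9ins', '1VSN', '1osn', '1a1b']

# With ^...$ and match(), $ also matches just before a trailing newline:
newline_ok = re.match(r"^[1-9]\w{3}$", "1abc\n") is not None
print(newline_ok)   # True
```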

  20. Functions

  21. Functions • Similar code is often needed in different places of a program • but copying and pasting code is a bad idea! • Instead, separate such pieces of code and call them from different places • A separate piece of code for a self-contained task is called a function • Examples of such tasks: – cleaning up a sequence (lowercase, strip newlines, ...) – reverse complementing a sequence
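The two example tasks can be sketched as small functions. The names clean_sequence and reverse_complement are hypothetical, and the sketch assumes "cleaning up" means lower-casing and removing all whitespace, including newlines.

```python
def clean_sequence(seq):
    """Lower-case a raw sequence and remove newlines and other whitespace."""
    return "".join(seq.split()).lower()

def reverse_complement(seq):
    """Reverse complement of a cleaned (lower-case) DNA sequence."""
    complement = str.maketrans("acgt", "tgca")
    return seq.translate(complement)[::-1]

raw = "ACGCGT\nTGTCG\n"
clean = clean_sequence(raw)
print(clean)                      # acgcgttgtcg
print(reverse_complement(clean))  # cgacaacgcgt
```

Each task now lives in one place and can be called wherever it is needed, instead of being copied.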

  22. Function Syntax
     Syntax:
       def <functionname>(<arg1>, <arg2>, ...):
           <block>
           return <something>
     Example:
       def sum_up_numbers(num1, num2):
           my_sum = num1 + num2
           return my_sum

  23. Calling a function
     Function definition:
       def sum_up_numbers(num1, num2):
           my_sum = num1 + num2
           return my_sum
     Function calls (with positional or keyword arguments):
       >>> sum_up_numbers(1, 5)
       6
       >>> sum_up_numbers(num1=1, num2=5)
       6
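A few more ways to call the same function. Keyword arguments can appear in any order, positional arguments must come before keyword ones, and because + also concatenates strings, the function works on sequences too.

```python
def sum_up_numbers(num1, num2):
    my_sum = num1 + num2
    return my_sum

positional = sum_up_numbers(1, 5)            # by position
by_keyword = sum_up_numbers(num2=5, num1=1)  # by name, any order
mixed = sum_up_numbers(1, num2=5)            # positional first, then keyword
joined = sum_up_numbers("AC", "GT")          # + also concatenates strings

print(positional, by_keyword, mixed, joined)  # 6 6 6 ACGT
```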

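  24. Example: Largest number • Function to find the largest number in a list
       def find_max(a_list):              # function declaration
           largest = a_list[0]            # function body: start with the first element
           for x in a_list[1:]:
               if x > largest:
                   largest = x
           return largest                 # function result
       numbers = [1, 5, 1, 12, 3, 4, 6]
       print("Maximum: %i" % find_max(numbers))   # function call
     Output: Maximum: 12
     • Indexing with a_list[0] leaves the caller's list unchanged (a_list.pop() would remove its last element); naming the variable largest instead of max avoids shadowing Python's built-in max(), which solves the same task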
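A pop()-based version of find_max has a side effect worth seeing once: the list is passed by reference, so pop() removes the last element from the caller's list. The name find_max_with_pop is hypothetical and exists only for this demonstration.

```python
def find_max_with_pop(a_list):
    # pop() removes and returns the LAST element of the list passed in
    largest = a_list.pop()
    for x in a_list:
        if x > largest:
            largest = x
    return largest

numbers = [1, 5, 1, 12, 3, 4, 6]
result = find_max_with_pop(numbers)
print(result)        # 12
print(numbers)       # [1, 5, 1, 12, 3, 4]: the 6 has been removed
print(max(numbers))  # 12, using Python's built-in max()
```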

  25. Lambda Functions

  26. Lambda Functions • A kind of anonymous function • Similar to normal functions, but... – ...not bound to a name – ...use a different syntax – ...can be assigned to variables and passed to functions – ...restricted to a single expression
     Normal function definition:
       >>> def calc(x):
       ...     return (x - 3) * 2
       ...
       >>> calc(5)
       4
     Lambda function:
       >>> calc1 = lambda x: (x - 3) * 2
       >>> calc1(5)
       4
       >>> calc2 = calc1
       >>> calc2(5)
       4
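A typical place where a lambda is passed to another function is the key argument of the built-in sorted(). The sequences below are illustrative.

```python
seqs = ["ACGT", "A", "ACG"]   # illustrative sequences

# sorted() calls the lambda once per element and sorts by the returned value
shortest_first = sorted(seqs, key=lambda s: len(s))
longest_first = sorted(seqs, key=lambda s: -len(s))

print(shortest_first)  # ['A', 'ACG', 'ACGT']
print(longest_first)   # ['ACGT', 'ACG', 'A']
```

For the first case, key=len would work just as well; the lambda becomes useful once the sort key needs a small computation, as in the second case.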

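  27. Map, filter, and reduce • Lambda functions can be passed as arguments to other functions • Powerful in combination with map, filter, and reduce, which are all called as map/filter/reduce(lambda_function, sequence) • The lambda function is applied to the elements of the given sequence; the outer function decides what to do with the results: – map -> applies the function to each element and returns the modified elements – filter -> returns the elements for which the function returns True – reduce -> combines all elements into one result • In Python 3, map and filter return lazy iterators (wrap them in list() to get a list), and reduce must be imported from functools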

  28. Examples
       >>> list(map(lambda x: x*3, [1, 2, 3]))
       [3, 6, 9]
       >>> list(filter(lambda x: x >= 1.0, [1.2, 0.5, 0.7, 1.3]))
       [1.2, 1.3]
       >>> list(filter(lambda x: x != 0, map(lambda x: x-2, [4, 2, 5])))   # map yields 2, 0, 3
       [2, 3]
       >>> from functools import reduce
       >>> reduce(lambda x, y: x+y, (1, 2, 3, 4))   # x, y: 1, 2 -> 3; 3, 3 -> 6; 6, 4 -> 10
       10
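A small sketch that combines both halves of the lecture: filter with a regex to keep sequences containing an MCB-like site, map to compute lengths, and reduce to sum them. The promoter fragments are hypothetical toy data, and "any base at position 3" is again assumed to mean [ACGT].

```python
import re
from functools import reduce

# Hypothetical toy promoter fragments
promoters = ["TTACGCGTAA", "GGGCCC", "ACTCGTTT"]

# filter: keep fragments containing an MCB-like site (any base at position 3)
hits = list(filter(lambda s: re.search("AC[ACGT]CGT", s), promoters))
# map: length of every fragment
lengths = list(map(lambda s: len(s), promoters))
# reduce: sum of all lengths
total = reduce(lambda x, y: x + y, lengths)

print(hits)     # ['TTACGCGTAA', 'ACTCGTTT']
print(lengths)  # [10, 6, 8]
print(total)    # 24
```

filter works here because re.search returns a Match object (truthy) or None (falsy).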

  29. Summary • Regular expressions are powerful tools for detecting patterns • They allow matching of character classes, repetitions, alternatives, etc. • Learn the meaning of the special characters . ^ $ * + ? { [ ] \ | ( ) • Python offers regex functions in the re module – match, search, findall, finditer, sub, etc. • Regular expressions can be used to find motifs in sequences • Functions are a way to separate self-contained tasks and to structure code • Lambda functions combined with map, filter, and reduce allow concise list processing
