Using Disco and MapReduce to study mRNA complexity, by Dan Williams


  1. Using Disco and MapReduce to study mRNA complexity Dan Williams SciPy 2011 Lightning Talk 7/14/2011 | Life Technologies Proprietary & Confidential | 1

  2. Disco: a MapReduce framework written in Python and Erlang, useful for dealing with massive data. Users specify map and reduce operations as Python functions, then chain them together to get stuff done.

  3. mRNA molecules contain three distinct regions:
     AAATGACGACAACGGTGAGGGTTCTCGGGCGGGGCCTGGGACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGG CTGGTCTGGAAGGAACCTGAGCTACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATACCTAATCTGGGAGCCTGCAAGTGACA ACAGCCTTTGCGGTCCTTAGACAGCTTGGCCTGGAGGAGAACACATGAAAGAAAGAACCTCAAGAGGCTTTGTTTTCTGTGAAACAGT ATTTCTATACAGTTGCTCCAATGACAGAGTTACCTGCACCGTTGTCCTACTTCCAGAATGCACAGATGTCTGAGGACAACCACCTGAG CAATACTGTACGTAGCCAGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTGAGCCATTATCTAAT GGACGACCCCAGGGTAACTCCCGGCAGGTGGTGGAGCAAGATGAGGAAGAAGATGAGGAGCTGACATTGAAATATGGCGCCAAGCATG TGATCATGCTCTTTGTCCCTGTGACTCTCTGCATGGTGGTGGTCGTGGCTACCATTAAGTCAGTCAGCTTTTATACCCGGAAGGATGG GCAGCTAATCTATACCCCATTCACAGAAGATACCGAGACTGTGGGCCAGAGAGCCCTGCACTCAATTCTGAATGCTGCCATCATGATC AGTGTCATTGTTGTCATGACTATCCTCCTGGTGGTTCTGTATAAATACAGGTGCTATAAGGTCATCCATGCCTGGCTTATTATATCAT CTCTATTGTTGCTGTTCTTTTTTTCATTCATTTACTTGGGGGAAGTGTTTAAAACCTATAACGTTGCTGTGGACTACATTACTGTTGC ACTCCTGATCTGGAATTTTGGTGTGGTGGGAATGATTTCCATTCACTGGAAAGGTCCACTTCGACTCCAGCAGGCATATCTCATTATG ATTAGTGCCCTCATGGCCCTGGTGTTTATCAAGTACCTCCCTGAATGGACTGCGTGGCTCATCTTGGCTGTGATTTCAGTATATGATT TAGTGGCTGTTTTGTGTCCGAAAGGTCCACTTCGTATGCTGGTTGAAACAGCTCAGGAGAGAAATGAAACGCTTTTTCCAGCTCTCAT TTACTCCTCAACAATGGTGTGGTTGGTGAATATGGCAGAAGGAGACCCGGAAGCTCAAAGGAGAGTATCCAAAAATTCCAAGTATAAT GCAGAAAGCACAGAAAGGGAGTCACAAGACACTGTTGCAGAGAATGATGATGGCGGGTTCAGTGAGGAATGGGAAGCCCAGAGGGACA GTCATCTAGGGCCTCATCGCTCTACACCTGAGTCACGAGCTGCTGTCCAGGAACTTTCCAGCAGTATCCTCGCTGGTGAAGACCCAGA GGAAAGGGGAGTAAAACTTGGATTGGGAGATTTCATTTTCTACAGTGTTCTGGTTGGTAAAGCCTCAGCAACAGCCAGTGGAGACTGG AACACAACCATAGCCTGTTTCGTAGCCATATTAATTGGTTTGTGCCTTACATTATTACTCCTTGCCATTTTCAAGAAAGCATTGCCAG CTCTTCCAATCTCCATCACCTTTGGG
     Research question: Do the three mRNA regions generally differ in information content?

  4. Method: Calculate the Shannon entropy of each 21-nucleotide segment of each mRNA from a well-known database. Group results by region and compare. MapReduce with Disco speeds up the computation (across ~30k mRNA sequences).

  5. Pipeline: (1) Map 21-mer segments and regions to 1. (2) Reduce to remove duplicates. (3) Map Shannon entropy of 21-mer segment to region. (4) Reduce to get a boxplot for each region.

  7. Thank you!
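
The method on slides 4 and 5 can be sketched in plain Python. This is not the speaker's code and it does not use the Disco API; it only mimics the map and reduce stages with ordinary functions so the logic is easy to follow. The record layout (a dict from region name to sequence), the region labels, and all function names are assumptions made for illustration. Entropy is computed here over the single-nucleotide frequencies within each 21-mer, which is one plausible reading of the slides.

```python
import math
from collections import Counter, defaultdict

K = 21  # segment length stated on the slides


def shannon_entropy(segment):
    """Shannon entropy, in bits, of the symbol frequencies in `segment`."""
    counts = Counter(segment)
    n = len(segment)
    # sum of p * log2(1/p), with p = c / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def map_kmers(record):
    """Stage 1 map: emit ((k-mer, region), 1) for every window of length K.

    `record` is a hypothetical layout: {region_name: sequence}.
    """
    for region, seq in record.items():
        for i in range(len(seq) - K + 1):
            yield (seq[i:i + K], region), 1


def reduce_unique(pairs):
    """Stage 1 reduce: collapse duplicate (k-mer, region) keys."""
    return sorted({key for key, _ in pairs})


def map_entropy(keys):
    """Stage 2 map: emit (region, entropy of the k-mer)."""
    for kmer, region in keys:
        yield region, shannon_entropy(kmer)


def reduce_by_region(pairs):
    """Stage 2 reduce: group entropies by region, ready for a boxplot."""
    grouped = defaultdict(list)
    for region, entropy in pairs:
        grouped[region].append(entropy)
    return dict(grouped)


def run_pipeline(records):
    """Chain the four stages over an iterable of records."""
    pairs = [pair for record in records for pair in map_kmers(record)]
    return reduce_by_region(map_entropy(reduce_unique(pairs)))
```

Calling `run_pipeline` on a list of such records returns one list of entropy values per region. Those lists could be handed to a plotting library to draw the per-region boxplots; in a real Disco job each stage would instead be registered as a map or reduce function and run across the cluster.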
