Perl for Pipeline Part I


  1. Perl for Pipeline Part I L1110@BUMC 9/18/2018 2-4pm Yun Shen, Programmer Analyst yshen16@bu.edu Fall 2018 IS&T Research Computing Services

  2. Tutorial Resource: Before we start, please note that all code scripts and supporting documents are available at http://rcs.bu.edu/examples/perl/tutorials/

  3. Sign-In Sheet: We have prepared a sign-in sheet for everyone to sign. We use it for internal management and quality control, so please SIGN IN if you haven’t done so.

  4. Research Computing Services (RCS)
     • RCS is a group within Information Services & Technology (IS&T) at Boston University.
     • It provides computing, storage, and visualization resources and services to support research that has specialized or highly intensive computation, storage, bandwidth, or graphics requirements.
     • Three primary services: 1. Research Computation 2. Research Visualization 3. Research Consulting and Training
     • More info: http://www.bu.edu/tech/about/research/

  5. Research Computing Services (RCS) Tutorials
     RCS offers tutorials three times a year:
     • Spring: in January/February
     • Summer: in May/June
     • Fall: in September/October
     This tutorial is Part I of a set (Part II comes Thursday).

  6. About Me
     • Long-time programmer, dating back to 1987
     • Proficient in C/C++/Perl
     • Domain knowledge: Software Design, Network/Communication, Databases, Bioinformatics, System Integration
     • Contact: yshen16@bu.edu, 617-638-5851
     • Main Office: 801 Mass Ave., 4th Floor (Crosstown Building)

  7. Tell Me a Bit About You
     • Name
     • Experience in programming? If so, which specific language? Self-rating?
     • Experience in Perl?
     • Account on SCC?
     • Motivation (expectation) for attending this tutorial
     • Any other questions or fun facts you would like the class to know?

  8. Evaluation: One last piece of information before we start. DON’T FORGET TO GO TO http://rcs.bu.edu/survey/tutorial_evaluation.html and leave your feedback for this tutorial (both good and bad are welcome, as long as it is honest). Thank you.

  9. Topics for Today
     • HuRI: A Bioinformatics Pipeline Example
     • Get Back to Fundamentals
     • Perl Environment
     • Using Perl
     • Code Examples
     • Advanced Features
     • Packages, Modules and Object-Oriented (OO) Methodology
     • Perl Regular Expression
     • Debugger

  10. HuRI – A Real Bioinformatics Pipeline Example

  11. HuRI – Human Reference Interactome Map
      Project summary: mapping high-quality binary protein-protein interactions (PPIs) is based on yeast two-hybrid (Y2H) as the primary screening method, followed by validation of subsets of PPIs in multiple orthogonal assays for binary PPI detection.
      Three stages:
      • HI-I-05: space of ~7,000 human genes, ~2,700 PPIs
      • HI-II-14: space of ~13,000 human genes, ~14,000 PPIs
      • HI-III: space of ~18,000 human genes, ~50,000+ PPIs up to 2015
      For more information, go to http://interactome.baderlab.org/

  12. HuRI – Human Reference Interactome Map
      The HI-III space is huge: AD 18k x DB 18k = 324m (~320m) binary pairs.
      Each plate contains 12 x 8 = 96 wells.
      If we approach the problem the linear way, with 1 DB x 1 AD per well, how many plates do we need to screen? 324m/96 = ~3.4m plates.
      If each technician can process 100 PCR plates per day: 3.4m/100 = 34k technician-days. # an unthinkably huge amount of work!
      So what would be the solution to tackle this?
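The estimate on this slide can be checked with a few lines of code. This is a minimal Python sketch for illustration only, not part of the tutorial's Perl material; the function name `linear_screen_estimate` is hypothetical, and it assumes one AD x DB pair per well, 96 wells per plate, and 100 plates per technician per day, as stated on the slide.

```python
import math

def linear_screen_estimate(n_ad, n_db, wells_per_plate=96, plates_per_day=100):
    """Estimate the cost of screening every AD x DB pair, one pair per well.

    Returns (binary pairs, plates needed, technician-days needed).
    """
    pairs = n_ad * n_db
    # Each well holds exactly one pair, so round up to whole plates.
    plates = math.ceil(pairs / wells_per_plate)
    # One technician processes plates_per_day plates per day.
    person_days = math.ceil(plates / plates_per_day)
    return pairs, plates, person_days

pairs, plates, person_days = linear_screen_estimate(18_000, 18_000)
print(f"{pairs:,} pairs, {plates:,} plates, {person_days:,} technician-days")
# prints: 324,000,000 pairs, 3,375,000 plates, 33,750 technician-days
```

The exact figures round to the slide's ~3.4m plates and ~34k technician-days.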

  13. HuRI – Human Reference Interactome Map
      We came up with some brilliant ideas:
      1) ’Divide and conquer’: divide the entire space into 9 AD groups and 9 DB groups, which gives 9 x 9 = 81 matrices. Each matrix: 2k (AD) x 2k (DB) = 4m binary pairs. # still a lot of plates
      2) SWIMseq: attach a Short Well Index tag to each PCR primer. It is basically a multiplexing technique that allows pooling many ADs and DBs into one well. We designed 12 sets of AD and DB well index tags; each set contains 96 AD index tags and 96 DB index tags, and different sets are intended for different screen/retest sequencing experiments.

  14. HuRI – Human Reference Interactome Map
      Now let’s see how many plates we need:
      1) ’Divide and conquer’: divide the entire space into 9 AD groups and 9 DB groups, which gives 9 x 9 = 81 matrices. Each matrix: 2k (AD) x 2k (DB) = 4m binary pairs. # still a lot of plates
      • Pool ADs: 2k/96 = ~20 AD plates
      • Pool DBs: 2k/96 = ~20 DB plates
      • Mate 20 AD x 1 DB = 20 plates
      • Mate 1 AD x 20 DB = 20 plates
      • Colony pick: far fewer (usually only ~5 plates per screen per matrix) # this is much more tractable!
      81 matrices will need ~40 x 81 = 3240 plates. # this is just one screen
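The pooled plate count can be reproduced the same way. The following Python sketch is an illustration under stated assumptions, not the project's code: the function name `plates_per_screen` is hypothetical, and it reads the slide's ~40 plates per matrix as the sum of the two mating steps (AD plates against pooled DB, pooled AD against DB plates). It rounds 2000/96 up to whole plates, whereas the slide rounds down to ~20.

```python
import math

WELLS_PER_PLATE = 96  # 12 x 8 layout, from the slides

def plates_per_screen(ad_groups=9, db_groups=9, clones_per_group=2000):
    """Count mating plates for one full screen under the pooled design.

    Returns (matrices, plates per group, plates per matrix, total plates).
    """
    matrices = ad_groups * db_groups
    # Plates needed to hold one group of clones, one clone per well.
    plates_per_group = math.ceil(clones_per_group / WELLS_PER_PLATE)
    # Two mating steps per matrix (assumed reading of the slide).
    per_matrix = 2 * plates_per_group
    return matrices, plates_per_group, per_matrix, matrices * per_matrix

print(plates_per_screen())
# prints: (81, 21, 42, 3402)
```

Rounding each group down to 20 plates, as the slide does, gives the quoted ~40 x 81 = 3240 plates; either way the total is about a thousand times smaller than the ~3.4m plates of the one-pair-per-well approach.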

  15. HuRI – Human Reference Interactome Map
      Nevertheless, the project scope:
      • Total sequence batches: 35
      • Total PCR plates processed: 6528
      • Total read count: ~1.3 x 10^9
      • Total sequence file size: ~3.5 x 10^11 bytes (350 GB, up to 06/2015)
      Each plate is the result of colony picking of PCR products from thousands of AD and DB matings.

  16. HuRI – Human Reference Interactome Map
      The design sounds very attractive, but what are the computational challenges?
      Challenge 1: experiment design becomes much more complicated:
      a. Much more complicated bookkeeping for the technicians: well index tag application, plate labeling, etc.
      b. The ORF collection needs to be grouped so that no paralogs are put into the same group.
      c. The experiment clone cherry-picking algorithm has to adapt to pick from different groups; it must also avoid putting paralogs from different groups onto the same plate.
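The grouping constraint in item b can be sketched as a simple greedy assignment. This Python example is only an illustration of the constraint, not the algorithm used in HuRI; the function name `group_orfs`, the ORF names, and the paralog family labels are all hypothetical, and it assumes each ORF comes with a known paralog family identifier.

```python
def group_orfs(orf_to_family, n_groups):
    """Greedily assign ORFs to groups so that no group holds two paralogs.

    orf_to_family maps each ORF name to its paralog family label.
    Each ORF goes to the smallest group that has no member of its family.
    """
    groups = [[] for _ in range(n_groups)]
    families_in_group = [set() for _ in range(n_groups)]
    for orf, family in sorted(orf_to_family.items()):
        # Groups that do not yet contain a paralog of this ORF.
        candidates = [i for i in range(n_groups)
                      if family not in families_in_group[i]]
        if not candidates:
            raise ValueError(f"family {family!r} has more than {n_groups} members")
        target = min(candidates, key=lambda i: len(groups[i]))
        groups[target].append(orf)
        families_in_group[target].add(family)
    return groups

# Hypothetical input: two paralogs in famA, singletons in famB and famC.
example = {"ORF_A1": "famA", "ORF_A2": "famA", "ORF_B1": "famB", "ORF_C1": "famC"}
print(group_orfs(example, 2))
# prints: [['ORF_A1', 'ORF_B1'], ['ORF_A2', 'ORF_C1']]
```

A family with more members than there are groups cannot satisfy the constraint, so the sketch raises an error rather than silently placing paralogs together.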

  17. HuRI – Human Reference Interactome Map
      Challenge 2: sequencing analysis becomes much more complicated:
      - the program has to extract the right ORF group information through the well-tag mapping information (a kind of de-multiplexing)
      - much more coordination between dry and wet lab (obtain/use/store/retrieve the experiment information)
      - more detail-oriented data storage and maintenance
      - …
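The de-multiplexing idea in the first point can be illustrated with a toy example. This Python sketch is not the pipeline's implementation: the function name `demultiplex`, the 4-base tags, and the well names are hypothetical, and it assumes the well index tag sits at the start of each read and must match exactly.

```python
def demultiplex(reads, tag_to_well, tag_len):
    """Split reads by their leading well index tag.

    Returns (by_well, unassigned): by_well maps a well name to the list of
    read sequences with the tag removed; unassigned keeps reads whose tag
    is not in the mapping.
    """
    by_well = {}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        well = tag_to_well.get(tag)
        if well is None:
            unassigned.append(read)
        else:
            by_well.setdefault(well, []).append(insert)
    return by_well, unassigned

# Hypothetical tag-to-well mapping and reads.
tags = {"ACGT": "A01", "TGCA": "A02"}
reads = ["ACGTGGGCCC", "TGCATTTAAA", "ACGTCCCGGG", "NNNNAAAA"]
by_well, unassigned = demultiplex(reads, tags, tag_len=4)
print(by_well)     # {'A01': ['GGGCCC', 'CCCGGG'], 'A02': ['TTTAAA']}
print(unassigned)  # ['NNNNAAAA']
```

A real analysis would also need to handle sequencing errors in the tag and look up which ORF group each well belongs to for the given batch, which is where the stored experiment information comes in.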

  18. HuRI – Human Reference Interactome Map
      [Figure: pipeline workflow diagram; image not available. Recoverable labels: Y2H screen, PCR plates, NGS, Sequence Analysis, Report. Inputs: plate content, plate layout, batch name, reference sequences. Analysis steps: Preprocess, Align Sequence, Identify IST, QC, Packaging. Report: present result in Excel, PDF, text, etc.]

  19. HuRI – Human Reference Interactome Map (source: https://www.ncbi.nlm.nih.gov/pubmed/16189514)

  20. HuRI – Human Reference Interactome Map

  21. HuRI – Human Reference Interactome Map

  22. HuRI – Human Reference Interactome Map

  23. HuRI – Human Reference Interactome Map: Output, Summary

  24. HuRI – Human Reference Interactome Map: Output, Detail

  25. So how do we achieve this?

  26. Pipeline code: Huri_pipeline.pl

  27. Well, we use Perl scripts to write the entire pipeline. We will come back to this later.

  28. Perl Language Fundamentals
