REMBRANDT: Building a robust translational research framework for brain tumor studies


  1. REMBRANDT: Building a robust translational research framework for brain tumor studies
     REpository of Molecular BRAin Neoplasia DaTa
     Himanso Sahni, Center for Bioinformatics, NCI; SAIC

  2. Challenges
     - Few therapeutic advances in the last 3 decades
     - Histopathological classifications for the heterogeneous group of tumors known as gliomas are broad and do not predict therapeutic outcome or prognosis
     - Standard therapies generally have minimal effect on long-term survival

  3. Rembrandt (diagram labels)
     - Data inputs: Expression array data, SNPArray data, Proteomics data, Clinical data
     - Components: Knowledgebase, Datawarehouse, Concept Creation
     - Outcomes: Better understanding, Better treatments

  4. NCI’s GMDI Study (diagram labels): Blood, Tumor, Plasma, Tumor, DNA, RNA, Proteins, Core Punch

  5. Typical Rembrandt Usage Scenario
     - In brain tissue from patients diagnosed with the glioblastoma multiforme (GBM) subtype of astrocytoma, which genes in the EGF signaling pathway are over- or under-expressed in cancerous versus normal tissue?
     - Is there a correlation between the expression and genomic (copy number) data collected from these patients?
     - How did EGFR up-regulation affect survival of patients within this study?
     - Of these groups of samples, which ones were obtained from patients who were male and were diagnosed between the ages of 25 and 40 years?

  6. Rembrandt’s Objectives
     - Must support translational research use cases:
       - Build an infrastructure that provides users with the ability to create complex translational queries
       - For example:
         - Ability to AND/OR a Gene Expression query with a Copy Number query and then further nest this within a Clinical Results query
         - Ability to further refine the results by applying criteria to the subset of samples grouped by higher-order analysis
         - Ability to apply filters to the result set for user-friendly analysis
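The AND/OR nesting described in this slide can be sketched with plain predicate composition. This is only an illustration, not Rembrandt's actual query API: the class `CompoundQuerySketch`, the `Sample` fields, the thresholds, and the demo data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class CompoundQuerySketch {

    // Hypothetical per-sample record; the fields are illustrative, not Rembrandt's schema.
    static final class Sample {
        final String id;
        final double egfrFoldChange;  // expression, tumor relative to normal
        final double egfrCopyNumber;  // copy number at the EGFR locus
        final String gender;
        final int ageAtDiagnosis;

        Sample(String id, double egfrFoldChange, double egfrCopyNumber,
               String gender, int ageAtDiagnosis) {
            this.id = id;
            this.egfrFoldChange = egfrFoldChange;
            this.egfrCopyNumber = egfrCopyNumber;
            this.gender = gender;
            this.ageAtDiagnosis = ageAtDiagnosis;
        }
    }

    // Each domain "query" is modeled as a predicate so that nesting is ordinary composition.
    static Predicate<Sample> geneExpressionQuery(double minFoldChange) {
        return s -> s.egfrFoldChange >= minFoldChange;
    }

    static Predicate<Sample> copyNumberQuery(double minCopies) {
        return s -> s.egfrCopyNumber >= minCopies;
    }

    // Assumes an inclusive age range.
    static Predicate<Sample> clinicalQuery(String gender, int minAge, int maxAge) {
        return s -> s.gender.equals(gender)
                && s.ageAtDiagnosis >= minAge
                && s.ageAtDiagnosis <= maxAge;
    }

    // Returns the ids of all samples matching the compound query, in input order.
    static List<String> run(Predicate<Sample> query, List<Sample> samples) {
        List<String> ids = new ArrayList<>();
        for (Sample s : samples) {
            if (query.test(s)) {
                ids.add(s.id);
            }
        }
        return ids;
    }

    // Made-up demo data.
    static List<Sample> demoSamples() {
        return Arrays.asList(
                new Sample("S1", 3.1, 2.0, "M", 33),
                new Sample("S2", 1.2, 5.0, "M", 38),
                new Sample("S3", 4.0, 6.0, "F", 30),
                new Sample("S4", 1.0, 2.0, "M", 29),
                new Sample("S5", 2.5, 2.0, "M", 52));
    }

    public static void main(String[] args) {
        // (Gene Expression OR Copy Number), nested within a Clinical query.
        Predicate<Sample> query = geneExpressionQuery(2.0)
                .or(copyNumberQuery(4.0))
                .and(clinicalQuery("M", 25, 40));
        System.out.println(run(query, demoSamples())); // prints [S1, S2]
    }
}
```

In a database-backed system the same tree of conditions would more likely be translated into SQL than evaluated in memory; the sketch only shows how cross-domain conditions nest.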

  7. Rembrandt’s Objectives (cont’d)
     - Allow users to view the results by easily pivoting between the various dimensions:
       - Grouped by Disease
       - Grouped by Patient / Sample
       - Grouped by Genes for Gene Expression or Cytogenetic Location for Copy Number
       - View Associated Annotations
       - Time Course View (future)
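Pivoting one result set between dimensions can be illustrated as regrouping the same rows by a different key. The sketch below is an assumption about how such a view might be built, not Rembrandt's report code; `PivotSketch`, `Row`, and the demo rows are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class PivotSketch {

    // Hypothetical flattened result row: one expression value per (sample, gene).
    static final class Row {
        final String disease;
        final String sampleId;
        final String gene;
        final double foldChange;

        Row(String disease, String sampleId, String gene, double foldChange) {
            this.disease = disease;
            this.sampleId = sampleId;
            this.gene = gene;
            this.foldChange = foldChange;
        }
    }

    // Groups the rows by the chosen dimension; a TreeMap keeps keys sorted for display.
    static Map<String, List<Row>> pivot(List<Row> rows, Function<Row, String> dimension) {
        return rows.stream()
                .collect(Collectors.groupingBy(dimension, TreeMap::new, Collectors.toList()));
    }

    // Made-up demo data.
    static List<Row> demoRows() {
        return Arrays.asList(
                new Row("GBM", "S1", "EGFR", 3.1),
                new Row("GBM", "S2", "EGFR", 1.2),
                new Row("Oligodendroglioma", "S3", "EGFR", 0.8),
                new Row("GBM", "S1", "PTEN", 0.4));
    }

    public static void main(String[] args) {
        List<Row> rows = demoRows();
        // The same rows, viewed along three different dimensions.
        System.out.println(pivot(rows, r -> r.disease).keySet());  // prints [GBM, Oligodendroglioma]
        System.out.println(pivot(rows, r -> r.sampleId).keySet()); // prints [S1, S2, S3]
        System.out.println(pivot(rows, r -> r.gene).keySet());     // prints [EGFR, PTEN]
    }
}
```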

  8. Gene Expression Search Use Cases (use-case diagram; the <<Uses>> / <<Extends>> links between individual use cases are not recoverable from the extracted text)
     - Actor: RBT_USER
     - Search RBT Affy Gene Expression Dataset
     - Search differential gene expression by Gene Name
     - Search differential gene expression by fold change
     - Search differential gene expression by chromosomal region
     - Search differential gene expression by Probeset ID
     - Search differential gene expression by GO Terms
     - Search differential gene expression by Pathway name
     - Calculate fold change
     - Obtain gene information from cytoband location
     - Obtain cytoband location from gene name
     - Get Genes

  9. Rembrandt’s caBIG Objectives
     - Aligns with NCI’s caBIG (cancer Biomedical Informatics Grid) principles:
       - Open source
       - Open access
       - Syntactic and semantic interoperability
       - Federated access
     - Leverage NCICB and caBIG infrastructure components:
       - caCORE infrastructure (caBIO, EVS, caDSR)
       - caArray gene expression data repositories and analysis tools
       - C3D Clinical Informatics System
       - caBIG infrastructure being delivered by caBIG workspaces
     - See https://cabig.nci.nih.gov/

  10. Rembrandt Technical Objectives
      - Build a scalable, high-performance application
      - Tiered architecture
      - Abstraction / Model View Controller
      - Support strong type checking & validations
      - “Fast” queries
      - User-friendly interface
      - Groundwork for a robust translational research framework

  11. Rembrandt Current Architecture (diagram labels)
      - User Interface: Complex Query Builder, Graphical Plots, Tabular Reports
      - Middle Tier: Cache Manager, Report Builder, Query Builder, Run Time Analysis Components, Query Processing (Future), Object Relational Mapping
      - Data: MicroArray, SNPArray, Clinical, caBIO Annotations, Other
      - Extract, Transform, Load (ETL) Processes
      - caIntegrator

  12. Another Architecture Perspective (diagram labels)
      - JSPs, Servlets, Struts
      - Domain Result Set, Query Criteria, Look Up Elements (XML/XSLT)
      - Query Processor, Cache Manager (Ehcache), Result Set Processor
      - Apache’s Object Relational Bridge (OJB)
      - Rembrandt Study Data Warehouse (Star Schema)

  13. Query & Retrieval Objects: Support Strong Type Checking & Validations
      - Such as Query, View, Criteria, and Domain Element objects
      - Abstract presentation logic from the query helper objects
      - Provide the ability to nest cross-domain queries (AND/OR)
      - Are strongly typed
      - Can validate themselves

  14. Example: Criteria Objects
      - Criteria objects:
        - Consist of DomainElements
        - Provide generic cross-domain filters
        - Each Criteria can validate itself
      - For example, RegionCriteria:
        - Consists of ChromosomeNumberDE, CytobandDE, and BasePairPositionDEs for start & end positions
        - Is used in both Gene Expression and Comparative Genomic domain queries
      - Class diagram (package criteria):
        - RegionCriteria (a Criteria)
            - cytoband: CytobandDE
            - chromNumber: ChromosomeNumberDE
            - start: BasePairPositionDE.StartPosition
            - end: BasePairPositionDE.EndPosition
            - empty: boolean = true
            + isValid() : boolean
            + getCytoband() : CytobandDE
            + setCytoband(CytobandDE) : void
            + getStart() : BasePairPositionDE.StartPosition
            + setStart(BasePairPositionDE.StartPosition) : void
            + getEnd() : BasePairPositionDE.EndPosition
            + setEnd(BasePairPositionDE.EndPosition) : void
            + getChromNumber() : ChromosomeNumberDE
            + setChromNumber(ChromosomeNumberDE) : void
        - de::CytobandDE (a DomainElement)
            + CytobandDE(String)
            + setValue(Object) : void
            + getValueObject() : String
            + setValueObject(String) : void
        - de::ChromosomeNumberDE (a DomainElement)
            + ChromosomeNumberDE(String)
            + setValue(Object) : void
            + getValueObject() : String
            + setValueObject(String) : void
        - de::BasePairPositionDE (a DomainElement)
            - positionType: String
            + START_POSITION: String = "StartPosition"
            + END_POSITION: String = "StartPosition"
            - BasePairPositionDE(String, Integer)
            + getPositionType() : String
            + setValue(Object) : void
            + getValueObject() : Integer
            + setValueObject(Integer) : void
        - de::BasePairPositionDE::StartPosition (inner class, {leaf})
            + StartPosition(Integer)
        - de::BasePairPositionDE::EndPosition (inner class, {leaf})
            + EndPosition(Integer)
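A self-validating criteria object along the lines of RegionCriteria might look like the sketch below. The class and accessor names follow the slide, but the DomainElement classes are simplified stand-ins (a single BasePairPositionDE instead of the StartPosition/EndPosition inner classes), and the validation rules in `isValid()` are assumptions, since the slide does not state them. The `region` helper is hypothetical.

```java
class RegionCriteriaSketch {

    // Simplified stand-in for the slide's DomainElement: a typed wrapper around one value.
    static class DomainElement<T> {
        private final T value;

        DomainElement(T value) {
            this.value = value;
        }

        T getValueObject() {
            return value;
        }
    }

    static final class CytobandDE extends DomainElement<String> {
        CytobandDE(String value) { super(value); }
    }

    static final class ChromosomeNumberDE extends DomainElement<String> {
        ChromosomeNumberDE(String value) { super(value); }
    }

    static final class BasePairPositionDE extends DomainElement<Integer> {
        BasePairPositionDE(Integer value) { super(value); }
    }

    static final class RegionCriteria {
        private CytobandDE cytoband;
        private ChromosomeNumberDE chromNumber;
        private BasePairPositionDE start;
        private BasePairPositionDE end;

        void setCytoband(CytobandDE cytoband) { this.cytoband = cytoband; }
        void setChromNumber(ChromosomeNumberDE chromNumber) { this.chromNumber = chromNumber; }
        void setStart(BasePairPositionDE start) { this.start = start; }
        void setEnd(BasePairPositionDE end) { this.end = end; }

        // ASSUMED rules: a chromosome is required, plus either a cytoband
        // or a complete base-pair range with 1 <= start <= end.
        boolean isValid() {
            if (chromNumber == null) {
                return false;
            }
            if (cytoband != null) {
                return true;
            }
            if (start == null || end == null) {
                return false;
            }
            return start.getValueObject() >= 1
                    && start.getValueObject() <= end.getValueObject();
        }
    }

    // Hypothetical convenience factory for a chromosome plus base-pair range.
    static RegionCriteria region(String chromosome, int startBp, int endBp) {
        RegionCriteria criteria = new RegionCriteria();
        criteria.setChromNumber(new ChromosomeNumberDE(chromosome));
        criteria.setStart(new BasePairPositionDE(startBp));
        criteria.setEnd(new BasePairPositionDE(endBp));
        return criteria;
    }

    public static void main(String[] args) {
        System.out.println(region("7", 1, 15854551).isValid()); // prints true
        System.out.println(region("7", 500, 100).isValid());    // prints false
        System.out.println(new RegionCriteria().isValid());     // prints false
    }
}
```

Because the criteria object carries typed domain elements rather than raw strings, the same instance can be handed to a gene expression query or a copy number query without either one re-parsing user input.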

  15. Agnostication can result in Obfuscation…
      - Challenge: making Rembrandt database-agnostic using a standard Object Relational Mapping (ORM) layer AND still creating high-performance queries.
      - Currently using Apache’s Object Relational Bridge (OJB) as the ORM layer (http://db.apache.org/ojb/).
      - All ORMs provide great abstraction but may not help produce the most efficient SQL.
      - Custom implementations or extending frameworks can become a maintenance nightmare.

  16. High Performance Query Processing
      - Multi-threaded query processing:
        - All queries are constructed and executed in parallel on separate threads on the Java server side
      - Dimensional result set processing:
        - All result set dimensions are reconstituted on the Java server side
      - For example:
        - The entire Chromosome 7 (between 1 and 15854551 bp)
        - Able to retrieve about 51,000 fact records plus all associated annotations and display results for all 51 samples in 20 sec.
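Running several queries in parallel and merging their results on the server side can be sketched with the standard `java.util.concurrent` executor API. This is a generic illustration under stated assumptions, not the deck's actual handler code: `ParallelQuerySketch`, `fakeQuery`, and the dimension names are hypothetical, and the "queries" only build in-memory lists instead of calling a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQuerySketch {

    // Stand-in for a database call: returns `rows` fake records for one dimension.
    static List<String> fakeQuery(String dimension, int rows) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= rows; i++) {
            out.add(dimension + "#" + i);
        }
        return out;
    }

    // Runs every query on its own thread and merges the results in submission order.
    static List<String> runAll(List<Callable<List<String>>> queries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        try {
            List<String> merged = new ArrayList<>();
            // invokeAll blocks until all tasks complete; futures come back in input order.
            for (Future<List<String>> future : pool.invokeAll(queries)) {
                merged.addAll(future.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical set of per-dimension queries.
    static List<Callable<List<String>>> demoQueries() {
        List<Callable<List<String>>> queries = new ArrayList<>();
        queries.add(() -> fakeQuery("fact", 3));
        queries.add(() -> fakeQuery("annotation", 2));
        queries.add(() -> fakeQuery("sample", 1));
        return queries;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(demoQueries()));
        // prints [fact#1, fact#2, fact#3, annotation#1, annotation#2, sample#1]
    }
}
```

Collecting the futures in submission order keeps the merged result deterministic even though the individual queries finish in an arbitrary order.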

  17. Multi-threaded Query Processing in Java (sequence diagram; the participant and message labels were scrambled in extraction and are not recoverable)
