GeneQC Statistical Model: PowerPoint PPT Presentation



slide-1
SLIDE 1

GeneQC Statistical Model

slide-2
SLIDE 2

General Idea

  • Reads can be mapped to multiple gene loci
  • Leads to varying degrees of mapping uncertainty
  • Potentially causes issues with inferences based on read counts
      • Differentially expressed genes
      • Co-expression patterns
      • Various network analyses

slide-3
SLIDE 3

Options

  • Exclude ambiguous reads
  • Multiple assignment
  • Random assignment
  • Probabilistic assignment
  • Only considering local information
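
The four options above can be compared on a toy input. The following is a minimal sketch, not the GeneQC implementation: the function name `count_reads`, the input layout (one list of candidate gene loci per read), and the optional `weights` prior are all assumptions made for illustration.

```python
import random
from collections import Counter


def count_reads(read_hits, strategy, weights=None, seed=0):
    """Per-gene read counts under one multi-mapping strategy (illustrative).

    read_hits: list of lists; each inner list holds the candidate gene
               loci of one read (hypothetical input layout).
    strategy:  "exclude", "multiple", "random", or "probabilistic".
    weights:   optional dict gene -> prior weight, used only by
               "probabilistic"; missing genes default to 1.0.
    """
    rng = random.Random(seed)
    counts = Counter()
    for hits in read_hits:
        if not hits:
            continue  # unmapped read
        if len(hits) == 1:
            counts[hits[0]] += 1.0  # unambiguous read
        elif strategy == "exclude":
            continue  # drop every ambiguous read
        elif strategy == "multiple":
            for gene in hits:  # count the read once per candidate locus
                counts[gene] += 1.0
        elif strategy == "random":
            counts[rng.choice(hits)] += 1.0  # pick one locus uniformly
        elif strategy == "probabilistic":
            w = [weights.get(g, 1.0) if weights else 1.0 for g in hits]
            total = sum(w)
            for gene, wg in zip(hits, w):  # split the read fractionally
                counts[gene] += wg / total
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return counts


reads = [["A"], ["A", "B"], ["B"], ["A", "B"]]
for name in ("exclude", "multiple", "random", "probabilistic"):
    print(name, dict(count_reads(reads, name)))
```

Each strategy looks only at the read's own candidate list (plus an optional fixed prior), which is what "local information" means here.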

slide-4
SLIDE 4

Co-expressed Genes

  • Co-expressed genes provide an additional level of information
  • Global data allow a more solid statistical evaluation

slide-5
SLIDE 5

Goal

  • Create a statistically sound model for the assignment of ambiguous reads
  • Use co-expression of genes
  • Develop a method that produces a p-value or probability score for each ambiguous read assignment
  • Provide a p-value signifying the confidence of each gene’s read count

slide-6
SLIDE 6

Previous Publications

  • Faulkner, G.J., et al., A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE. Genomics, 2008. 91(3): p. 281-288.
  • Hashimoto, T., et al., Probabilistic resolution of multi-mapping reads in massively parallel sequencing data using MuMRescueLite. Bioinformatics, 2009. 25(19): p. 2613-2614.
  • Wang, J., Huda, A., Lunyak, V. V., & Jordan, I. K., A Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags. Bioinformatics, 2010. 26(20): p. 2501-2508.

slide-7
SLIDE 7

Overall Direction

  • Assign all unambiguous reads
  • Use co-expression information of unambiguous reads to make first probabilistic assignment of ambiguous reads
  • Based on assignments, recalculate probabilities for ambiguous reads
  • Continue iterative procedure until no/minimal changes occur
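
The iterative procedure above might look roughly like the sketch below. The slides do not specify the weighting rule, so everything concrete here is an assumption: the names `iterative_assignment` and `partners`, the rule that a candidate gene's weight is a pseudocount plus the mean current count of its co-expressed partner genes, and the convergence tolerance.

```python
def coexpr_weight(gene, counts, partners, pseudo):
    """Assumed rule: pseudocount + mean current count of co-expressed partners."""
    ps = partners.get(gene, [])
    if not ps:
        return pseudo
    return pseudo + sum(counts[p] for p in ps) / len(ps)


def iterative_assignment(read_hits, partners, tol=1e-6, max_iter=100, pseudo=1.0):
    """Fractionally assign ambiguous reads, iterating until counts stabilize.

    read_hits: list of candidate-gene lists, one per read (hypothetical layout).
    partners:  dict gene -> list of co-expressed genes (hypothetical layout).
    Returns (counts, probs): per-gene counts and, for each ambiguous read,
    a dict gene -> assignment probability.
    """
    genes = {g for hits in read_hits for g in hits}
    genes |= set(partners) | {p for ps in partners.values() for p in ps}

    # Step 1: assign all unambiguous reads.
    base = {g: 0.0 for g in genes}
    ambiguous = []
    for hits in read_hits:
        if len(hits) == 1:
            base[hits[0]] += 1.0
        elif len(hits) > 1:
            ambiguous.append(hits)

    counts, probs = dict(base), []
    for _ in range(max_iter):
        # Steps 2-3: (re)compute probabilities from the current counts.
        new_counts, new_probs = dict(base), []
        for hits in ambiguous:
            w = {g: coexpr_weight(g, counts, partners, pseudo) for g in hits}
            total = sum(w.values())
            p = {g: wg / total for g, wg in w.items()}
            new_probs.append(p)
            for g, pg in p.items():
                new_counts[g] += pg
        # Step 4: stop when no/minimal changes occur.
        change = max((abs(new_counts[g] - counts[g]) for g in genes), default=0.0)
        counts, probs = new_counts, new_probs
        if change < tol:
            break
    return counts, probs
```

On the first pass `counts` holds only unambiguous reads, so the first probabilistic assignment is driven by co-expression information from unambiguous reads alone; later passes reuse the fractional counts. Like other iterative schemes of this kind, the result can depend on the starting point.
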
slide-8
SLIDE 8

Additional parameters

  • Similarity between a given read and each potential gene locus
      • Differences generally very minute
  • Co-expression rate between each candidate gene and its co-expressed genes
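
One way the two parameters above could be combined is a normalized product, sketched below. This is an assumption for illustration only; the slides do not state how similarity and co-expression are weighted against each other, and the names `assignment_probabilities`, `similarity`, and `coexpr` are hypothetical.

```python
def assignment_probabilities(similarity, coexpr):
    """Combine read-to-locus similarity with a co-expression weight.

    similarity: dict gene -> similarity of the read to that locus, in [0, 1].
    coexpr:     dict gene -> non-negative co-expression weight
                (genes missing from this dict default to 1.0).
    Returns a dict gene -> probability, proportional to the product.
    """
    scores = {g: s * coexpr.get(g, 1.0) for g, s in similarity.items()}
    total = sum(scores.values())
    if total == 0:
        # No evidence either way: fall back to a uniform assignment.
        return {g: 1.0 / len(scores) for g in scores}
    return {g: sc / total for g, sc in scores.items()}


# Similarities differ only minutely, so the co-expression weight dominates.
print(assignment_probabilities({"A": 0.99, "B": 0.98}, {"A": 10.0, "B": 2.0}))
```
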
slide-9
SLIDE 9

Concerns & Limitations

  • Requires accurate co-expression information
  • Limited sample size of co-expression information could skew the probability distribution
  • Potentially highly computationally intensive
  • May converge to a local optimum
  • Does not currently consider dependence between read assignments
slide-10
SLIDE 10

Our Future Plans

  • Collect test data to verify increased performance using the statistical model
  • Run model with various validated probability assumptions
      • Normal, Poisson, etc.
  • Develop R package with statistical model implementation