The GenABEL project for statistical genomics, by Yurii Aulchenko


slide-1
SLIDE 1

The GenABEL project for statistical genomics

Yurii Aulchenko

[ YuriiA consulting (NL) | ICG SB RAS (RU) | CPHS UoE (UK) | @YuriiAulchenko ]

for the GenABEL project contributors

[ @GenAproj | www.GenABEL.org ]

slide-2
SLIDE 2

Outline

  • Statistical genomics
  • A short history
  • Current state
  • Summary
slide-3
SLIDE 3

Why are we different?
Why do certain people get a disease?
What are the mechanisms underlying these differences?
How does genetic variation control the phenotype?

slide-4
SLIDE 4

Statistical genomics

Sample 1 Feature 3

slide-5
SLIDE 5

Statistical genomics

Sample 1 Feature 3

Traits/ phenotypes Genotypes

?

slide-6
SLIDE 6

Genome-wide association scanning (GWAS)

Traits/ phenotypes Genotypes

?

lm(qt1 ~ rs10)
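
The model above tests one marker. A genome-wide scan repeats the same regression for every marker. The sketch below shows that loop in base R on simulated data. The names qt1 and rs10 come from the slide; the other marker names, the sample size, the allele frequency and the effect size are made up for illustration. This is not how the GenABEL packages implement the scan, only the idea behind it.

```r
set.seed(1)
n_ids     <- 500                   # number of samples (assumed)
snp_names <- paste0("rs", 1:20)    # hypothetical marker names, rs10 among them

# Genotypes coded as 0/1/2 copies of one allele (additive coding)
geno <- matrix(rbinom(n_ids * length(snp_names), size = 2, prob = 0.3),
               nrow = n_ids, dimnames = list(NULL, snp_names))

# Simulated quantitative trait: only rs10 has a true effect
qt1 <- 0.5 * geno[, "rs10"] + rnorm(n_ids)

# One trait, one marker at a time: fit lm(qt1 ~ marker) for each marker
scan_one <- function(snp) {
  fit <- summary(lm(qt1 ~ geno[, snp]))
  coef(fit)[2, c("Estimate", "Std. Error", "Pr(>|t|)")]
}
results <- t(sapply(snp_names, scan_one))
colnames(results) <- c("beta", "se", "p")

# Bonferroni threshold for the number of tests performed
threshold <- 0.05 / length(snp_names)
print(results[results[, "p"] < threshold, , drop = FALSE])
```

With millions of markers the threshold becomes correspondingly stricter, which is one reason large samples are needed.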

slide-11
SLIDE 11

Genome-wide association scanning (GWAS)

Traits/ phenotypes Genotypes

1,000-100,000 Few 100,000-40,000,000,000

?

slide-12
SLIDE 12

Scanning through “omics” space

Traits/ phenotypes Genotypes

1,000-100,000 100-100,000 100,000-40,000,000,000

?

slide-13
SLIDE 13

Statistical genomics: what is so special?

  • Rules governing genes & experimental design: analysis methodology and results visualization
  • Technological inputs: data formats, quality control, analysis methods
  • Analysis is computationally challenging (and IO demanding)

slide-14
SLIDE 14

Analysis scenarios

  • Classic GWAS scenario
  • One trait – one genetic marker at a time
  • Correlations between phenotypes – mixed models
  • Emerging scenarios
  • One trait – multiple genetic markers
  • Multiple traits – single / multiple markers
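
The scenarios listed above differ mainly in what goes on each side of the model formula. The base-R sketch below contrasts three of them on simulated data; all variable names and effect sizes are invented for illustration, and the mixed-model case is left out because it needs a relationship matrix and dedicated software.

```r
set.seed(2)
n      <- 400
snp1   <- rbinom(n, 2, 0.3)                  # hypothetical markers, coded 0/1/2
snp2   <- rbinom(n, 2, 0.4)
trait1 <- 0.3 * snp1 + 0.3 * snp2 + rnorm(n)
trait2 <- 0.5 * trait1 + rnorm(n)            # second trait, correlated with the first

# Classic scenario: one trait, one marker
fit_single <- lm(trait1 ~ snp1)

# One trait, multiple markers: joint F-test of both markers against the null model
fit_null   <- lm(trait1 ~ 1)
fit_joint  <- lm(trait1 ~ snp1 + snp2)
joint_test <- anova(fit_null, fit_joint)
p_joint    <- joint_test[2, "Pr(>F)"]

# Multiple traits, single marker: multivariate linear model (Pillai test by default)
fit_multi  <- lm(cbind(trait1, trait2) ~ snp1)
multi_test <- anova(fit_multi)
p_multi    <- multi_test["snp1", "Pr(>F)"]

print(c(joint = p_joint, multi = p_multi))
```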
slide-15
SLIDE 15

Outline

  • Statistical genomics
  • A short history
  • Current state
  • Summary
slide-16
SLIDE 16

A short history

2006 2007 2008 2009 2010 2011 2012 2013...

GenA Package Paper GenA

GenABEL package

slide-17
SLIDE 17

# GWAS publications

slide-18
SLIDE 18

# loci identified in GWAS

slide-19
SLIDE 19

A short history

2006 2007 2008 2009 2010 2011 2012 2013...

GenA ParallA ProbA GenA Package Paper

GenABEL package

DatA MixA ParallA ProbA MetA

GenABEL suite

slide-20
SLIDE 20

Turning point

slide-21
SLIDE 21

The GenABEL project

Mission: to provide a framework for development of statistical genomics methodology

Vision: collaboration, transparency and free exchange of code, ideas, and data are key to agile and robust methodology development

Strategy: community-based and community-driven methodology discussion, development, implementation, dissemination, maintenance, and application

slide-22
SLIDE 22

A short history

2006 2007 2008 2009 2010 2011 2012 2013...

ParallA DatA MixA ParallA ProbA MetA GenA ProbA GenA Package Paper

1000 posts on forum

Open-source tutorial

GenABEL package

GenABEL suite

PredictA PredictA OmicA VariA VariA GenA GenA

GenABEL project

slide-23
SLIDE 23

Outline

  • Statistical genomics
  • A short history
  • Current state
  • Summary
slide-24
SLIDE 24

Infrastructure

  • GenABEL @ R-Forge
  • forum.GenABEL.org
  • www.GenABEL.org

slide-25
SLIDE 25

Project in numbers

Code of 9 packages (language: # kLines of code)
  • R: 19
  • C++: 19
  • C: 17
  • Other: 2
  • Rnw/Roxy: 20
  • Estimated: 12 man-years, $1,500,000

Communications
  • Devel-list: >700 posts
  • Forum: >1000 posts

Documentation
  • Manuals: >200 pages
  • Tutorials: >250 pages
  • Videos: ~10 min

People
  • Developers: 15 (5)
  • Forum: 430 (71)

Publications
  • Total: 7 (4)
  • # citations: >700 (>500)

slide-26
SLIDE 26

www.GenABEL.org

  • ~2,000 visits per month (~1,000 unique visitors)
  • Major traffic from Europe (50%) and US (25%)
  • ~50% of traffic generated by returning visitors
slide-27
SLIDE 27

GenABEL-package

Type of analysis: # functions
  • Data manipulations: ~40
  • Quality control and descriptives: ~10
  • Analysis: ~30
  • Graphics & data presentation: ~5
  • Total: 391

Genome-wide analysis of association between directly typed SNPs and quantitative, binary and time-till-event outcomes

Highlights:

  • Converters between different data formats
  • Powerful QC organized around the check.marker() function
  • A line of mixed-model-based tools for correction for population stratification
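
The slide names check.marker() as the centre of the package's QC without showing what marker QC involves. As a rough illustration only, the base-R sketch below applies three common per-marker filters (call rate, minor allele frequency, and a Hardy-Weinberg chi-square test) to simulated genotypes. The function qc_markers and its thresholds are hypothetical and are not taken from GenABEL; check.marker() has its own arguments and defaults, described in the package manual.

```r
set.seed(3)
n_ids  <- 300
n_snps <- 50
maf_true <- runif(n_snps, 0.01, 0.5)
geno <- sapply(maf_true, function(p) rbinom(n_ids, 2, p))  # n_ids x n_snps, coded 0/1/2
colnames(geno) <- paste0("snp", seq_len(n_snps))           # hypothetical marker names
geno[sample(length(geno), 200)] <- NA                      # scatter some missing calls

# Hypothetical helper: per-marker QC summary with a pass/fail flag
qc_markers <- function(g, min_call = 0.95, min_maf = 0.05, min_hwe_p = 1e-4) {
  call_rate <- colMeans(!is.na(g))
  freq <- colMeans(g, na.rm = TRUE) / 2        # frequency of the counted allele
  maf  <- pmin(freq, 1 - freq)
  hwe  <- apply(g, 2, function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) return(NA_real_)       # no calls at all
    p <- mean(x) / 2
    if (p %in% c(0, 1)) return(NA_real_)       # monomorphic marker, test undefined
    obs  <- tabulate(x + 1, nbins = 3)         # counts of genotypes 0, 1, 2
    expd <- length(x) * c((1 - p)^2, 2 * p * (1 - p), p^2)
    pchisq(sum((obs - expd)^2 / expd), df = 1, lower.tail = FALSE)
  })
  ok <- call_rate >= min_call & maf >= min_maf & (is.na(hwe) | hwe >= min_hwe_p)
  data.frame(call_rate, maf, hwe_p = hwe, ok)
}

qc    <- qc_markers(geno)
clean <- geno[, qc$ok, drop = FALSE]           # keep only markers passing all filters
print(table(passed = qc$ok))
```

A production QC would normally also filter samples (per-individual call rate, heterozygosity, relatedness), which this sketch omits.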

slide-28
SLIDE 28

Other R-packages

GWAS analyses

  • VariABEL (5): tools for “environmental sensitivity” vGWAS
  • MixABEL (12): advanced mixed models for GWAS

Post-GWAS

  • MetABEL (7): meta-analysis of GWAS results
  • PredictABEL (111): assessment of (genetic) risk prediction models

Support

  • DatABEL (72): out-of-RAM large matrices storage and access
  • ParallABEL (52): parallelization algorithms for GWAS
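
For readers unfamiliar with the post-GWAS step, meta-analysis of GWAS results usually means pooling per-study effect estimates for the same marker. The sketch below shows a standard fixed-effect, inverse-variance-weighted pooling in base R. The function meta_ivw and the input numbers are made up for illustration; this is the textbook formula, not a description of MetABEL's interface.

```r
# Hypothetical helper: fixed-effect inverse-variance-weighted meta-analysis
meta_ivw <- function(beta, se) {
  w           <- 1 / se^2                      # weight = inverse of the variance
  beta_pooled <- sum(w * beta) / sum(w)
  se_pooled   <- sqrt(1 / sum(w))
  z           <- beta_pooled / se_pooled
  c(beta = beta_pooled, se = se_pooled, z = z, p = 2 * pnorm(-abs(z)))
}

# Effect estimates and standard errors for one marker in three studies (invented numbers)
beta <- c(0.12, 0.08, 0.15)
se   <- c(0.05, 0.04, 0.07)
print(meta_ivw(beta, se))
```

Studies with smaller standard errors get more weight, and the pooled standard error is never larger than the smallest per-study one.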
slide-29
SLIDE 29

Non-R packages

  • ProbABEL: GWAS of imputed data (quantitative, binary, time-till-event traits; regression and mixed models)
  • Filevector: C++ base for the DatABEL-package, facilitating out-of-core computations on large matrices
  • OmicABEL: rapid mixed-model-based GWAS, especially for multiple-trait ("omics") analysis

slide-30
SLIDE 30

Outline

  • Statistical genomics
  • A short history
  • Current state
  • Summary
slide-31
SLIDE 31

Summary

  • GenABEL is a problem-centered project aiming at agile development of statistical genomics methodology
  • The GenABEL suite consists of 9 packages implementing close to 1,000 functions facilitating analyses of polymorphic genomes
  • The GenABEL suite is widely used for GWAS analyses of human, farm, pet animal, and plant data
  • The project runs on the enthusiasm and spare time of several people (and $10 a month from “YuriiA consulting”)

slide-32
SLIDE 32

Difficulties we face

Core functionality

  • The project would gain from a re-design and added “core” functionality (e.g. regarding access to different data formats; parallelization)
  • This gain is at the project level, not at the level of the individual developer

Coordination and communication

  • Coordination takes time
  • It may take a while before problems well known to developers get through to the end-user (anyone willing to become our PR officer?)

slide-33
SLIDE 33

Vacant roles

  • Project Coordinator
  • Lead Developer(s)
  • Public Relations Officer
slide-34
SLIDE 34

Current support for the project

Your logo could have been here

slide-35
SLIDE 35

Acknowledgements

CRAN

slide-36
SLIDE 36

Key people

Lennart Karssen, Nicola Pirastu, Maria Gonik

[Figure: two bar charts by person (Yurii, Lennart, Nicola, Maria, Others), one for the Dev-list and one for the Forum]

Forum (431/70) members
Dev-list (45/12) members