machine vision and computation to describe genome function at the - - PowerPoint PPT Presentation

SLIDE 1

How do genomes function? Using machine vision and computation to describe genome function at the organismal level.

Tessa Durham Brooks, Ph.D. Doane College Department of Biology

SLIDE 2

Anticipation at the dawn of the Genomic Era

“Within the next few years, technologies developed for the Human Genome Project and similar sequencing efforts will revolutionize medicine, agriculture, crimefighting, and other fields.” – Gwynne and Page, Science, 2000

SLIDE 3

The genomic powerhouses

  • ~20,000 genes, 97 million base pairs; sequence finished 1998
  • ~25,000 genes, 100 million base pairs; sequence finished 2000
  • ~22,000 genes, 137 million base pairs; sequence finished 2000

For reference: humans have about 25,000 genes and 3.2 billion base pairs.

SLIDE 4

The problem: at best, functional roles have been assigned to 15% of the predicted genes in a genome

SLIDE 5

Why has determining gene function in multicellular organisms been difficult?

SLIDE 6

Why has determining gene function in multicellular organisms been difficult?

SLIDE 7

The task seems likely to change the nature of biological research, requiring teams of engineers, mathematicians, nanotechnologists and computer programmers, and farms of computers if not a national computer grid.

  • NYT 2001
  • Collins, 2001

At a conference this month …, biologists tried to explore how the study of genomes might develop over the next 20 years and what tools might be needed. Central to their vision of the future is a thorough computerization of biology, made necessary by the vast computing power of the genome itself. - NYT 2001

SLIDE 8

Our goal: Describe how genomes function at the organismal scale.

Requirements:

  • Observations should be made at sufficiently high spatial and temporal resolution
  • Methods should be relatively high-throughput to allow a genomic survey
  • It should be possible to make observations over time and in many environmental contexts

~25,000 genes

SLIDE 9

Root gravitropism: a model for image analysis approaches in functional genomics

SLIDE 10

Root gravitropism: a model for image analysis approaches in functional genomics

SLIDE 11

Root gravitropism: a model for image analysis approaches in functional genomics

[Figure: root tip angle (deg) vs. time (h) for glr3.3-1, glr3.3-2, and SalkCol; first-order (swing rate) and second-order (acceleration) comparisons of glr3.3-1 vs. wt and glr3.3-2 vs. wt, plotted by time (h) and scale]

Miller, Durham Brooks, and Spalding 2010, Genetics
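
The slide compares each mutant with wild type using a first-order measure (swing rate) and a second-order measure (acceleration) of tip angle across time and scale. Below is a minimal sketch of how such curves could be computed. It assumes tip angles sampled at a fixed interval and reads "scale" as the width `sigma` (in samples) of a Gaussian smoothing kernel applied before differencing. The function names are hypothetical, and this is not necessarily the procedure used in the cited paper:

```python
import math

def gaussian_smooth(values, sigma):
    """Smooth a 1-D series with a normalized Gaussian kernel; ends are clamped."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the series ends
            acc += w * values[j]
        smoothed.append(acc / total)
    return smoothed

def differentiate(values, dt):
    """Central differences inside the series, one-sided differences at the ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

def swing_rate_and_acceleration(tip_angles_deg, dt_hours, sigma):
    """Return (first-order, second-order) derivatives of tip angle at scale sigma."""
    smooth = gaussian_smooth(tip_angles_deg, sigma)
    rate = differentiate(smooth, dt_hours)   # deg/h, the "swing rate"
    accel = differentiate(rate, dt_hours)    # deg/h^2, the "acceleration"
    return rate, accel
```

A larger `sigma` suppresses frame-to-frame jitter in the angle trace but blurs fast transitions. For that reason, comparing genotypes at several scales can reveal differences that a single smoothing width would hide.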

SLIDE 12

Developing tools to detect genome function at the organismal scale.

Requirements: Observations should be made at sufficiently high spatial and temporal resolution

  • Methods should be relatively high-

throughput to allow genomic survey

  • Observations should be able to be

made over time and in many environmental contexts

~25,000 genes

SLIDE 13

Automation and High Throughput ver. 1

SLIDE 14

Automation and High Throughput ver. 2

Doane Phytomorph

SLIDE 15
High-throughput genetic stocks

  • Recombinant inbred lines (RILs)
  • Ecotypes (e.g., the 1001 Genomes Project)

Matthieu Reymond, Max-Planck Institute

[Figure: QTL analysis]

SLIDE 16

Developing tools to detect genome function at the organismal scale.

Requirements: Observations should be made at sufficiently high spatial and temporal resolution Method should be relatively high- throughput to allow genomic survey

  • Observations should be able to be

made over time and in many environmental contexts

~25,000 genes

SLIDE 17

Phenotypes are plastic

Durham Brooks, Miller, and Spalding 2010, Plant Physiology

One ecotype

SLIDE 18

Genomics analysis in a multidimensional condition space

[Figure: LOD score by genetic position (cM) and time (minutes)]

  • Seed size: small, large
  • Seedling age: 2 d, 3 d, 4 d
  • 164 lines × 15 individuals

Moore et al., unpublished results
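
The condition space on this slide can be enumerated directly. The short sketch below assumes that every seed-size × seedling-age cell was planned with the same 164 lines × 15 individuals. The constant and function names are hypothetical:

```python
from itertools import product

SEED_SIZES = ("small", "large")
SEEDLING_AGES_DAYS = (2, 3, 4)
N_LINES = 164
INDIVIDUALS_PER_LINE = 15

def condition_grid():
    """Every combination of seed size and seedling age."""
    return list(product(SEED_SIZES, SEEDLING_AGES_DAYS))

def planned_responses():
    """Return (number of conditions, responses per condition, total responses)."""
    conditions = condition_grid()
    per_condition = N_LINES * INDIVIDUALS_PER_LINE
    return len(conditions), per_condition, len(conditions) * per_condition

print(planned_responses())  # (6, 2460, 14760)
```

Under that assumption, the design yields six conditions and 14,760 planned responses. This is in line with the "over 14,500 individual root gravitropic responses in six conditions" reported in the progress summary.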

SLIDE 19

Developing tools to detect genome function at the organismal scale.

Requirements: Observations should be made at sufficiently high resolution Method should be relatively high- throughput to allow genomic survey Observations should be able to be made over time and in many environmental contexts

  • Cyberinfrastructure must facilitate the

above

~25,000 genes

SLIDE 20

Workflow

[Workflow diagram. Stages: data capture; data storage (30 TB); data compression and analysis; feature extraction; data storage (X TB); QTL analysis. Annotations: 0.5 TB/day; Schorr Center and OSG]

SLIDE 21

Data Compression

Scan Root Response

  • Time series of 220 uncompressed TIFs
  • ~225 MB each

Image Compression (OSG)

  • Time series of losslessly compressed PNGs
  • ~195 MB each

Auto Crop & Time Stamp (OSG)

  • Equalize dimensions
  • Insert the time stamp into the least significant bits of the first 14 pixels of each image
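
The auto-crop step hides a time stamp in the least significant bits of the first 14 pixels. Lossless codecs preserve those bits exactly, so a stamp written this way survives the PNG and FFV1 steps. The slide does not give the encoding. One hypothetical scheme that fits 14 pixels is a 14-digit `YYYYMMDDhhmmss` stamp with one decimal digit in the low 4 bits of each pixel. The sketch below uses that scheme and treats an image as a flat list of integer pixel values:

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # hypothetical: 14 digits, e.g. 20110315093000
STAMP_PIXELS = 14
LOW_BITS = 0x0F                # low 4 bits hold one decimal digit (0-9)

def embed_timestamp(pixels, when):
    """Return a copy of pixels with one stamp digit in each of the first 14 pixels."""
    digits = when.strftime(STAMP_FORMAT)
    if len(digits) != STAMP_PIXELS or len(pixels) < STAMP_PIXELS:
        raise ValueError("need a 4-digit year and at least 14 pixels")
    stamped = list(pixels)
    for i, ch in enumerate(digits):
        stamped[i] = (stamped[i] & ~LOW_BITS) | int(ch)  # clear low bits, write digit
    return stamped

def read_timestamp(pixels):
    """Recover the datetime from the low bits of the first 14 pixels."""
    digits = "".join(str(p & LOW_BITS) for p in pixels[:STAMP_PIXELS])
    return datetime.strptime(digits, STAMP_FORMAT)
```

Changing only the low 4 bits alters each stamped pixel by at most 15 intensity levels, and only 14 pixels per image are touched.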

Video Compression (Local Grid)

  • Compress to video using the FFV1 codec
  • Uses a lossless intraframe codec
  • ~160 MB/frame

Ship out to UW
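
The sizes on this slide allow an estimate of what each step saves per time series. The back-of-the-envelope sketch below makes three assumptions: the ~225, ~195, and ~160 MB figures are all per frame, every series has 220 frames, and 1 GB = 1000 MB:

```python
FRAMES_PER_SERIES = 220
MB_PER_FRAME = {            # approximate sizes from the slide
    "uncompressed TIF": 225,
    "lossless PNG": 195,
    "FFV1 video": 160,
}

def series_size_gb(stage, frames=FRAMES_PER_SERIES):
    """Approximate storage for one time series at a given pipeline stage."""
    return frames * MB_PER_FRAME[stage] / 1000

for stage in MB_PER_FRAME:
    print(f"{stage}: {series_size_gb(stage):.1f} GB")
# uncompressed TIF: 49.5 GB
# lossless PNG: 42.9 GB
# FFV1 video: 35.2 GB

saving = 1 - MB_PER_FRAME["FFV1 video"] / MB_PER_FRAME["uncompressed TIF"]
print(f"overall saving: {saving:.0%}")  # overall saving: 29%
```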

SLIDE 22

Feature Extraction

Compressed Data

  • Decompress data
  • Currently using the FFV1 codec

Root tip Identification

  • Isolate and track the root tip
  • Currently, image curvature features are used
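
Curvature is a natural cue for finding the tip, because the outline of a root bends most sharply there. The minimal sketch below assumes the root outline has already been extracted as an ordered, closed list of (x, y) points. It uses the turning angle between neighboring contour segments as a discrete curvature proxy and takes the sharpest turn as the tip. The neighbor spacing `k` and the function names are hypothetical:

```python
import math

def turning_angles(contour, k=1):
    """Absolute turning angle (radians) at each point of a closed contour.

    The angle between the incoming segment (from point i-k) and the outgoing
    segment (to point i+k) serves as a discrete curvature proxy.
    """
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[(i - k) % n]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + k) % n]
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        angles.append(abs(math.atan2(cross, dot)))
    return angles

def find_tip(contour, k=1):
    """Index of the contour point with the sharpest turn (the candidate root tip)."""
    angles = turning_angles(contour, k)
    return max(range(len(contour)), key=angles.__getitem__)
```

On real, noisy outlines, a larger `k` (or smoothing the contour first) keeps single-pixel jaggedness from producing false maxima. Tracking could then repeat the search frame by frame.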

Machine Learning

  • Isolate the plant’s aerial and ground tissue from the image background
  • Currently, Bayesian and SVM methods work well
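
The slide reports that Bayesian and SVM classifiers separate plant tissue from the background well. To illustrate the Bayesian option only, here is a Gaussian naive Bayes pixel classifier written from scratch. The per-pixel (R, G, B) features, the class labels, and the training values are hypothetical; the slide does not specify the pipeline's actual classifier or features:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels, var_floor=1e-6):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    by_class = defaultdict(list)
    for features, label in zip(samples, labels):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n_rows, n_feat = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n_rows for j in range(n_feat)]
        variances = [
            max(var_floor, sum((r[j] - means[j]) ** 2 for r in rows) / n_rows)
            for j in range(n_feat)
        ]
        model[label] = (math.log(n_rows / len(samples)), means, variances)
    return model

def log_posterior(params, features):
    """Unnormalized log posterior: log prior plus independent Gaussian log likelihoods."""
    log_prior, means, variances = params
    total = log_prior
    for x, m, v in zip(features, means, variances):
        total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return total

def predict(model, features):
    """Return the class with the highest posterior for one pixel."""
    return max(model, key=lambda label: log_posterior(model[label], features))
```

Naive Bayes treats color channels as independent given the class, which is a strong simplification for RGB. However, it trains in a single pass and classifies each pixel cheaply, which helps when every time series contains hundreds of large frames.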

Tip angle Measurement

  • Linear regression on the root’s meristematic tissue
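
Fitting a line to the points near the tip reduces a noisy midline to a single angle. The sketch below assumes the midline points of the meristematic region are already available in image coordinates (y increasing downward). It swaps ordinary least squares for orthogonal (total least squares) regression, which fits horizontal and vertical roots equally well. It reports the angle from the image horizontal, positive when the tip points downward. All names and the angle convention are assumptions:

```python
import math

def tip_angle_deg(points, tip):
    """Angle (degrees) of the best-fit line through points, oriented toward tip.

    points: (x, y) midline samples near the tip, image coordinates (y down).
    tip: (x, y) of the root tip, used only to choose the line's direction.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal-axis orientation
    dx, dy = math.cos(theta), math.sin(theta)
    # The fitted axis has no direction; point it from the centroid toward the tip.
    if (tip[0] - mx) * dx + (tip[1] - my) * dy < 0:
        dx, dy = -dx, -dy
    return math.degrees(math.atan2(dy, dx))
```

With ordinary least squares of y on x, the slope becomes undefined as a root approaches vertical. The principal-axis fit avoids that problem, which matters when roots rotate through 90 degrees during a gravitropic response.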

Ship out to Doane

SLIDE 23

QTL Analysis - Association of a phenotypic value (e.g., root tip angle) with a genetic element

Compile Tip Angle Data

  • Find the mean tip angle at each time point for each genetic line
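
Averaging replicate seedlings reduces each genetic line to one tip-angle curve before QTL mapping. The minimal sketch below assumes per-seedling measurements arrive as (line, time point, tip angle) tuples. The tuple layout and the line IDs are hypothetical:

```python
from collections import defaultdict

def mean_tip_angles(records):
    """Average tip angle per (genetic line, time point).

    records: iterable of (line_id, time_point, tip_angle_deg) tuples, one per seedling.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line_id, time_point, angle in records:
        key = (line_id, time_point)
        sums[key] += angle
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

records = [("RIL1", 0, 10.0), ("RIL1", 0, 20.0), ("RIL1", 2, 30.0), ("RIL2", 0, 5.0)]
print(mean_tip_angles(records))
# {('RIL1', 0): 15.0, ('RIL1', 2): 30.0, ('RIL2', 0): 5.0}
```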

QTL Detection (Local Grid)

  • Choose a detection method
  • Maximize the likelihood of the trait data
  • Determine significance (LOD)

Determine Significance Threshold (OSG)

  • Permutation testing
  • Haley-Knott regression or multiple imputation (0.24 vs. 63 CPU years per time point)
  • Determine the maximum likelihood of the randomized data
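
The detection and threshold steps can be illustrated with single-marker regression. This is a simplified stand-in for the Haley-Knott regression named on the slide: Haley-Knott regresses the phenotype on genotype probabilities, which also covers positions between markers. For each marker, the LOD score compares a model with genotype-group means against a single overall mean, LOD = (n/2) log10(RSS0 / RSS1). The permutation threshold is the (1 - alpha) quantile of the genome-wide maximum LOD after shuffling phenotypes, which breaks any true genotype-phenotype association. The sketch below assumes genotypes are coded per marker (for example, 0/1 for the two parental alleles of a RIL). All names are hypothetical:

```python
import math
import random

def marker_lod(phenotypes, genotypes):
    """LOD for one marker: genotype-group means vs. a single overall mean."""
    n = len(phenotypes)
    overall = sum(phenotypes) / n
    rss0 = sum((y - overall) ** 2 for y in phenotypes)
    if rss0 == 0:
        return 0.0  # no phenotypic variation to explain
    rss1 = 0.0
    for g in set(genotypes):
        group = [y for y, gi in zip(phenotypes, genotypes) if gi == g]
        mean_g = sum(group) / len(group)
        rss1 += sum((y - mean_g) ** 2 for y in group)
    if rss1 == 0:
        return math.inf  # genotype explains all variation
    return (n / 2) * math.log10(rss0 / rss1)

def genome_scan(phenotypes, markers):
    """LOD at every marker; markers is a list of per-individual genotype lists."""
    return [marker_lod(phenotypes, genotypes) for genotypes in markers]

def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the genome-wide max LOD under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(genome_scan(shuffled, markers)))
    max_lods.sort()
    index = min(n_perm - 1, max(0, round((1 - alpha) * n_perm) - 1))
    return max_lods[index]
```

Every permutation repeats the whole genome scan. A slower per-scan method is therefore multiplied across all permutations and all time points, which is consistent with the large CPU-year contrast on the slide.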

Additional analysis

  • Interactions, additive effects
  • Plasticity QTL
  • Conditional QTL (GxE effects)

Ship out to Doane

  • Launch analysis locally
  • Bleed out to OSG

SLIDE 24

Progress and Future Directions

  • One small college has collected over 14,500 individual root gravitropic responses in six conditions (32 TB) from a RIL population in 6 months.
  • We will finish collection from NILs (near-isogenic lines): an additional 8,700 individuals, 19 TB, in 3 months.
  • Begin image analysis and QTL analysis; the dataset opens new doors in visualizing genomes.
  • Expand participation to additional institutions (huge potential for scaling data collection).
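
The progress figures also give a rough storage cost per seedling. The back-of-the-envelope check below assumes decimal units (1 TB = 1000 GB). It treats "over 14,500" as exactly 14,500, which makes the first figure an upper bound:

```python
def gb_per_individual(terabytes, individuals):
    """Average storage per recorded seedling, in GB (decimal units)."""
    return terabytes * 1000 / individuals

ril = gb_per_individual(32, 14_500)  # RIL collection, 6 months
nil = gb_per_individual(19, 8_700)   # planned NIL collection, 3 months
print(f"RIL: {ril:.2f} GB/individual, NIL: {nil:.2f} GB/individual")
# RIL: 2.21 GB/individual, NIL: 2.18 GB/individual
print(f"combined: {32 + 19} TB for {14_500 + 8_700:,} individuals")
# combined: 51 TB for 23,200 individuals
```

Both collections come out near 2.2 GB per seedling. This suggests the per-individual cost is stable across populations, which would make storage needs for additional institutions easier to project.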

SLIDE 25

Acknowledgments

Doane College

Mike Carpenter (CIO), David Andersen, Dan Schneider

  • Dr. Chris Wentworth (Physics)

Students: Amy Craig and Brad Higgins (Physics), Autumn Longo and Grant Dewey (Biochemistry), Tracy Guy, Miles Mayer, Halie Smith, Anthony Bieck, Sarah Merithew, Devon Niewohner, Muijj Ghani, Julie Wurdeman (Biology)

University of Wisconsin

  • Dr. Edgar Spalding’s Laboratory: Dr. Nathan Miller, Candace Moore, Logan Johnson

  • Dr. Miron Livny (CHTC)

UNL – Schorr Center and HCC

Brian Bockelman and Dr. David Swanson

University of Florida

  • Dr. Mark Settles

Computing Resources Funding

PGRP - 1031416 EPSCoR - URE NE-INBRE