Using Grid to Facilitate Diseasome - PowerPoint PPT Presentation



SLIDE 1

Using Grid to Facilitate Diseasome Analysis from Taiwan National Health Insurance Research Database

Yu-Chuan (Jack) Li and Ming-Chin Lin, Graduate Institute of Biomedical Informatics, Taipei Medical University, Taiwan

SLIDE 2

Outline

  • Introduction of NHIRD
  • Frequency Distribution of Diseasome
  • Comorbidity Analysis
  • Conclusion

SLIDE 3

The National Health Insurance Research Database (NHIRD)

  • 10 years of data
  • Coverage: about 99% of residents in Taiwan (23 million people from 530 hospitals and 17,000 clinics)
  • 360 million outpatient visits / year
  • 25 million inpatient-days / year

SLIDE 4

NHIRD

  • The NHIRD is open for research by application
  • The NHIRD consists of claim records with numbers and text
    Demographics, diagnoses (ICD-9-CM 2001 version), medications, procedures, exams and costs data
  • Raw data size: 200 GB / year
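The figures on this and the previous slide imply some useful per-resident rates. The short calculation below is a back-of-envelope sketch that assumes the yearly volumes stay roughly constant over the 10 years; that assumption is not stated on the slides, and the constant names are illustrative.

```python
# Back-of-envelope scale figures for the NHIRD, using the numbers on slides 3 and 4.
# Assumption (not from the slides): yearly volumes are roughly constant over 10 years.

RESIDENTS = 23_000_000                    # ~99% of Taiwan's residents
OUTPATIENT_VISITS_PER_YEAR = 360_000_000
INPATIENT_DAYS_PER_YEAR = 25_000_000
RAW_GB_PER_YEAR = 200
YEARS = 10

def scale_summary():
    """Return per-resident rates and 10-year totals derived from the slide figures."""
    return {
        "outpatient_visits_per_resident_year": OUTPATIENT_VISITS_PER_YEAR / RESIDENTS,
        "inpatient_days_per_resident_year": INPATIENT_DAYS_PER_YEAR / RESIDENTS,
        "outpatient_visits_total": OUTPATIENT_VISITS_PER_YEAR * YEARS,
        "raw_data_total_gb": RAW_GB_PER_YEAR * YEARS,
    }

if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under that assumption, the database holds roughly 3.6 billion outpatient visits and about 2 TB of raw data, or about 15.7 outpatient visits per covered resident per year.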

SLIDE 5

Frequency of Visits

Analyze database by patient visits

  • Frequency data over time (X-axis) and age (Y-axis)
  • Heatmap visualization

Dermatophytosis of foot
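The heatmap on this slide counts visits per calendar month (X-axis) and patient age (Y-axis). The sketch below shows one way such a grid could be built for a single diagnosis; the input shape `(visit_date, birth_year)`, the function name `visit_heatmap`, and the clamping of ages to 0..100 are assumptions for illustration, and age is approximated as the difference of calendar years.

```python
from collections import Counter
from datetime import date

def visit_heatmap(visits, max_age=100):
    """Count visits into an age-by-month grid for one diagnosis.

    visits: iterable of (visit_date, birth_year) tuples (hypothetical input shape).
    Returns grid[age][month_index], where month_index counts calendar months
    from the earliest visit month. Ages are clamped to 0..max_age.
    """
    visits = list(visits)
    if not visits:
        return []
    first = min(d for d, _ in visits)
    last = max(d for d, _ in visits)
    n_months = (last.year - first.year) * 12 + (last.month - first.month) + 1

    counts = Counter()
    for d, birth_year in visits:
        age = min(max(d.year - birth_year, 0), max_age)    # approximate age in years
        col = (d.year - first.year) * 12 + (d.month - first.month)
        counts[(age, col)] += 1

    # One row per age (Y-axis), one column per month (X-axis).
    return [[counts[(age, col)] for col in range(n_months)]
            for age in range(max_age + 1)]

if __name__ == "__main__":
    sample = [(date(2000, 1, 5), 1950), (date(2000, 1, 20), 1950), (date(2000, 3, 1), 1990)]
    grid = visit_heatmap(sample)
    print(grid[50], grid[10])
```

The resulting list of rows can be passed to a plotting call such as matplotlib's `imshow` to render the heatmap.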

SLIDE 6

Frequency of Visits (cont.)

Analyze database by patient visits

  • Bottleneck -> disk I/O speed
  • Using 12 Apple Mac minis with external FireWire hard drives (400 Mbps each)
  • Collective I/O bandwidth: 4.8 Gbps
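The 4.8 Gbps figure is simply 12 drives × 400 Mbps. The calculation below makes that explicit and adds an idealized scan time for one year of raw data (200 GB, from the NHIRD slide); it assumes every drive sustains its nominal FireWire rate, the data are spread evenly, and there is no other overhead, so real times would be longer.

```python
# Idealized I/O arithmetic for the Mac mini cluster.
# Assumptions: nominal 400 Mbps per drive, data spread evenly over the nodes,
# no protocol or seek overhead. Real throughput would be lower.

def aggregate_bandwidth_gbps(nodes, per_node_mbps):
    """Combined nominal bandwidth in Gbps (decimal units)."""
    return nodes * per_node_mbps / 1000

def scan_time_seconds(data_gb, bandwidth_gbps):
    """Time to read data_gb gigabytes at bandwidth_gbps (8 bits per byte)."""
    return data_gb * 8 / bandwidth_gbps

if __name__ == "__main__":
    single = aggregate_bandwidth_gbps(1, 400)    # 0.4 Gbps
    total = aggregate_bandwidth_gbps(12, 400)    # 4.8 Gbps, as on the slide
    for label, bw in [("1 drive", single), ("12 drives", total)]:
        print(f"{label}: {bw} Gbps, 200 GB read in {scan_time_seconds(200, bw):.0f} s")
```

Under these assumptions, a single drive needs about 4,000 s to read 200 GB, while 12 drives in parallel need about 333 s, which is why spreading the data over many inexpensive disks addresses the I/O bottleneck.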

SLIDE 7

Frequency of Visits (cont.)

Figure: architecture diagram with twelve monthly partitions (Jan-Dec), Grid (Globus), a WWW front end that sends the grid command, and a Result DB
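The diagram describes a scatter-gather pattern: one command is sent through the grid to the twelve monthly partitions, and the partial results are merged into the result database. The Python sketch below imitates that pattern with a local thread pool standing in for the grid nodes; it does not use Globus, and the function names and the record shape (one list of diagnosis codes per visit) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_partition(records):
    """Worker step: count diagnosis codes in one month's partition.

    records: list of diagnosis-code lists, one per visit (hypothetical shape).
    """
    counts = Counter()
    for codes in records:
        counts.update(codes)
    return counts

def scatter_gather(partitions, workers=12):
    """Send the same job to every monthly partition and merge the partial counts.

    partitions: dict mapping a month label (e.g. "Jan") to that month's records.
    A local thread pool stands in for the grid nodes; this is not Globus.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, partitions.values()):
            total += partial          # merge step, done by the coordinator
    return total

if __name__ == "__main__":
    parts = {"Jan": [["4019", "4659"], ["4019"]], "Feb": [["4659"]]}
    print(scatter_gather(parts))
```

Frequency counting parallelizes cleanly because every visit falls in exactly one month, so the partial counts can simply be added. Grouping visits by patient ID does not split this way, since one patient's visits are scattered across all twelve partitions.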

SLIDE 8

Frequency of Visits (cont.)

Big vs. mini

          Pros                            Cons
  Big     Strong CPU, strong I/O speed    Expensive, hard to upgrade
  mini    Cheap, low maintenance fee      Mild CPU, low I/O speed

SLIDE 9

Frequency of Visits (cont.)

Difficulties of doing the job on a single machine

  • Limitation of database size
    Takes a very long time to generate the index table
  • Limitation of scaling up
    Hard to improve the performance
    Performance vs. price curve -> not linear

SLIDE 10

Disease Frequency Heatmap (NHIRD 2000)

SLIDE 11

Taiwan NHIRD 2000-2002

Influenza, Erythema multiforme, Lung Cancer

SLIDE 12

Hepatitis B with coma (male, female)

3-year seasonal change of “Cough”

SLIDE 13

Influenza

SLIDE 14

Hand, foot and mouth disease

SLIDE 15

GIS distribution of “Cough”

SLIDE 16

Cough


SLIDE 17

Cough

SLIDE 18

Retrospective study - Comorbidity analysis

The limitation

  • Grouping all visit records by unique ID
  • Software memory limitation - 2GB memory

  Essential hypertension   Jan       Feb
  2000                     571,099   525,646
  2001                     644,650   645,846
  2002                     752,353   655,867

Total transaction record number (2000-2002): 25,015,172
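A quick estimate shows why grouping by ID runs into the 2 GB memory ceiling on a single machine. The sketch below assumes, as a simplification, that all 25,015,172 transaction records in the table must be held in memory at once; per-object and indexing overheads are ignored, and the names are illustrative.

```python
# Back-of-envelope check of the 2 GB limit.
# Assumption: a single-machine "group by ID" keeps every one of the
# 25,015,172 transaction records (2000-2002) in memory at the same time.

MEMORY_BYTES = 2 * 1024**3        # 2 GB memory budget
RECORDS = 25_015_172              # total transaction records, 2000-2002

def bytes_per_record(memory_bytes=MEMORY_BYTES, records=RECORDS):
    """Average memory available per record if all records are resident."""
    return memory_bytes / records

if __name__ == "__main__":
    print(f"{bytes_per_record():.1f} bytes per record")
```

That leaves roughly 86 bytes per record, which is tight once each record is parsed into fields and indexed by patient ID.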

SLIDE 19

Disease Comorbidity analysis

For comorbidity analysis

  • ID1{dis1, dis2, dis3, dis4, …}

For example

  • 192305,M,HS10710973,01340,2001-04-11,4919|4659|4019|3534|4011|38022|4640|3804|4785|3004|7291|78059|01340|460|4660|
  • 192505,F,KT71864585,01340,2002-07-10,01100|01340|29532|0113|0119|
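The per-patient form ID1{dis1, dis2, …} can be produced from lines like the two examples above. The sketch below parses such lines and merges the diagnosis codes by patient ID; the field meanings (birth year-month, sex, patient ID, an extra diagnosis code whose role the slide does not state, visit date, and a `|`-separated code list with a trailing `|`) are inferred from the examples and are assumptions, as are the names `parse_record`, `group_by_id` and `code4`.

```python
from collections import defaultdict

def parse_record(line):
    """Split one example line into fields.

    Field meanings are assumptions inferred from the two slide examples:
    birth year-month (YYYYMM), sex, patient ID, a diagnosis code whose role
    is not stated (called code4 here), visit date, and a '|'-separated
    ICD-9-CM code list ending in a trailing '|'.
    """
    birth, sex, pid, code4, visit_date, dx_field = line.strip().split(",", 5)
    diagnoses = [c for c in dx_field.split("|") if c]   # drop the empty trailing item
    return {"birth": birth, "sex": sex, "id": pid, "code4": code4,
            "date": visit_date, "diagnoses": diagnoses}

def group_by_id(lines):
    """Build ID -> {dis1, dis2, ...}, the per-patient form used for comorbidity."""
    groups = defaultdict(set)
    for line in lines:
        rec = parse_record(line)
        groups[rec["id"]].update(rec["diagnoses"])
    return dict(groups)

if __name__ == "__main__":
    example = ["192505,F,KT71864585,01340,2002-07-10,01100|01340|29532|0113|0119|"]
    print(group_by_id(example))
```

Run over all records in one process, the dictionary built by `group_by_id` grows with every patient and code, which is where the 2 GB limit on slide 18 becomes a problem.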

SLIDE 20

Bottleneck - Grouping by ID

Figure: architecture diagram with twelve monthly partitions (Jan-Dec), Grid (Globus), a WWW front end that sends the grid command, and a Result DB; the grouping step is marked as the bottleneck

25,015,172 records

SLIDE 21

Solution - Sorting and segmenting database for grid architecture

Figure: database segmented by birth year (1900, 1901, …, 2000), each segment with its own Grouping step, connected through Grid (Globus) to a WWW front end that sends the grid command and to a Result DB; label: "No grouping needed"

SLIDE 22

Our experience

  • Divide NHIRD by month and year of birthdate
  • Divide NHIRD into 1,212 small databases
    12 months × 101 years (from 1900 to 2000) = 1,212 segments
  • Easily scales up - linear acceleration
  • Low machine specification requirement
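Segmenting by birth year and month works because a patient has a single birthdate, so every record belonging to one ID falls into the same segment and each segment can be grouped by ID on its own. The sketch below maps the YYYYMM birth field seen in the example records (an assumption about that field) to one of the 12 × 101 = 1,212 segments and groups a segment; all names are hypothetical.

```python
from collections import defaultdict

FIRST_YEAR, LAST_YEAR = 1900, 2000
N_SEGMENTS = 12 * (LAST_YEAR - FIRST_YEAR + 1)      # 12 * 101 = 1,212

def segment_index(birth_yyyymm):
    """Map a birth year-month string such as '192305' to a segment 0..1211.

    Assumes the YYYYMM birth field seen in the example records.
    """
    year, month = int(birth_yyyymm[:4]), int(birth_yyyymm[4:6])
    if not (FIRST_YEAR <= year <= LAST_YEAR and 1 <= month <= 12):
        raise ValueError(f"birth date out of range: {birth_yyyymm}")
    return (year - FIRST_YEAR) * 12 + (month - 1)

def segment_records(records):
    """Split (birth_yyyymm, patient_id, diagnoses) tuples into per-segment lists.

    All records of one patient share a birthdate, so they land in one segment.
    """
    segments = defaultdict(list)
    for birth, pid, diagnoses in records:
        segments[segment_index(birth)].append((pid, diagnoses))
    return segments

def group_segment(segment):
    """Per-node step: group one segment's records by patient ID."""
    groups = defaultdict(set)
    for pid, diagnoses in segment:
        groups[pid].update(diagnoses)
    return dict(groups)
```

Because no patient spans two segments, the per-segment results never need to be merged by ID, and adding machines simply means assigning fewer segments to each one.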

SLIDE 23

Comorbidity

  • About 10 diagnoses per person in 3 years
  • Clusters of comorbidity are being identified and pre-calculated
  • 1 TB of comorbidity data processed over 7 days on a 100-PC grid
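A basic building block for comorbidity clusters is counting how many patients share each pair of diagnoses. The sketch below does this from an ID-to-diagnoses mapping; it is an illustration, not necessarily the method the authors used, and `pair_counts` is a hypothetical name. With about 10 diagnoses per person, each patient contributes C(10, 2) = 45 pairs, so for roughly 23 million insured people the count runs to on the order of a billion pair updates, which helps explain the computational cost.

```python
from collections import Counter
from itertools import combinations

def pair_counts(patient_diagnoses):
    """Count how many patients carry each unordered pair of diagnoses.

    patient_diagnoses: dict mapping patient ID -> set of ICD-9-CM codes.
    Codes are sorted so that (a, b) and (b, a) are counted as the same pair.
    """
    counts = Counter()
    for codes in patient_diagnoses.values():
        counts.update(combinations(sorted(codes), 2))
    return counts

if __name__ == "__main__":
    patients = {"A": {"4019", "2500", "4011"}, "B": {"4019", "2500"}}
    for pair, n in pair_counts(patients).most_common():
        print(pair, n)
```

Each segment from the birthdate partitioning can run `pair_counts` independently; unlike grouping by ID, pair counts from different segments can simply be added.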

SLIDE 24

Endometriosis and Neoplasm of uncertain behavior of ovary

Figure axis labels: Old, Young

SLIDE 25

Endometriosis

SLIDE 26

Conclusion

  • Linear improvement of performance is achievable if the data are properly segmented
  • A heatmap for visualization of frequency distribution over season and patient age is useful for huge data sets
  • A geographical relationship of frequency distribution can also be visualized

SLIDE 27

Conclusion (cont.)

  • Comorbidity is one area that has great potential but is very computation-intensive
  • Complete comorbidity data can be crossed with genome, haplome and bibliome data to achieve greater utility

SLIDE 28

Thank you