Short tandem repeat (STR)



SLIDE 1

In Seok Yang

Dept. of Forensic Medicine, Yonsei University College of Medicine

Short tandem repeat (STR)

  • Simple sequence repeat
  • Located in non-coding regions of the genome
  • Variable among human individuals
  • Applied in human identification

SLIDE 2

Capillary electrophoresis (CE)-based assay

  • This method has been used to genotype STR markers in forensic genetics for over a decade.
  • Several multiplex PCR kits for CE-based STR genotyping are now commercially available.
  • During CE, alleles of STR markers are separated by their length and by their fluorescent dye label.

Figures: 3730xl DNA sequencer; sample profile

Limitations of the CE-based assay for STR genotyping

  • The number of STR loci that can be measured simultaneously is limited by the number of fluorescent dyes and the maximum size of STR amplicons.
  • STR alleles with the same length but different sequences cannot be distinguished from each other.

Adapted from John M. Butler's presentation at ISHI 2013

SLIDE 3

Next generation sequencing (NGS)

  • NGS has recently been in the spotlight as an ultimate genotyping tool to overcome the limitations of CE-based STR analysis in the forensic field.
  • STR profiling using NGS has become available with advances in bioinformatics software.
  • Therefore, an appropriate data analysis protocol may be required for STR profiling using NGS.

Outline

  • Preparation of STR amplicons and DNA libraries
  • NGS platform and sequencing data generation
  • NGS data analysis for STR genotyping
      1. Design of STR reference sequences for alignment
      2. Analysis of alignment output
      3. Determination of STR alleles
      4. Determination of repeat structure of target STR region
      5. Estimation of mixture ratio
      6. Estimation of male/female ratio

SLIDE 4

Preparation of STR amplicons and DNA libraries

Workflow: Sample DNA → Multiplex PCR → Adapter ligation

(1) PCR amplicon preparation

  1. 2800M and 9947A sample DNAs
  2. Primers included in the PowerPlex 16 HS system, without fluorescent dye labeling

  → Single-source and 1:1 mixture samples were prepared.

(2) Library preparation

  1. Rapid Library Prep Method (without nebulization)
  2. Adaptors (with MIDs) are ligated non-directionally to PCR products

(3) Advantage

  • A multiplex PCR system previously developed for CE-based STR typing can be used for amplicon generation.

Bioanalyzer profile of DNA libraries

Panels: 2800M single; 9947A single; 1:1 mixture

Bioanalyzer profiles of DNA libraries were obtained after size selection using AMPure beads to remove fragments shorter than 100 bp.

SLIDE 5

NGS data generation

  • NGS platform: GS Junior
  • NGS dataset: 2800M, 9947A, 1:1 mixture

Size distribution of NGS reads (x axis: read length (bp); y axis: no. of reads)

  • Average read length: 183.64 base pairs
  • Total number of reads: 164,468

Sample name    MID sequence    No. of reads
2800M          ACACGACGACT     51475
9947A          ACACGTAGTAT     33213
1:1 Mixture    ACGACACGTAT     76943
Unsorted                       2837
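
The sorting of reads by MID can be sketched as below. This is a minimal illustration, assuming each read starts with its MID; the helper name `assign_sample` and the example read tails are hypothetical, and the instrument software may sort reads differently.

```python
# MID sequences and read counts are taken from the table on this slide.
MIDS = {
    "ACACGACGACT": "2800M",
    "ACACGTAGTAT": "9947A",
    "ACGACACGTAT": "1:1 Mixture",
}
READ_COUNTS = {"2800M": 51475, "9947A": 33213, "1:1 Mixture": 76943, "Unsorted": 2837}


def assign_sample(read_seq, mids=MIDS):
    """Return the sample whose MID prefixes the read, else 'Unsorted'."""
    for mid, sample in mids.items():
        if read_seq.startswith(mid):
            return sample
    return "Unsorted"


example_reads = [
    "ACACGACGACTATGAAATCAACAGAGGCTTGCATG",  # starts with the 2800M MID
    "ACACGTAGTATGGTGATTTTCCTCTTTGG",        # starts with the 9947A MID (made-up tail)
    "TTTTGGGGCCCCAAAA",                     # no MID (made-up read)
]
sorted_counts = {}
for seq in example_reads:
    sample = assign_sample(seq)
    sorted_counts[sample] = sorted_counts.get(sample, 0) + 1

# The per-sample counts on the slide add up to the reported total.
total_reads = sum(READ_COUNTS.values())
```

The last line is a consistency check: the four counts in the table sum to the 164,468 reads reported above.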

SLIDE 6

NGS data analysis protocol in this study

  • Input sequence format: FASTA or FASTQ
  • Tested NGS platform: 454 platform
  • OS: Linux and Windows systems

Data analysis protocol

  1. Build STR reference sequences (MS Excel) → #.fasta file
  2. Indexing and alignment (Bowtie 2): NGS data (#.fasta or #.fastq file) → #.sam file
  3. Convert SAM into BAM (SAMtools): #.sam file → #.bam file
  4. Convert BAM into BED (BEDTools): #.bam file → #.bed file
  5. Determine alleles by counting coverage (MS Excel): #.bed file

Figures: design of reference sequences; example of STR genotyping result
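
The command-line steps of this protocol (Bowtie 2, SAMtools, BEDTools) might look roughly like the sketch below, which only builds the command lines and does not run them. File names and the helper names are hypothetical. The `--local` option is an assumption, suggested by the soft-clipped CIGAR strings in the alignment output; the slides do not state the alignment settings. Older SAMtools versions may also need `-S` to read SAM input.

```python
import shlex


def build_commands(reference_fasta, reads, prefix, reads_are_fasta=False):
    """Return (argv, stdout_path) pairs for the four command-line steps."""
    index = prefix + "_index"
    sam, bam, bed = prefix + ".sam", prefix + ".bam", prefix + ".bed"
    align = ["bowtie2", "--local", "-x", index, "-U", reads, "-S", sam]
    if reads_are_fasta:
        align.insert(1, "-f")  # reads are FASTA rather than FASTQ
    return [
        (["bowtie2-build", reference_fasta, index], None),   # index the references
        (align, None),                                       # align reads, write SAM
        (["samtools", "view", "-b", "-o", bam, sam], None),  # SAM to BAM
        (["bedtools", "bamtobed", "-i", bam], bed),          # BAM to BED (stdout)
    ]


def as_shell(argv, stdout_path):
    """Render one command as a shell line, redirecting stdout if needed."""
    line = " ".join(shlex.quote(a) for a in argv)
    return line if stdout_path is None else line + " > " + shlex.quote(stdout_path)


commands = build_commands("str_reference.fasta", "2800M.fastq", "2800M")
script = "\n".join(as_shell(argv, out) for argv, out in commands)
print(script)
```

Building the reference sequences and counting coverage are done in MS Excel in the protocol above, so they are left out of this sketch.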

SLIDE 7

Design of STR reference sequences

Each reference sequence: 5' flanking region (500~550 bp) + STR region + 3' flanking region (500~550 bp)

Reference alleles at each locus, for example:

  • D3S1358: 8, 9, 10, 11, ..., 17, 18, 19, 20
  • FGA: 12.2, 13, 13.2, 14, ..., 32, 32.2, ..., 50.2, 51.2
  • CSF1PO: 4, 5, 6, 7, ..., 14, 15, 16
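
A minimal sketch of how such per-allele references could be generated for one locus. It assumes every D3S1358 allele follows the TCTA [TCTG]3 [TCTA]n structure seen in 2800M, and it uses the 20 bp of flanking sequence shown on the filtering-rule slide instead of the 500~550 bp flanks of the real design. The function names are hypothetical; the `LOCUS_allele` naming follows the reference names visible in the alignment output.

```python
# Short flanks for illustration only; the deck uses 500~550 bp on each side.
FLANK5 = "TCAACAGAGGCTTGCATGTA"
FLANK3 = "TGAGACAGGGTCTTGCTCTG"


def d3s1358_reference(allele, flank5=FLANK5, flank3=FLANK3):
    """Build flank + TCTA [TCTG]3 [TCTA]n + flank for an integer allele."""
    n_tcta = allele - 4  # one TCTA and three TCTG units precede the main run
    str_region = "TCTA" + "TCTG" * 3 + "TCTA" * n_tcta
    return flank5 + str_region + flank3


def to_fasta(locus, alleles):
    """Return FASTA text with one record per allele, named LOCUS_allele."""
    records = [">%s_%d\n%s" % (locus, a, d3s1358_reference(a)) for a in alleles]
    return "\n".join(records) + "\n"


fasta_text = to_fasta("D3S1358", range(8, 21))  # alleles 8 to 20
```

Loci with microvariant alleles (for example FGA 12.2) or compound structures would need their own templates; this sketch covers only whole-repeat alleles.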

Alignment output (SAM file)

ID Direction STR & Allele Start CIGAR Sequence Quality Alignment score IG0Y3ZW01B25U7 16 D3S1358_17 521 3 10S84M1I15M1I20M11S * AACAGGAGGTCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGANACGAGGGTCTTGCTCTGTCCACCCAGATTGGACTGCAGTAGTCGTCGTGT 37>??=8333455@BA@DEGHIGGCCCGIIIIIIIGGHIIIIIIIIIHGHIIIIIIIIIIGGGIIIHCCCHGGGGIHIHCB@BD@C89843!45182---5229::AA00//<:::IIIIIIIIIIIIIIIIIIIIIIIIII AS:i:101 XS:i:88 IG0Y3ZW01B99VT 16 D7S820_8 418 2 9S27M1D104M1I10M2D79M11S * ACCACGACTATGTTGGTCAGGCTGACTATGGAGTTATTTAAGGTTAATATATATAAAGGGTATGATAGAACACTTGTCATAGTTTAGAACGAACTAACGATAGATAGATAGATAGATAGATAGATAGATAGACAGATTGATTAGGTTTTTTATCTCACTAAATAGTCTATAGTAAACATTTAATTACCAATATTTGGTGCAATTCTGTCAATGAGGATAAATGTGGAATCAGTCGTCGTGT @@C>>>>FC<<<<<<<<9999999<:?96111121/----/??3333:>DED7722399>FFEEFIGCCCFIIIIIIIIIIIIIIIIIIIIIIIHAACIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIICBB:,,333------3:GIFFIIHHHIIIIIIIIIIIIIIIIIIIIIII?DDIIII999IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFEE AS:i:193 XS:i:180 IG0Y3ZW01DKSYH D3S1358_17 505 5 11S135M11S * ACACGACGACTATGAAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGGTCTTGCTCTGTCACCCAGATTGGACTGCAGTAGTCGTGGTGT IIIIIIIIIIIIIE777IIEE@IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFBBIIHHIIFFFFFA?C?DB:::>BIIIIIIIIIIIIIIIIIIDBBDFB AS:i:135 XS:i:122 IG0Y3ZW01BUANU 16 D5S818_12 475 6 17S107M35S * ACACCACGACTGGTGTATTTCCTCTTTGGTATCCTTATGTAATATTTTGAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAAATAAGGATACAGATAAGATACAATGTTAGTAACTGTGGCTAGTCGTCGTGT IIIIIIIIIIIBBBI;B::<GGIIHHHIIIIIIIDDDFFDBB7E5566HIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIFBB<;111;9999>G@G>B8;66;>;3;<B8;11/1>BBBHHDABHIIIIIIIIIIFF AS:i:107 XS:i:94 IG0Y3ZW01B6RGS 16 D21S11_35 594 163M10S * CTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCGTCTATCTATCCAGTCTATCTACCTCCTATTAGTCTGTCTCTGGAGAACATTGACTAATACAAGTCGTCGTGT 
><ACCCCEEACIIIIIIIHIECCCC?@?CCCCCCCEEEIIIIIIIIIC@@@CIIIIIIIIIIIIIIIIIIIGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9744IIIIIIIIIIIIIIIIIIAA>8658ECCII@333<=HIIIIIIIEEEFF AS:i:163 XS:i:163 IG0Y3ZW01CDITH 16 D21S11_24 528 122M40S * AAGTGAATTGCCTTCTATCTATCTATCTATCTGTCTGTCTGTCTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATTTATCATCTGTCCAGTCGTCGTGT 22:42....63333<>BDDB><<<EDGGGDEACFIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIIIIIIIIIIIIIIIIIFFFF AS:i:118 XS:i:118 IG0Y3ZW01CDITH 16 VWA_25 623 94S57M11S * AAGTGAATTGCCTTCTATCTATCTATCTATCTGTCTGTCTGTCTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATTTATCATCTGTCCAGTCGTCGTGT 22:42....63333<>BDDB><<<EDGGGDEACFIIIIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIIIIIIIIIIIIIIIIIFFFF AS:i:57 XS:i:56 IG0Y3ZW01AXWF9 4 * * * ACACGACGACTGAAAGGTCGAAGCTGAAGTGGCCAAGTCAGGCTGATCATGAGGTCGAGGA EE@@?==B==<A===99:>779;<<B??>@>>661114468>@@@@@@A==<>;:955/// IG0Y3ZW01AZ0AF D3S1358_18 505 5 11S110M * ACACGACGACTATGAAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTACCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGGTC IIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEEEIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC;;;B777>< AS:i:106 XS:i:95 IG0Y3ZW01CL1DS 16 VWA_16(18') 519 2 1S111M1D13M11S * TATCAGTATGTGACTTGGATTGATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATTATCATCTGTCCAGTCGTCGTGT 35569<35338:655566666;BBBABDEDGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII99;DIIIID;;;?IIIIIIHHDDD AS:i:117 XS:i:113 IG0Y3ZW01AYYCX D3S1358_19 505 11S99M10S * ACACGACGACTATGAAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGCAGGTCG 
EEEIIEEGBBIEEIGDDIIEEEIIIGG666IIIIIIIIIIIEIEIIIIIIIIIIIEIIIIIIIIIIIIIIIEIIIEIIIIIIIIIIIIIEIEIIIEIEIBIEIEGEDFF??:685-,,0, AS:i:99 XS:i:102 IG0Y3ZW01B9W8N AMELppx16_X 1 182 11S106M11S * ACACGACGACTCCCTGGGCTCTGTAAAGAATAGTGTGTTGATTCTTTATCCCAGATGTTTCTCAAGTGGTCCTGATTTTACAGTTCCTACCACCAGCTTCCCAGTTTAAGCTCTGATAGTCGTGGTGT IIFFFIIIIGD333?===DDCID?455?9IIIIIIH;88IIIIF33345AAEIIIIIIIIIIIIIIIIIIIIIII<<<<>FIIIIIIIHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:107 XS:i:0 IG0Y3ZW01AR95S D3S1358_17 511 4 18S97M20S * ACACGACGACTATGAAATTCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGTCGTTCGTCTCTCGTCTCCA ?>0///0>83333:-..111333>B>??:;;BFGGC@CGIEIEIIIIIEIGIGIIIIIEIEIEIEIBIBIEIEIIIEIIIEIEIEIEIEIEIEIBI@@@DEIEGEEBEE:::=<<<999>666;999450---09 AS:i:97 XS:i:89 IG0Y3ZW01A1RIA D3S1358_16 509 4 14S10M1D82M * ACACGACGACTATGAATCAACAGAGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACA @@==?EEBB@?ACBAA:7///4@>6444@@BB:9:@@BBBFBIEIEIGIIIIIIIIIEIEIIIIIEIEIIIBHBIEIEIEIEIE?=?BIEIEIEIEGEE=>=<><6 AS:i:85 XS:i:79 IG0Y3ZW01A4QYM 16 D3S1358_18 523 29 86M1I35M11S * TGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACGAGGGTCTTGCTCTGTCACCCAGATTGGACTGCAGTAGTCGTCGTGT 06??>???AEBGBGEGBFBGEHBHEIEIEIEIEIEIEIIIEIEIEIEHBHEIBHBIEIEIEIIIEIEIEGBEBBB>9:34466;963666;865576833--,,--->>:9::;<@IIIAAAIIIIIIIEEEE AS:i:114 XS:i:101 IG0Y3ZW01CX2AF 16 VWA_19(21) 519 5 22S123M1D13M11S * GACTGCCCTAGTGATGATAGATATCAGTATGTGACTTGGATTGATCTATCTGTCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTATCCATCCATCCTATGTATTATCATCTGTCCAGTCGTCGTGT 488:BBBBHEAAAFFGD?<=88888??688;;@<888B@;==AB>AAAFFFIIIIIIIIHHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIBI557GIIIIIGGGIIIIIIIIIIII AS:i:129 XS:i:116 IG0Y3ZW01CG6IO 16 D5S818_12 496 5 16S68M1D17M33S * CCTCTTTGGATCCTACGTAATATTTTGAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAATAAGGATACAGATAAGATACAATGTAGTAACTGTGCTAGTCGTCGTGT 
885D@BBDEEE;;;:==7777E7889GICEFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIECCGG>;:E:6977=><EEI>AAEIIA7949?FA10/096798DIIIIIIIIIIIIIII AS:i:78 XS:i:64 IG0Y3ZW01AKQFP 16 D5S818_12 506 7 58M1D30M1I14M10S * GAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAATAAGGATACAGATAAAGATACAAATGTTAGTAAACTGTGGCTAGTCGTCGTGT 36;?AABBEIEIEIIIIIBHBIEIEIEIEIIIIIIIBD>>>IEIIIEGEEDBBC<;70//76<;:9:8:8:1,,,,31713,,,2>00.3400168:;<?;;<=@:::=BB@B AS:i:88 XS:i:75 IG0Y3ZW01AQYTM D3S1358_18 509 2 14S113M5S * ACACGACGACTATGAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACCAGGTCTTGCTCTCGTCA EEE>666>=BEEEEDFGB998EIGGGGGGHIIIIIIIIIIIIIIIIIIIIGIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIEIEIEIEIBIEIBIBEBEBD@=96668888333:4454461/0//, AS:i:105 XS:i:95 IG0Y3ZW01CDVGI 16 D21S11_29.2 524 1 10S211M10S * GTCCAACTTCCCCCAAGTGAATTGCCTTCTATCTATCTATCTATCTGTCTGTCTGTCTGTCTGTCTGTCTATCTATCTATATCTATCTATCTATCATCTATCTATCCATATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATATCTATCGTCTATCTATCCAGTCTATCTACCTCCTATTAGTCTGTCTCTGGAGAACATTGACTAATACAAGTCGTCGTGT 2/..22-2200222333555<<<ECFDGFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:207 XS:i:202 IG0Y3ZW01BEB4K D3S1358_18 509 3 14S121M1I14M11S * ACACGACGACTATGAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGGTCTTGCTCTGTCACCCACGATTGGACTGCAGTAGTCGTGGTGT IIIIIIIIIIIIIIIIII@@?IIII<777IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHFDDBFFD82311818<DDDHGIIIHHHIIIIIII>>>II AS:i:128 XS:i:115 IG0Y3ZW01AY4O4 16 D3S1358_17 550 55M1I35M11S * ATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACGAGGGTCTTGCTCTGTCACCCAGATTGGACTGCAGTAGTCGTCGTGT <>@>BBBBBBEEGEGEIEIBIBIEIEIEID888EIEIBB@@@IBE??>??<88/...3886755:88<911.31119B><<<=<<<=??@@@:888>DEEEE AS:i:83 XS:i:79 IG0Y3ZW01A55IQ D3S1358_18 505 6 
11S110M1I15M3S * ACACGACGACTATGAAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGGTCGTTGCTCTGTCACCCACGA IIIIIIIIIIIIID777HIEECIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDCCIIEBA;178>?=AAB?BB=<<<863 AS:i:118 XS:i:105 IG0Y3ZW01B3IID 16 D5S818_12 469 4 11S33M1I107M10S * ACACCACGACTGGTGATTTTCCTCTTTGGTATCCTTACGTAATATTTTTGAAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAGGTATAAATAAGGATACAGATAAAGATACAAATGTTGTAAACTGTGGCTAGTCGTCGTGT IIIIIIIIIIIIIGB=2211BBB<111?BFIIIFFCBFFBBB2;00011BBFGFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII<333GII:G211EIIIG?455@IIIIIIIIIIIIIIIIFF AS:i:129 XS:i:118 IG0Y3ZW01BLYEI D3S1358_18 509 3 14S106M6S * ACACGACGACTATGAATCAACAGAGGCTTGCATGTATCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATGAGACAGGGTCGTTGTC IIIIIIIIIIIIIIIIIIBB>EHDD:555>@GIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIC666?1//8:111112 AS:i:106 XS:i:95
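
To read these records programmatically, the reference name can be split into locus and allele, and the CIGAR string gives the stretch of reference covered by the read. The sketch below uses the first two records above; the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(start, cigar):
    """Return (start, end), 1-based inclusive, of the reference covered by a read."""
    # Only M, D, N, = and X consume reference bases; S and I do not.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return start, start + ref_len - 1


def split_reference_name(name):
    """Split 'D3S1358_17' into ('D3S1358', '17') at the last underscore."""
    locus, _, allele = name.rpartition("_")
    return locus, allele


# First two records of the SAM output above.
span_d3 = reference_span(521, "10S84M1I15M1I20M11S")
span_d7 = reference_span(418, "9S27M1D104M1I10M2D79M11S")
locus, allele = split_reference_name("D3S1358_17")
```

Splitting at the last underscore also handles names such as AMELppx16_X.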

SLIDE 8

Sequence filtering rule

  → Reads containing the entire STR region and 5 bp of the 5' and 3' flanking regions

Layout of each example: 5' flanking region, STR region, 3' flanking region

  • a. Most STR loci (e.g. D3S1358)

    TCAACAGAGGCTTGCATGTA TCTA/TCTG/TCTG/TCTG/TCTA/TCTA/TCTA/TCTA/TCTA/TCTA TGAGACAGGGTCTTGCTCTG

  • b. D18S51

    ACAAATTGAGACCTTGTCTC AGAA/AGAA/AGAA/AGAA/AGAA/AGAA/AGAA/AGAA/AGAA/AGAA AAAG/AG/AG/AG(AG)GAAAGAAAGAGAAAAAGAAA
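
One way to express the filtering rule on aligned intervals, as a sketch. It assumes BED-style 0-based, half-open coordinates and a known STR interval on the reference; the coordinates in the example are made up, and the slides do not say how the rule was actually implemented.

```python
def spans_entire_str(read_start, read_end, str_start, str_end, margin=5):
    """True if the read covers the STR region plus `margin` bp on each side.

    All coordinates are 0-based, half-open intervals on the reference.
    """
    return read_start <= str_start - margin and read_end >= str_end + margin


# Hypothetical reference: 500 bp flank, then a 68 bp STR region (bases 500..567).
STR_START, STR_END = 500, 568

keep_a = spans_entire_str(495, 573, STR_START, STR_END)  # exactly 5 bp each side
keep_b = spans_entire_str(497, 573, STR_START, STR_END)  # only 3 bp of 5' flank
keep_c = spans_entire_str(450, 570, STR_START, STR_END)  # only 2 bp of 3' flank
```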

Analysis results of alignment output

  • No. of aligned reads at each STR locus

STR locus    Amplicon size range (bp)    2800M                9947A                1:1 Mixture
                                         All    Entire STR    All    Entire STR    All    Entire STR
D3S1358      115–147                     9470   8743          6341   6012          14261  13306
D5S818       119–155                     9485   8705          5523   5011          9347   8531
D7S820       215–247                     3676   3476          1868   1780          4815   4603
D8S1179      203–247                     4458   4017          1967   1805          3368   3054
D13S317      169–201                     4897   4631          4060   3868          12839  12140
D16S539      264–304                     967    877           708    655           2497   2361
D18S51       290–366                     739    332           1284   546           1117   481
D21S11       203–259                     3045   2313          2996   2525          4873   3871
CSF1PO       321–357                     291    244           596    522           862    742
FGA          322–444                     956    460           666    255           3137   1440
Penta_D      376–441                     142    31            267    56            403    75
Penta_E      379–474                     193    84            356    116           563    309
TH01         156–195                     5503   4620          3324   2811          6712   5518
TPOX         262–290                     269    230           215    183           679    576
VWA          123–171                     3153   2782          1014   919           8565   7649
AMELppx16    106, 112                    3416   3247          1773   1741          2334   2247
Total                                    50660  44792         32958  28805         76372  66903
Unaligned                                1105                 410                  834
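
The share of aligned reads that pass the filtering rule can be computed directly from the table. The sketch below uses three 2800M rows as examples; the variable names are illustrative.

```python
# (all aligned reads, reads covering the entire STR) for 2800M, from the table.
aligned_2800m = {
    "D3S1358": (9470, 8743),  # amplicon size range 115-147 bp
    "D18S51": (739, 332),     # amplicon size range 290-366 bp
    "Penta_D": (142, 31),     # amplicon size range 376-441 bp
}

retained_percent = {
    locus: round(100.0 * entire / total, 1)
    for locus, (total, entire) in aligned_2800m.items()
}
print(retained_percent)
```

In these three rows the retained share drops as the amplicon gets longer, which would be consistent with an average read length of 183.64 bp, although the slides do not state this explanation.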
SLIDE 9

Determination of STR alleles from single-source samples

  • Threshold: 20% of total coverage
  • 2800M single (D3S1358): alleles 17 and 18
  • 9947A single (D3S1358): alleles 14 and 15

Determination of STR alleles from 1:1 mixture sample

  • Threshold: 10% of total coverage
  • 2800M-9947A 1:1 mixture
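
A coverage-threshold allele call can be sketched as follows. The read counts are made up for illustration, and treating the threshold as inclusive (at least 20% or 10% of total coverage) is an assumption; the slides give only the percentages.

```python
def call_alleles(coverage, threshold):
    """Return alleles whose share of total coverage is at least `threshold`."""
    total = sum(coverage.values())
    called = [allele for allele, n in coverage.items() if n / total >= threshold]
    return sorted(called, key=float)  # numeric order, so '9.3' sorts before '10'


# Made-up read counts per allele at one locus (total 1000 reads).
example_coverage = {"13": 30, "14": 110, "15": 170, "16": 50, "17": 370, "18": 270}

calls_at_10_percent = call_alleles(example_coverage, 0.10)
calls_at_20_percent = call_alleles(example_coverage, 0.20)
```

With these counts the 10% threshold returns four alleles and the 20% threshold only the two strongest, which shows why a mixture needs the lower cutoff.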

SLIDE 10

Determination of STR repeat structure and sequence variation in target region using Integrative Genomics Viewer (IGV)

IGV view: D13S317 (allele 11), [TATC]11 TATCAATC

2800M

STR locus   Genotype   Core repeat   Repeat structure
D3S1358     17, 18     TCTA          17: TCTA [TCTG]3 [TCTA]13
                                     18: TCTA [TCTG]3 [TCTA]14
D5S818      12         AGAT          12: [AGAT]12
D7S820      8, 11      GATA          8: [GATA]8
                                     11: [GATA]11
D8S1179     14, 15     TCTA          14: TCTA TCTG [TCTA]12
                                     15: [TCTA]2 TCTG [TCTA]12
D13S317     9, 11      TATC          9: [TATC]9 [AATC]2
                                     11: [TATC]11 TATC AATC
D16S539     9, 13      GATA          9: [GATA]9
                                     13: [GATA]13
D18S51      16, 18     AGAA          16: [AGAA]16 AAAG [AG]3
                                     18: [AGAA]18 AAAG [AG]3
D21S11      29, 31.2   TCTA          29: [TCTA]4 [TCTG]6 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11
                                     31.2: [TCTA]5 [TCTG]6 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11 TA TCTA
CSF1PO      12         AGAT          12: [AGAT]12
FGA         20, 23     CTTT          20: [TTTC]3 TTTT TTCT [CTTT]12 CTCC [TTCC]2
                                     23: [TTTC]3 TTTT TTCT [CTTT]15 CTCC [TTCC]2
Penta_D     12, 13     AAAGA         12: [AAAGA]12
                                     13: [AAAGA]13
Penta_E     7, 14      AAAGA         7: [AAAGA]7
                                     14: [AAAGA]14
TH01        6, 9.3     AATG          6: [AATG]6
                                     9.3: [AATG]6 ATG [AATG]3
TPOX        11         AATG          11: [AATG]11
VWA         16, 19     TCTA          16: TCTA [TCTG]3 [TCTA]12 TCCA TCTA
                                     19: TCTA [TCTG]4 [TCTA]14 TCCA TCTA

Red (on the slide): flanking region
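
The bracket notation in this table can be expanded back into sequence, and for simple cases the allele name follows from the length. The sketch below is a hypothetical helper, not part of the presented protocol. The length rule holds for entries such as D3S1358 and TH01, but not for entries that include flanking bases (marked red on the slide, for example D13S317 or VWA).

```python
import re

# Either a bracketed motif with a count, e.g. [TCTG]3, or a bare motif, e.g. TCTA.
TOKEN = re.compile(r"\[([ACGT]+)\](\d+)|([ACGT]+)")


def expand(structure):
    """Expand 'TCTA [TCTG]3 [TCTA]13' into the full nucleotide sequence."""
    parts = []
    for motif, count, literal in TOKEN.findall(structure):
        parts.append(motif * int(count) if motif else literal)
    return "".join(parts)


def allele_name(structure, unit_len=4):
    """Name an allele as full repeats, plus '.x' for x leftover bases."""
    full, extra = divmod(len(expand(structure)), unit_len)
    return str(full) if extra == 0 else "%d.%d" % (full, extra)


d3_17 = allele_name("TCTA [TCTG]3 [TCTA]13")
th01_9_3 = allele_name("[AATG]6 ATG [AATG]3")
```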

SLIDE 11

9947A

STR locus   Genotype   Core repeat   Repeat structure
D3S1358     14, 15     TCTA          14: TCTA [TCTG]2 [TCTA]11
                                     15: TCTA [TCTG]2 [TCTA]12
D5S818      11         AGAT          11: [AGAT]11
D7S820      10, 11     GATA          10: [GATA]10
                                     11: [GATA]11
D8S1179     13         TCTA          13a: TCTA TCTG [TCTA]11
                                     13b: [TCTA]13
D13S317     11         TATC          11: [TATC]11 [AATC]2
D16S539     11, 12     GATA          11: [GATA]11
                                     12: [GATA]12
D18S51      15, 19     AGAA          15: [AGAA]15 AAAG [AG]3
                                     19: [AGAA]19 AAAG [AG]3
D21S11      30         TCTA          30: [TCTA]6 [TCTG]5 [TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCA TA [TCTA]11
CSF1PO      10, 12     AGAT          10: [AGAT]10
                                     12: [AGAT]12
FGA         23, 24     CTTT          23: [TTTC]3 TTTT TTCT [CTTT]15 CTCC [TTCC]2
                                     24: [TTTC]3 TTTT TTCT [CTTT]16 CTCC [TTCC]2
Penta_D     12         AAAGA         12: [AAAGA]12
Penta_E     12, 13     AAAGA         12: [AAAGA]12
                                     13: [AAAGA]13
TH01        8, 9.3     AATG          8: [AATG]8
                                     9.3: [AATG]6 ATG [AATG]3
TPOX        8          AATG          8: [AATG]8
VWA         17, 18     TCTA          17: TCTA [TCTG]4 [TCTA]12 TCCA TCTA
                                     18: TCTA [TCTG]4 [TCTA]13 TCCA TCTA

Red (on the slide): flanking region

Highlighted examples

  • 1. Two STR alleles with the same length, but with different sequences

    9947A, D8S1179 (13): 13a: TCTA TCTG [TCTA]11; 13b: [TCTA]13

  • 2. Different repeat structure between samples

    2800M, D3S1358 (17, 18): 17: TCTA [TCTG]3 [TCTA]13; 18: TCTA [TCTG]3 [TCTA]14
    9947A, D3S1358 (14, 15): 14: TCTA [TCTG]2 [TCTA]11; 15: TCTA [TCTG]2 [TCTA]12

  • 3. Sequence variation in flanking region

    2800M, D13S317 (9, 11): 9: [TATC]9 [AATC]2; 11: [TATC]11 TATC AATC
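
The first point can be checked with the two D8S1179 alleles of 9947A. This short sketch writes out both repeat structures and compares them base by base; the variable names are illustrative.

```python
# D8S1179 alleles of 9947A, written out from their repeat structures.
allele_13a = "TCTA" + "TCTG" + "TCTA" * 11  # 13a: TCTA TCTG [TCTA]11
allele_13b = "TCTA" * 13                    # 13b: [TCTA]13

# A length-based assay sees identical fragments ...
same_length = len(allele_13a) == len(allele_13b)

# ... while the sequences differ; collect (0-based position, base in 13a, base in 13b).
differences = [
    (i, a, b) for i, (a, b) in enumerate(zip(allele_13a, allele_13b)) if a != b
]
```

Both alleles are 52 bases long and differ at a single base, so only sequencing separates them.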
SLIDE 12

Estimation of mixture ratio based on coverage ratios of determined STR alleles (D3S1358 locus)

Figure: coverage depth per allele in each sample. X axis: STR alleles (8 to 20). Y axis: percentage of coverage depth (%). Red arrows indicate the main alleles of this locus.

Sample         Reads    Coverage of main alleles (%)
2800M          8955     49.8, 42.97
9947A          6156     47.64, 47.89
1:1 mixture    13659    11.4, 16.86, 37.1, 29.32

CR represents the ratio of coverage depth among STR alleles.

  • CR (1:1 mixture) = 1 : 1.5 : 3.3 : 2.6
  • Expected CR = 1 : 1 : 1 : 1
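
The CR values can be reproduced by dividing each coverage percentage by the smallest one. The contributor split at the end is only a sketch: it assumes the four percentages are listed in allele order (14, 15, 17, 18) and that alleles 14 and 15 come from 9947A and 17 and 18 from 2800M, as in the single-source calls.

```python
# Coverage percentages of the four main D3S1358 alleles in the 1:1 mixture.
mixture_percent = [11.4, 16.86, 37.1, 29.32]

smallest = min(mixture_percent)
coverage_ratio = [round(p / smallest, 1) for p in mixture_percent]

# Assumption: first two values belong to 9947A, last two to 2800M.
share_9947a = mixture_percent[0] + mixture_percent[1]
share_2800m = mixture_percent[2] + mixture_percent[3]
ratio_2800m_to_9947a = round(share_2800m / share_9947a, 2)
```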

Estimation of mixture ratio based on reference/variant ratios from observed sequence variations (D13S317 locus)

Sequences compared: [TATC]11 AATCAATC and [TATC]11 TATCAATC

Base observed at the variable position: (1) 2800M: T; (2) 9947A: A; (3) 1:1 mixture: T/A

Sample (coverage)      Strand   A             T             G    C
2800M (1979)           Fwd      19            920           -    -
                       Rev      23            1015          2 (G or C)
                       Sum      42            1935 (98%)    2 (G or C)
9947A (3489)           Fwd      1880          1             2    -
                       Rev      1599          5             1    1
                       Sum      3479 (100%)   6             3    1
1:1 mixture (5683)     Fwd      1610          1828          2    1
                       Rev      1032          1209          -    1
                       Sum      2642 (46%)    3037 (53%)    2    2
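
The percentages for the 1:1 mixture follow from the summed base counts. A small sketch of that arithmetic, with illustrative variable names:

```python
# Summed base counts (Fwd + Rev) at the variable position in the 1:1 mixture.
mixture_counts = {"A": 2642, "T": 3037, "G": 2, "C": 2}

total = sum(mixture_counts.values())
fraction = {base: n / total for base, n in mixture_counts.items()}

# Whole-number percentages, truncated as they appear on the slide.
percent_a = int(100 * fraction["A"])
percent_t = int(100 * fraction["T"])
t_to_a = round(mixture_counts["T"] / mixture_counts["A"], 2)
```

Since 2800M shows T and 9947A shows A at this position, the T to A ratio gives one estimate of the contribution of the two samples at this locus.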

SLIDE 13

Estimation of male/female ratio based on coverage ratios of determined X and Y alleles of Amelogenin locus

Figure: coverage depth of the X and Y alleles in each sample. X axis: X and Y alleles. Y axis: proportion of coverage depth (0% to 100%). Red arrows indicate the main alleles of this locus. CR represents the ratio of coverage depth among alleles.

  • (1) 2800M: 3264 reads
  • (2) 9947A: 1750 reads
  • (3) 1:1 mixture: 2255 reads; CR = 1.36 : 1; expected CR = 3 : 1
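
The expected CR of 3 : 1 follows if the ratio is read as X : Y, the male sample (2800M) contributes one X and one Y copy, and the female sample (9947A) contributes two X copies. The sketch below encodes that reasoning and inverts it. Reading CR as X : Y and assuming equal amplification of both alleles are assumptions, and the function names are hypothetical.

```python
def expected_x_to_y(male_fraction):
    """Expected X:Y coverage ratio for a given male share of the DNA (0 < share <= 1)."""
    x_copies = male_fraction * 1 + (1 - male_fraction) * 2  # male: 1 X, female: 2 X
    y_copies = male_fraction * 1                            # only the male carries Y
    return x_copies / y_copies


def male_fraction_from_ratio(x_to_y):
    """Invert the relation above: r = (2 - m) / m, so m = 2 / (r + 1)."""
    return 2.0 / (x_to_y + 1.0)


expected_for_equal_mix = expected_x_to_y(0.5)
implied_male_share = round(male_fraction_from_ratio(1.36), 2)
```

Under these assumptions, the observed ratio of 1.36 : 1 would correspond to a male share well above one half at this locus.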

Summary

  • Designing STR reference sequences with long flanking sequences (500~550 bp) allows sample sequences generated with any primer combination to be aligned onto the reference.
  • STR alleles were called successfully from single-source and 1:1 mixture samples using thresholds of 20% and 10% of total coverage, respectively.
  • The method provides analysis results compatible with the data format used in CE-based assays.

SLIDE 14

Summary

  • STR repeat structure could be determined by examining both length and sequence variations in target STR regions.
  • The actual mixture ratio of the mixed sample was estimated by analyzing the coverage ratios of the assigned alleles at each STR locus and the reference/variant ratios from observed sequence variations.

Acknowledgement

Kyoung-Jin Shin, Ph.D.
Eun Hye Kim, B.S.
Seung Hwan Lee, Ph.D.
Su Jeong Park, Ph.D.

Department of Forensic Medicine, Yonsei University College of Medicine
DNA Analysis Laboratory, DNA Forensic Division, Supreme Prosecutors' Office