Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data


2019/7/19. 23 AI Summer Program : Hadoop Map&Reduce Programming for Big Traffic Data Management Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data. (Dr. Jing-Doo Wang)


SLIDE 1

Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data.

王經篤 博士 (Dr. Jing-Doo Wang) 亞洲大學 (Asia University)

2019/7/19. 23 AI Summer Program : Hadoop Map&Reduce Programming for Big Traffic Data Management

SLIDE 2

Chinese proverb: 『老王』賣瓜 (“Lao Wang sells melons”)

http://www.9ht.com/xue/44228.html http://www.pxmart.com.tw/px/ingredients.px?id=2592

Is it sweet and juicy?

SLIDE 3

Outline

  • Introduction

– What is “Sequential Data”?
– A Scalable Approach to Maximal Repeat Extraction

  • Applications with Tagged Sequential Data

– Trend Analysis via Text Archaeology
– Extracting Significant Travel Time Intervals
– Mining for Biomarkers
– Improving Quality Control

  • Future Work
SLIDE 4

What is “Sequential Data”?

  • Textual Data: News, Journal Articles, etc.

From: https://www.udn.com/news/story/7266/2834500 https://www.ncbi.nlm.nih.gov/pubmed/24372032 http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html

SLIDE 5

What is “Sequential Data”?

  • Genomic Sequences

From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/

SLIDE 6

What is “Sequential Data”?

  • Traffic and Transportation

https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg

https://tptis2015.blogspot.tw/2017/10/blog-post.html https://tptis2015.blogspot.tw/2015/07/300-brt.html

SLIDE 7

Product Traceability

****************************************

www.iconarchive.com http://technews.tw/2016/04/11/tsmc-and-largan/ http://www.slideshare.net/5045033/ss-1002323

SLIDE 8

It’s a big data problem!

SLIDE 9

How can we mine these “sequential data”?

http://clipart-library.com/clipart/kiKB8qLRT.htm http://clipart-library.com/clipart/6Tr5BGG7c.htm

SLIDE 10

How can we mine these “sequential data”?

From: http://globe-views.com/dcim/dreams/mine/mine-03.jpg

?

SLIDE 11

It’s a Big Data problem!

http://www.mining.com/wp-content/uploads/2015/06/Veladero-Mine.jpg http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg

SLIDE 12

What kinds of “features” can be extracted from sequential data?

http://www.quickanddirtytips.com/sites/default/files/images/2499/question-mark2.jpg

http://images.slideplayer.com/16/5176005/slides/slide_2.jpg

SLIDE 13

What kind of “mineral” do you want to mine?

https://media1.britannica.com/eb-media/71/143171-049-53725C29.jpg https://www.popsci.com/features/how-to-be-an-expert-in-anything/images/feature_video.jpg

SLIDE 14

Outline

  • Introduction

– What is “Sequential Data”?
– A Scalable Approach to Maximal Repeat Extraction

  • Applications with Tagged Sequential Data

– Trend Analysis via Text Archaeology
– Extracting Significant Travel Time Intervals
– Mining for Biomarkers
– Improving Quality Control

  • Future Work
SLIDE 15

Journal of Supercomputing, April 2016

https://link.springer.com/article/10.1007/s11227-016-1713z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst

SLIDE 16

Why use “Maximal Repeats” as features?

  • Dictionary

– How can new words or phrases that are not yet in a dictionary be identified? e.g., “just do it”, “洪荒之力” (“primordial power”).

  • N-grams (k-mers)

– 2-grams, 3-grams, …, 5-grams (e.g., the Google Ngram Viewer).
– The value of N is fixed in advance and limited.

  • Maximal Repeats

– The length of a maximal repeat is variable.
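To make the contrast with fixed-length n-grams concrete, here is a minimal Python sketch (illustrative only, not the author's code; the function names are hypothetical) that counts repeated character n-grams up to a fixed maximum N. Run on the example string from slide 17 with N = 3, it reports the repeated 3-grams “abc” and “bcy” but can never report the length-4 repeat “abcy”, which is the limitation a variable-length maximal repeat avoids.

```python
from collections import Counter


def ngram_counts(s: str, n: int) -> Counter:
    """Count every contiguous substring of exactly length n (character n-grams, or k-mers)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def repeated_ngrams(s: str, max_n: int) -> dict:
    """Return {n: {ngram: count}} for n = 1..max_n, keeping n-grams seen at least twice.

    Nothing longer than max_n is ever examined, so longer repeats only
    show up as overlapping fragments.
    """
    result = {}
    for n in range(1, max_n + 1):
        result[n] = {g: c for g, c in ngram_counts(s, n).items() if c >= 2}
    return result


if __name__ == "__main__":
    text = "xabcyiiizabcqabcyrxar"
    for n, grams in repeated_ngrams(text, 3).items():
        print(n, grams)
```

Raising `max_n` only moves the cut-off; some choice of N must always be made up front.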

SLIDE 17

“xabcyiiizabcqabcyrxar”

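Not maximal repeat patterns:

  • ab
  • bc

Example maximal repeat patterns:

  • abc
  • abcy

“ab” is always followed by “c”, and “bc” is always preceded by “a”, so each can be extended without losing an occurrence. “abc” (3 occurrences, preceded by x, z, q and followed by y, q, y) and “abcy” (2 occurrences, preceded by x, q and followed by i, r) cannot be extended in either direction without losing an occurrence.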
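The definition behind this example can be checked mechanically. The Python sketch below is a naive brute-force enumeration for illustration only, not the scalable Hadoop-based extraction the talk refers to. It assumes the common definition of a maximal repeat: a substring that occurs at least twice, has two occurrences with different preceding characters, and has two occurrences with different following characters, with the string boundaries treated as unique characters. It is roughly cubic in the string length, so it only suits short strings.

```python
from collections import defaultdict


def maximal_repeats(s: str) -> dict:
    """Brute-force maximal repeats of s, mapped to their occurrence counts.

    A repeat is kept when it is left-maximal (at least two distinct preceding
    characters) and right-maximal (at least two distinct following characters).
    None stands for the start or the end of the string.
    """
    n = len(s)
    starts = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            starts[s[i:j]].append(i)  # record every start position of every substring

    result = {}
    for sub, positions in starts.items():
        if len(positions) < 2:
            continue  # occurs once: not a repeat
        length = len(sub)
        lefts = {s[p - 1] if p > 0 else None for p in positions}
        rights = {s[p + length] if p + length < n else None for p in positions}
        if len(lefts) > 1 and len(rights) > 1:
            result[sub] = len(positions)
    return result


if __name__ == "__main__":
    found = maximal_repeats("xabcyiiizabcqabcyrxar")
    for pattern in sorted(found, key=lambda p: (-len(p), p)):
        print(pattern, found[pattern])
```

For the slide's string this reports abc (3 occurrences) and abcy (2) but not ab or bc, along with short repeats (a, i, ii, r, xa) that the slide does not list.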

SLIDE 18

Distinctive Pattern Mining (1)

S3: ********************$***********
S5: ********************$***********
S7: *****&*******%******************
S1: ********************************
S9: *****&*******%******************
S11: ********************************
S10: *********#****?************@****
S8: ********************************
S6: *********#****?************@****
S4: *****&*******%******************
S2: *********#****?***********@*****

Sequences and their classes; these classes are labeled by domain experts.

jdwang@asia.edu.tw

SLIDE 19

Distinctive Pattern Mining (2)

********************************
********************************
********************************
*********#****?************@****
*********#****?************@****
*********#****?************@****
*****&*******%******************
*****&*******%******************
*****&*******%******************
********************$***********
********************$***********

Classes


SLIDE 20

Distinctive Pattern Mining (3)

Maximal Repeats

  • #****?
  • @****
  • &*******%
  • $**********
  • *****


Class Frequency Distribution
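The class frequency distribution table on this slide did not survive extraction. As a hedged illustration of the idea only, the Python sketch below assumes that the distribution of a pattern is the number of sequences in each expert-assigned class that contain it (the slides do not say whether sequences or individual occurrences are counted), and treats a pattern as distinctive when all of its occurrences fall in one class. The toy sequences, the class labels C1 to C4, and the function names are hypothetical.

```python
from collections import Counter


def class_frequency_distribution(patterns, tagged_sequences):
    """Map each pattern to {class label: number of sequences of that class containing it}."""
    cfd = {p: Counter() for p in patterns}
    for sequence, label in tagged_sequences:
        for p in patterns:
            if p in sequence:  # naive substring test; fine for toy data only
                cfd[p][label] += 1
    return cfd


def distinctive_patterns(cfd):
    """Patterns that occur in exactly one class, mapped to that class."""
    return {p: next(iter(dist)) for p, dist in cfd.items() if len(dist) == 1}


if __name__ == "__main__":
    # Hypothetical toy data in the spirit of the slides; C1..C4 are placeholder labels.
    data = [
        ("*****#**?****", "C1"),
        ("*****#**?****", "C1"),
        ("**&****%*****", "C2"),
        ("**&****%*****", "C2"),
        ("*************", "C3"),
        ("*********$***", "C4"),
    ]
    cfd = class_frequency_distribution(["#**?", "&****%", "$", "*****"], data)
    for pattern, dist in cfd.items():
        print(pattern, dict(dist))
    print(distinctive_patterns(cfd))
```

Here the ubiquitous pattern “*****” is spread over every class and carries no class information, while “#**?”, “&****%” and “$” each concentrate in a single class. A real pipeline would feed in the extracted maximal repeats rather than a hand-written pattern list.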

SLIDE 21

Applying for a U.S. Patent

From: https://www.google.com/patents/US20170255634

SLIDE 22

Patent Publication Date:

  • Sep. 7, 2017

http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg

SLIDE 23

Outline

  • Introduction

– What is “Sequential Data”?
– A Scalable Approach to Maximal Repeat Extraction

  • Applications with Tagged Sequential Data

– Trend Analysis via Text Archaeology
– Extracting Significant Travel Time Intervals
– Mining for Biomarkers
– Improving Quality Control

  • Future Work
SLIDE 24

Applications with Tagged Sequential Data

  • Trend Analysis via Text Archaeology.
  • Extracting Significant Travel Time Intervals from Gantry Timestamp Sequences.
  • Mining for Biomarkers from Genomic Sequences.
  • Improving Quality Control via Product Traceability.
SLIDE 25

From: http://www.mdpi.com/2076-3417/7/9/878

SLIDE 26

Superhighway

From: http://chiangchiafeng.tian.yam.com/posts/70456997

SLIDE 27

e-Tag

http://news.u-car.com.tw/article/16077

SLIDE 28

Electronic Toll Collection (ETC) system of the national freeways of the Republic of China (Taiwan)

From: https://i.ytimg.com/vi/1ML2FFS2dJg/maxresdefault.jpg

SLIDE 29

https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg

SLIDE 30

Gantry Sequences of Different Vehicle Types (VT)

SLIDE 31

Gantry Sequences with Timestamps

SLIDE 32

Gantry Sequences with Timestamps for Different Vehicle Types

SLIDE 33

Significant Time Intervals of Vehicles

SLIDE 34

http://www.7car.tw/articles/read/25927

SLIDE 35

https://buzzorange.com/wp-content/uploads/2015/04/640_4a486dc48d6f1414404627e1c45f1cf9.jpg http://news.ltn.com.tw/photo/society/breakingnews/1088361_1

SLIDE 36

05F0055N,13:33
05F0438N,13:06
05F0287N,13:15
05F0309N,13:13
05F0528N,13:00
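The gantry records on this slide are unordered. A minimal Python sketch, assuming each record is a "gantry ID,HH:MM" pair from one vehicle trip on a single day (no midnight crossing), sorts them by time into a gantry sequence and derives the travel time between consecutive gantries. The resulting order matches the token order on slide 38; the GANTRY_HH_MM token format is a hypothetical simplification, since the slides do not explain the M1 field in tokens such as 05F0528N_13_M1_00.

```python
def to_gantry_sequence(records):
    """Turn unordered 'GANTRY,HH:MM' records of one trip into time-ordered tokens
    plus the travel time in minutes between consecutive gantries.

    Assumes all records fall on the same day (no midnight crossing).
    """
    parsed = []
    for record in records:
        gantry, hhmm = (field.strip() for field in record.split(","))
        hour, minute = (int(x) for x in hhmm.split(":"))
        parsed.append((hour * 60 + minute, gantry, hour, minute))
    parsed.sort()  # order by minutes since midnight
    tokens = [f"{gantry}_{hour:02d}_{minute:02d}" for _, gantry, hour, minute in parsed]
    intervals = [later[0] - earlier[0] for earlier, later in zip(parsed, parsed[1:])]
    return tokens, intervals


if __name__ == "__main__":
    trip = ["05F0055N,13:33", "05F0438N,13:06", "05F0287N,13:15",
            "05F0309N,13:13", "05F0528N,13:00"]
    tokens, intervals = to_gantry_sequence(trip)
    print(tokens)
    print(intervals)
```

Encoding the time of day inside each token is what lets a repeated gantry-and-time sub-sequence surface as a maximal repeat across many trips.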

SLIDE 37
SLIDE 38

Significant Time Intervals of Vehicles

Significant time interval:
05F0528N_13_M1_00 05F0438N_13_M1_06 05F0309N_13_M1_13 05F0287N_13_M1_15 05F0055N_13_M1_33

##4 ##5 ##

Class frequency distribution:
(2016-11-15_Mon_41#1#1)
(2016-11-29_Mon_41#1#1)
(2016-12-09_Thu_31#1#1)
(2016-12-20_Mon_31#1#1)

SLIDE 39

Weekday vs. 24 Hours per Day

SLIDE 40

Vehicle Types vs. 24 Hours per Day
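The charts on these two slides were not captured. As a rough sketch of how such a table could be tabulated, the Python code below reads the token and tag layout shown on slide 38, assuming (not confirmed by the slides) that the second token field is the hour, the last token field is the minute, and a tag's fields are date, weekday label and vehicle type code; the meaning of M1 and of the #1#1 suffix is unknown and is ignored. It counts the tagged sequences containing one significant time interval per row (weekday or vehicle type) and starting hour. The function names are hypothetical.

```python
from collections import defaultdict


def parse_token(token):
    # Assumed layout GANTRY_HOUR_<unexplained field>_MINUTE, e.g. 05F0528N_13_M1_00
    gantry, hour, _unknown, minute = token.split("_")
    return gantry, int(hour), int(minute)


def parse_tag(tag):
    # Assumed layout (DATE_WEEKDAY_VEHICLETYPE#...), e.g. (2016-11-15_Mon_41#1#1);
    # the trailing #... fields are dropped because their meaning is not given.
    date, weekday, rest = tag.strip("()").split("_")
    return date, weekday, rest.split("#")[0]


def hour_table(pattern_tokens, tags, by="weekday"):
    """Count tagged sequences per row (weekday or vehicle type) and hour of day (0-23).

    The hour comes from the pattern's first token, i.e. when the interval starts.
    """
    _, start_hour, _ = parse_token(pattern_tokens[0])
    table = defaultdict(lambda: [0] * 24)
    for tag in tags:
        _, weekday, vehicle_type = parse_tag(tag)
        row = weekday if by == "weekday" else vehicle_type
        table[row][start_hour] += 1
    return dict(table)


if __name__ == "__main__":
    pattern = ("05F0528N_13_M1_00 05F0438N_13_M1_06 05F0309N_13_M1_13 "
               "05F0287N_13_M1_15 05F0055N_13_M1_33").split()
    tags = ["(2016-11-15_Mon_41#1#1)", "(2016-11-29_Mon_41#1#1)",
            "(2016-12-09_Thu_31#1#1)", "(2016-12-20_Mon_31#1#1)"]
    print({row: counts[13] for row, counts in hour_table(pattern, tags).items()})
    print({row: counts[13] for row, counts in hour_table(pattern, tags, by="vehicle_type").items()})
```

Summing such rows over all significant intervals would give a weekday-by-hour or vehicle-type-by-hour matrix of the kind these slides appear to plot.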

SLIDE 41

Significant Patterns of Travel Time Intervals of Vehicles

SLIDE 42

Outline

  • Introduction

– What is “Sequential Data”?
– A Scalable Approach to Maximal Repeat Extraction

  • Applications with Tagged Sequential Data

– Trend Analysis via Text Archaeology
– Extracting Significant Travel Time Intervals
– Mining for Biomarkers
– Improving Quality Control

  • Future Work
SLIDE 43

1+5 cluster nodes

SLIDE 44
SLIDE 45

2+8 cluster nodes

SLIDE 46

Cloud Computing Environment

SLIDE 47

Artificial Intelligence

Artificial Intelligence, Big Data, Cloud Computing, Machine Learning

SLIDE 48

Science in ancient Greece: Archimedes and the fulcrum to lift the Earth

From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810

Knowledge? Relationship? Sequential Data? Infrastructure (Cloud Computing) Domain Expert? Labels (Tags)

Principle of the lever (槓桿原理)

(Maximal Repeat Extraction with Class Frequency Distribution)

SLIDE 49

Illustration: 紀玲玉

SLIDE 50

Acknowledgements (Precision Medicine)

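  • Jeffrey J.P. Tsai (蔡進發), President of Asia University (亞洲大學)

Project title: Precision cancer medicine research based on biomedical big data analysis (2/3); project number: MOST 106-2632-E-468-002; project period: 106/08/01 to 107/07/31 (ROC calendar, i.e., 2017/08/01 to 2018/07/31)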

SLIDE 51

Acknowledgements (Bioinformatics)

  • Charles C.N. Wang
  • Wen-Ling Chan
  • Jan-Gowth Chang
  • Tsung-Chi Chen
  • Yi-Chun Wang
  • Rouh-Mei Hu
SLIDE 52

Acknowledgements (Traffic Information Analysis)

  • Director 黃銘崇
  • Prof. 潘信宏
  • Prof. 連耀南
  • Prof. 何承遠
SLIDE 53

Acknowledgements (Big-Data: Hadoop Computing)

  • Jazz Wang (王耀聰)
  • Philip Lin (林奇暻)
  • Wei-Chiu Chuang (莊偉赳), Apache Hadoop Committer/PMC member

SLIDE 54

Acknowledgements

  • Hadoop Cluster Setup and Consulting

– SYSTEX (精誠資訊), 2017

  • Herb Hsu (徐啟超)

– Athemaster (炬識科技股份有限公司), 2018

  • Ferrari
  • Mr. 黃仁德, Office of Information Development, Asia University (亞洲大學 資訊發展處)
SLIDE 55

『老王』賣瓜,自賣自誇

http://www.9ht.com/xue/44228.html http://www.pxmart.com.tw/px/ingredients.px?id=2592

Lao Wang selling melons praises his own goods

SLIDE 56

Thank you for listening!

www.flickr.com www.slideshare.net http://www.pptschool.com/250.html