SLIDE 1 Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data.
Dr. Jing-Doo Wang (王經篤), Asia University (亞洲大學)
2019/7/19. 23 AI Summer Program: Hadoop Map&Reduce Programming for Big Traffic Data Management
SLIDE 2 Chinese proverb: 『老王』賣瓜 (“Lao Wang sells melons”)
http://www.9ht.com/xue/44228.html http://www.pxmart.com.tw/px/ingredients.px?id=2592
Is it sweet and juicy?
SLIDE 3 Outline
– What is “Sequential Data”?
– A scalable approach to Maximal Repeat Extraction
– Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
SLIDE 4 What is “Sequential Data”?
- Textual data: news, journal articles, etc.
From: https://www.udn.com/news/story/7266/2834500 https://www.ncbi.nlm.nih.gov/pubmed/24372032 http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html
SLIDE 5 What is “Sequential Data”?
From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/
SLIDE 6 What is “Sequential Data”?
https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg
https://tptis2015.blogspot.tw/2017/10/blog-post.html https://tptis2015.blogspot.tw/2015/07/300-brt.html
SLIDE 7
Product Traceability
www.iconarchive.com http://technews.tw/2016/04/11/tsmc-and-largan/ http://www.slideshare.net/5045033/ss-1002323
SLIDE 8
It's a big data problem!
SLIDE 9 How can we mine these “sequential data”?
http://clipart-library.com/clipart/kiKB8qLRT.htm http://clipart-library.com/clipart/6Tr5BGG7c.htm
SLIDE 10 How can we mine these “sequential data”?
From: http://globe-views.com/dcim/dreams/mine/mine-03.jpg
?
SLIDE 11 It’s a Big Data problem!
http://www.mining.com/wp-content/uploads/2015/06/Veladero-Mine.jpg http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg
SLIDE 12 What kinds of “features” can be extracted from sequential data?
http://www.quickanddirtytips.com/sites/default/files/images/2499/question-mark2.jpg
http://images.slideplayer.com/16/5176005/slides/slide_2.jpg
SLIDE 13 What kind of “mineral” do you want to mine?
https://media1.britannica.com/eb-media/71/143171-049-53725C29.jpg https://www.popsci.com/features/how-to-be-an-expert-in-anything/images/feature_video.jpg
SLIDE 14 Outline
– What is “Sequential Data”?
– A scalable approach to Maximal Repeat Extraction
– Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
SLIDE 15 Journal of Supercomputing, April 2016
https://link.springer.com/article/10.1007/s11227-016-1713z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst
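These slides do not show how the paper splits the work across a cluster. The following is only a single-process Python sketch of the general map/shuffle/reduce pattern that a Hadoop job follows. It applies that pattern to one task: for every substring up to a hypothetical length `max_len`, count how many sequences of each class contain it. The names `mapper`, `shuffle`, `reducer` and `run_job` are illustrative. The brute-force substring enumeration is deliberately naive and is not the scalable maximal-repeat extraction described in the paper.

```python
from collections import defaultdict


def mapper(label, sequence, max_len):
    """Hypothetical map step: emit (substring, label) once for every distinct
    substring of length 1..max_len found in one tagged sequence."""
    distinct = {
        sequence[i:j]
        for i in range(len(sequence))
        for j in range(i + 1, min(len(sequence), i + max_len) + 1)
    }
    for sub in distinct:
        yield sub, label


def shuffle(pairs):
    """Shuffle step: group emitted class labels by substring key,
    as the framework would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, label in pairs:
        groups[key].append(label)
    return groups


def reducer(key, labels):
    """Hypothetical reduce step: turn grouped labels into a class frequency distribution."""
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    return key, dict(counts)


def run_job(tagged_sequences, max_len=3):
    """Chain map -> shuffle -> reduce over (label, sequence) pairs in one process."""
    pairs = (
        pair
        for label, seq in tagged_sequences
        for pair in mapper(label, seq, max_len)
    )
    return dict(reducer(key, labels) for key, labels in shuffle(pairs).items())


result = run_job([("A", "abab"), ("B", "ab")], max_len=2)
print(result["ab"])  # {'A': 1, 'B': 1}
print(result["ba"])  # {'A': 1}
```

Each sequence contributes at most one count per substring, so the output is a per-class sequence frequency. In a real Hadoop deployment the mapper and reducer would run on separate nodes, and the framework would perform the shuffle.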
SLIDE 16 Why use “Maximal Repeats” as features?
– How can new words or phrases be identified, e.g., “just do it” or “洪荒之力” (“primordial power”)?
– N-grams: 2-grams, 3-grams, …, 5-grams (Google Ngram Viewer); the value of “N” is limited.
– Maximal repeats: the length of a maximal repeat is variable.
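The fixed-N limitation can be made concrete with a minimal word-level Python sketch. It runs on a made-up sentence and collects the n-grams that occur at least twice. The phrase “just do it” surfaces as a single unit only when N happens to be exactly 3. Smaller N yields fragments, and larger N yields nothing.

```python
from collections import Counter


def repeated_ngrams(tokens, n):
    """Return the word n-grams (as tuples) that occur at least twice."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram for gram, count in counts.items() if count >= 2}


words = "just do it now and just do it later".split()  # made-up example text
for n in (2, 3, 4):
    print(n, sorted(repeated_ngrams(words, n)))
# 2 [('do', 'it'), ('just', 'do')]
# 3 [('just', 'do', 'it')]
# 4 []
```

A maximal repeat does not require choosing N in advance. Its length is whatever the data supports.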
SLIDE 17
“xabcyiiizabcqabcyrxar”
Not a Maximal Repeat Pattern
Example: Maximal Repeat Pattern
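Below is a brute-force Python sketch for checking small examples such as the string on this slide. It assumes the standard definition of a maximal repeat: a substring that occurs at least twice, is preceded by at least two different characters (left-maximal), and is followed by at least two different characters (right-maximal). String boundaries count as distinct characters. The sketch runs in roughly cubic time in the string length and is not the scalable method.

```python
from collections import defaultdict


def maximal_repeats(s):
    """Brute-force maximal repeats of s (assumed standard definition):
    substrings occurring at least twice whose left contexts and right
    contexts each contain at least two different symbols.
    None stands for the string boundary and never equals a real character."""
    n = len(s)
    result = set()
    for length in range(1, n):
        starts_by_sub = defaultdict(list)
        for i in range(n - length + 1):
            starts_by_sub[s[i:i + length]].append(i)
        for sub, starts in starts_by_sub.items():
            if len(starts) < 2:
                continue  # not a repeat
            left = {s[i - 1] if i > 0 else None for i in starts}
            right = {s[i + length] if i + length < n else None for i in starts}
            if len(left) > 1 and len(right) > 1:
                result.add(sub)
    return result


print(sorted(maximal_repeats("xabcyiiizabcqabcyrxar")))
# ['a', 'abc', 'abcy', 'i', 'ii', 'r', 'xa']
```

Under this definition, “ab” is excluded because it is always followed by “c”. Likewise, “bc” is excluded because it is always preceded by “a”. Both can be extended to “abc” without losing an occurrence.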
SLIDE 18
Distinctive Pattern Mining (1)
Sequences (the classes are labeled by domain experts):
S3:********************$***********
S5:********************$***********
S7:*****&*******%******************
S1:********************************
S9:*****&*******%******************
S11:********************************
S10:*********#****?************@****
S8:********************************
S6:*********#****?************@****
S4:*****&*******%******************
S2:*********#****?***********@*****
jdwang@asia.edu.tw
SLIDE 19
Distinctive Pattern Mining (2)
********************************
********************************
********************************
*********#****?************@****
*********#****?************@****
*********#****?************@****
*****&*******%******************
*****&*******%******************
*****&*******%******************
********************$***********
********************$***********
Classes
SLIDE 20 Distinctive Pattern Mining (3)
Maximal Repeats
#****? @**** &*******% $********** *****
Class Frequency Distribution
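To illustrate what a class frequency distribution records, here is a minimal Python sketch. It assumes the distribution counts how many sequences of each class contain a pattern; the paper may count total occurrences instead. It then flags patterns confined to a single class as distinctive. The tagged sequences are made up to echo the “#…?” and “&…%” markers above. `distinctive_patterns` and its `min_support` threshold are hypothetical helpers.

```python
from collections import Counter


def class_frequency_distribution(tagged_sequences, patterns):
    """For each pattern, count how many sequences of each class contain it.
    Assumption: counts sequences, not total occurrences."""
    cfd = {}
    for pattern in patterns:
        counts = Counter(label for label, seq in tagged_sequences if pattern in seq)
        cfd[pattern] = dict(counts)
    return cfd


def distinctive_patterns(cfd, min_support=1):
    """Hypothetical filter: patterns found in exactly one class
    and in at least min_support sequences of that class."""
    return [
        pattern
        for pattern, dist in cfd.items()
        if len(dist) == 1 and sum(dist.values()) >= min_support
    ]


# Made-up tagged sequences echoing the slides' distinctive markers.
tagged = [
    ("C1", "ab#cd?ef"),
    ("C1", "gh#cd?ij"),
    ("C2", "ab&xy%ef"),
    ("C3", "klmnop"),
]
cfd = class_frequency_distribution(tagged, ["#cd?", "&xy%", "ab"])
print(cfd)  # {'#cd?': {'C1': 2}, '&xy%': {'C2': 1}, 'ab': {'C1': 1, 'C2': 1}}
print(distinctive_patterns(cfd))  # ['#cd?', '&xy%']
print(distinctive_patterns(cfd, min_support=2))  # ['#cd?']
```

The pattern “ab” is shared by two classes, so it says little about either one. “#cd?” appears only in class C1, which is what makes it a candidate distinctive pattern.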
SLIDE 21 Applying for a U.S. Patent
From: https://www.google.com/patents/US20170255634
SLIDE 22 Patent Publication Date:
http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg
SLIDE 23 Outline
– What is “Sequential Data”?
– A scalable approach to Maximal Repeat Extraction
– Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
SLIDE 24 Applications with Tagged Sequential Data
- Trend Analysis via Text Archaeology.
- Extracting Significant Travel Time Intervals from Gantry Timestamped Sequences.
- Mining for Biomarkers in Genomic Sequences.
- Improving Quality Control via Product Traceability.
SLIDE 25 From: http://www.mdpi.com/2076-3417/7/9/878
SLIDE 26 Superhighway
From: http://chiangchiafeng.tian.yam.com/posts/70456997
SLIDE 27 e-Tag
http://news.u-car.com.tw/article/16077
SLIDE 28 Electronic Toll Collection (ETC) System on the National Freeways of the Republic of China (中華民國國道電子收費系統)
From: https://i.ytimg.com/vi/1ML2FFS2dJg/maxresdefault.jpg
SLIDE 29 https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg
SLIDE 30
Gantry Sequences of Different Vehicle Types (VT)
SLIDE 31
Gantry Sequences with Timestamps
SLIDE 32
Gantry Sequences with Timestamps for Different Vehicle Types
SLIDE 33
Significant Time Intervals of Vehicles
SLIDE 34 http://www.7car.tw/articles/read/25927
SLIDE 35 https://buzzorange.com/wp-content/uploads/2015/04/640_4a486dc48d6f1414404627e1c45f1cf9.jpg http://news.ltn.com.tw/photo/society/breakingnews/1088361_1
SLIDE 36 05F0055N,13:33 05F0438N,13:06
05F0287N,13:15
05F0309N,13:13 05F0528N,13:00
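The five gantry readings above can be turned into a token sequence in passage order and a list of per-segment travel times. This Python sketch makes two assumptions. First, all readings fall on the same day; midnight crossings are not handled. Second, the token layout `GANTRY_HH_M1_MM` is copied literally from the tokens on the following slide. What `M1` stands for is not explained in the slides, so the sketch reproduces it without interpreting it.

```python
from datetime import datetime

# (gantry ID, passage time) readings as listed on the slide, in slide order.
RECORDS = [
    ("05F0055N", "13:33"),
    ("05F0438N", "13:06"),
    ("05F0287N", "13:15"),
    ("05F0309N", "13:13"),
    ("05F0528N", "13:00"),
]


def _minutes(hhmm):
    """Convert zero-padded 'HH:MM' to minutes since midnight (same-day assumption)."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def to_tokens(records):
    """Sort readings by time and encode each as GANTRY_HH_M1_MM.
    'M1' is copied literally from the slide; its meaning is an open assumption."""
    ordered = sorted(records, key=lambda r: _minutes(r[1]))
    return [f"{gantry}_{hhmm[:2]}_M1_{hhmm[3:]}" for gantry, hhmm in ordered]


def travel_minutes(records):
    """Minutes elapsed between consecutive gantries, in passage order."""
    ordered = sorted(records, key=lambda r: _minutes(r[1]))
    return [
        (g1, g2, _minutes(t2) - _minutes(t1))
        for (g1, t1), (g2, t2) in zip(ordered, ordered[1:])
    ]


print(to_tokens(RECORDS))
# ['05F0528N_13_M1_00', '05F0438N_13_M1_06', '05F0309N_13_M1_13', '05F0287N_13_M1_15', '05F0055N_13_M1_33']
print(travel_minutes(RECORDS))
# [('05F0528N', '05F0438N', 6), ('05F0438N', '05F0309N', 7), ('05F0309N', '05F0287N', 2), ('05F0287N', '05F0055N', 18)]
```

The segment times add up to the 33 minutes between the first reading (13:00) and the last (13:33).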
SLIDE 37
SLIDE 38 Significant Time Intervals
05F0528N_13_M1_00 05F0438N_13_M1_06 05F0309N_13_M1_13 05F0287N_13_M1_15 05F0055N_13_M1_33 ##4 ##5 ## (2016-11-15_Mon_41#1#1) (2016-11-29_Mon_41#1#1) (2016-12-09_Thu_31#1#1) (2016-12-20_Mon_31#1#1)
Significant Time Intervals and Class Frequency Distribution
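The class labels listed above, such as `2016-11-15_Mon_41#1#1`, can be tallied field by field to build the class frequency distribution of a pattern. This Python sketch makes three assumptions. First, each label splits on `_` into a date, a weekday, and a field whose leading number (41, 31) is a vehicle-type code. Second, the weekday is taken as written rather than recomputed from the date. Third, the `#1#1` suffix is ignored because the slide does not define it. `parse_label` and `distribution_by` are hypothetical helpers.

```python
from collections import Counter

# Class labels as they appear on the slide (parentheses dropped).
LABELS = [
    "2016-11-15_Mon_41#1#1",
    "2016-11-29_Mon_41#1#1",
    "2016-12-09_Thu_31#1#1",
    "2016-12-20_Mon_31#1#1",
]

FIELDS = {"date": 0, "weekday": 1, "vehicle_type": 2}


def parse_label(label):
    """Split 'DATE_WEEKDAY_VT#x#y' into (date, weekday, vehicle_type).
    Assumptions: the weekday is used as written, the number before the first
    '#' is a vehicle-type code, and the '#x#y' suffix is dropped."""
    date, weekday, rest = label.split("_")
    return date, weekday, rest.split("#")[0]


def distribution_by(labels, field):
    """Count how many labels share each value of the chosen field."""
    index = FIELDS[field]
    return Counter(parse_label(label)[index] for label in labels)


print(distribution_by(LABELS, "weekday"))       # Counter({'Mon': 3, 'Thu': 1})
print(distribution_by(LABELS, "vehicle_type"))  # Counter({'41': 2, '31': 2})
```

The same tallies, taken over the hour of day as well, are the kind of summary that the weekday and vehicle-type charts on the next slides display.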
SLIDE 39
Weekday vs. Hour of Day (24 Hours)
SLIDE 40
Vehicle Types vs. Hour of Day (24 Hours)
SLIDE 41
Significant Patterns of Travel Time Intervals of Vehicles
SLIDE 42 Outline
– What is “Sequential Data”?
– A scalable approach to Maximal Repeat Extraction
– Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
SLIDE 43
1+5 cluster nodes
SLIDE 44
SLIDE 45
2+8 cluster nodes
SLIDE 46
Cloud Computing Environment
SLIDE 47
Artificial Intelligence
Artificial Intelligence, Big Data, Cloud Computing, Machine Learning
SLIDE 48 Ancient Greek Science (古希臘的科學): The Fulcrum on Which Archimedes Could Lift the Earth (阿基米德撐起地球的支點)
From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810
Knowledge? Relationship? Sequential Data? Infrastructure (Cloud Computing) Domain Expert? Labels (Tags)
Leverage principle (槓桿原理)
(Maximal Repeat Extraction with Class Frequency Distribution)
SLIDE 50 Acknowledgements (Precision Medicine)
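- Jeffrey J.P. Tsai (蔡進發), President of Asia University (亞洲大學 校長)
Project title: Precision Cancer Medicine Research Based on Biomedical Big Data Analysis (2/3) (以生醫大數據分析為基礎的精準癌症醫療研究)
Project number: MOST 106-2632-E-468-002
Project period: 106/08/01 to 107/07/31 (ROC calendar; 2017/08/01 to 2018/07/31)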
SLIDE 51 Acknowledgements (Bioinformatics)
- Charles C.N. Wang
- Wen-Ling Chan
- Jan-Gowth Chang
- Tsung-Chi Chen
- Yi-Chun Wang
- Rouh-Mei Hu
SLIDE 52 Acknowledgements (Traffic Information Analysis)
- Director 黃銘崇
- Prof. 潘信宏
- Prof. 連耀南
- Prof. 何承遠
SLIDE 53 Acknowledgements (Big-Data: Hadoop Computing)
Jazz Wang (王耀聰), Philip Lin (林奇暻), Wei-Chiu Chuang (莊偉赳)
- Apache Hadoop Committer/PMC member
SLIDE 54 Acknowledgements
- Hadoop cluster setup and consulting
  – SYSTEX (精誠資訊), 2017
  – Athemaster (炬識科技股份有限公司), 2018
SLIDE 55 『老王』賣瓜,自賣自誇
http://www.9ht.com/xue/44228.html http://www.pxmart.com.tw/px/ingredients.px?id=2592
Lao Wang sells melons and praises his own goods.
SLIDE 56 Thank you for listening!
www.flickr.com www.slideshare.net http://www.pptschool.com/250.html