Applications using the Class Frequency Distribution of Maximal - PowerPoint PPT Presentation

2019/7/19. 23 AI Summer Program : Hadoop Map&Reduce Programming for Big Traffic Data Management Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data. (Dr. Jing-Doo Wang)


  1. 2019/7/19. 23 AI Summer Program: Hadoop Map&Reduce Programming for Big Traffic Data Management. Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data. Dr. Jing-Doo Wang (王經篤), Asia University (亞洲大學)

  2. Chinese proverb: 『老王』賣瓜 (Lao Wang sells melons). Is it sweet and juicy? http://www.pxmart.com.tw/px/ingredients.px?id=2592 http://www.9ht.com/xue/44228.html

  3. Outline • Introduction – What is “Sequential Data”? – A scalable approach of Maximal Repeat Extraction • Applications with Tagged Sequential Data – Analyzing Text Archaeology. – Extracting Significant Travel Time Interval – Mining for Biomarker. – Improving Quality Control. • Future Works

  4. What is “Sequential Data”? • Textual Data: News, Journal Articles, etc. http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html From: https://www.udn.com/news/story/7266/2834500 https://www.ncbi.nlm.nih.gov/pubmed/24372032

  5. What is “Sequential Data”? • Genomic Sequences From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/

  6. What is “Sequential Data”? • Traffic Transportation https://tptis2015.blogspot.tw/2015/07/300-brt.html https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg https://tptis2015.blogspot.tw/2017/10/blog-post.html

  7. Product Traceability http://www.slideshare.net/5045033/ss-1002323 http://technews.tw/2016/04/11/tsmc-and-largan/ www.iconarchive.com

  8. It's a big data problem!

  9. How to mine from these “sequential data”? http://clipart-library.com/clipart/6Tr5BGG7c.htm http://clipart-library.com/clipart/kiKB8qLRT.htm

  10. How to mine from these “sequential data”? From: http://globe-views.com/dcim/dreams/mine/mine-03.jpg

  11. It's a Big Data problem! http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg http://www.mining.com/wp-content/uploads/2015/06/Veladero-Mine.jpg

  12. What kind of “features” can be extracted from sequential data? http://www.quickanddirtytips.com/sites/default/files/images/2499/question-mark2.jpg http://images.slideplayer.com/16/5176005/slides/slide_2.jpg

  13. What kind of “mineral” do you want to mine? https://www.popsci.com/features/how-to-be-an-expert-in-anything/images/feature_video.jpg https://media1.britannica.com/eb-media/71/143171-049-53725C29.jpg

  14. Outline • Introduction – What is “Sequential Data”? – A scalable approach of Maximal Repeat Extraction • Applications with Tagged Sequential Data – Analyzing Text Archaeology. – Extracting Significant Travel Time Interval – Mining for Biomarker. – Improving Quality Control. • Future Works

  15. Journal of Supercomputing, April 2016 https://link.springer.com/article/10.1007/s11227-016-1713-z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst

  16. Why use “Maximal Repeats” as features? • Dictionary – How to identify new words or phrases? – e.g. “just do it”, “洪荒之力” (“prehistoric powers”, a Chinese buzzword). • N-gram (k-mers) – 2-gram, 3-gram, …, 5-gram (Google Ngram Viewer). – The value of “N” is limited. • Maximal Repeat – The length of a maximal repeat is variable.

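  17. Example: Maximal Repeat Pattern in “xabcyiiizabcqabcyrxar” • ab, bc: not maximal repeat patterns (every “ab” is followed by “c” and every “bc” is preceded by “a”, so both always extend to “abc”). • abc, abcy: maximal repeat patterns (each occurs more than once and cannot be extended left or right without losing an occurrence).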
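
The example on this slide can be checked with a short brute-force sketch. This is only an illustration of the definition, not the scalable Hadoop-based extraction the talk refers to; the function name `maximal_repeats` and the `min_count` parameter are assumptions made for this example. A substring is treated as a maximal repeat when it occurs at least twice and its occurrences are not all preceded by the same character nor all followed by the same character.

```python
from collections import defaultdict


def maximal_repeats(text, min_count=2):
    """Brute-force maximal repeat finder (illustrative only, quadratic in len(text))."""
    n = len(text)
    # Map every substring to the list of positions where it starts.
    starts_of = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            starts_of[text[i:j]].append(i)

    result = {}
    for sub, starts in starts_of.items():
        if len(starts) < min_count:
            continue
        # Neighbouring characters of each occurrence. A text boundary gets a
        # unique marker so it never equals another occurrence's neighbour.
        left = {text[s - 1] if s > 0 else ("start", s) for s in starts}
        right = {
            text[s + len(sub)] if s + len(sub) < n else ("end", s)
            for s in starts
        }
        # Maximal: no single character extends every occurrence on either side.
        if len(left) > 1 and len(right) > 1:
            result[sub] = len(starts)
    return result


repeats = maximal_repeats("xabcyiiizabcqabcyrxar")
for pattern in ("ab", "bc", "abc", "abcy"):
    print(pattern, "maximal" if pattern in repeats else "not maximal")
```

Enumerating all substrings is only practical for short strings; large inputs need a suffix-based or distributed approach such as the one the talk describes.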

  18. Distinctive Pattern Mining (1). Sequences whose classes are labeled by domain experts: S1:******************************** S2:*********#****?***********@***** S3:********************$*********** S4:*****&*******%****************** S5:********************$*********** S6:*********#****?************@**** S7:*****&*******%****************** S8:******************************** S9:*****&*******%****************** S10:*********#****?************@**** S11:******************************** jdwang@asia.edu.tw

  19. Distinctive Pattern Mining (2). The sequences grouped by class: ******************************** ******************************** ******************************** | *********#****?************@**** *********#****?************@**** *********#****?************@**** | *****&*******%****************** *****&*******%****************** *****&*******%****************** | ********************$*********** ********************$***********

  20. Distinctive Pattern Mining (3). Maximal repeats: #****? @**** &*******% $********** ***** and their Class Frequency Distribution
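
A class frequency distribution of this kind can be sketched as follows. This is a minimal illustration, assuming that the frequency of a pattern in a class is the number of sequences of that class containing it (the slides do not say whether counts are per sequence or per occurrence). The function name `class_frequency_distribution` and the toy sequences and labels are hypothetical.

```python
from collections import Counter


def class_frequency_distribution(tagged_sequences, patterns):
    """Count, per pattern, how many sequences of each class contain it."""
    dist = {p: Counter() for p in patterns}
    for label, seq in tagged_sequences:
        for p in patterns:
            if p in seq:  # count each sequence at most once per pattern
                dist[p][label] += 1
    return dist


# Hypothetical tagged sequences: (class label, sequence).
tagged = [
    ("A", "xxabcxx"),
    ("A", "yabcyy"),
    ("B", "zzdefz"),
    ("B", "qdefqq"),
    ("B", "abcdef"),
]

dist = class_frequency_distribution(tagged, ["abc", "def"])
for pattern, counts in dist.items():
    print(pattern, dict(counts))
```

A pattern whose counts concentrate in one class (here "def" in class B) is a candidate distinctive pattern, while a pattern spread across classes is less discriminating.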

  21. Applying for a U.S. Patent From: https://www.google.com/patents/US20170255634

  22. Patent Publication Date: Sep. 7, 2017 http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg

  23. Outline • Introduction – What is “Sequential Data”? – A scalable approach of Maximal Repeat Extraction • Applications with Tagged Sequential Data – Analyzing Text Archaeology. – Extracting Significant Travel Time Interval – Mining for Biomarker. – Improving Quality Control. • Future Works

  24. Applications with Tagged Sequential Data • Trend Analysis via Text Archaeology. • Extracting Significant Travel Time Intervals from Gantry Timestamped Sequences. • Mining for Biomarkers from Genomic Sequences. • Improving Quality Control via Product Traceability.

  25. From: http://www.mdpi.com/2076-3417/7/9/878

  26. Superhighway From: http://chiangchiafeng.tian.yam.com/posts/70456997

  27. e-Tag http://news.u-car.com.tw/article/16077

  28. The Electronic Toll Collection (ETC) system of the national freeways of the Republic of China (Taiwan) From: https://i.ytimg.com/vi/1ML2FFS2dJg/maxresdefault.jpg

  29. https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg

  30. Gantry Sequences of Different Vehicle Types (VT)

  31. Gantry Sequences with Timestamps

  32. Gantry Sequences with Timestamps for Different Vehicle Types

  33. Significant Time Intervals of Vehicles

  34. http://www.7car.tw/articles/read/25927

  35. https://buzzorange.com/wp-content/uploads/2015/04/640_4a486dc48d6f1414404627e1c45f1cf9.jpg http://news.ltn.com.tw/photo/society/breakingnews/1088361_1

  36. 05F0055N,13:33 05F0287N,13:15 05F0309N,13:13 05F0438N,13:06 05F0528N,13:00
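
The travel time intervals between consecutive gantries can be derived from such a sequence. The sketch below assumes each entry on this slide is a gantry ID followed by an HH:MM passage time on the same day. The function name `travel_intervals` is hypothetical, and this is not necessarily how the talk's pipeline encodes intervals.

```python
from datetime import datetime


def travel_intervals(passages):
    """Return (from_gantry, to_gantry, minutes) for consecutive passages in time order."""
    parsed = sorted((datetime.strptime(t, "%H:%M"), g) for g, t in passages)
    return [
        (g1, g2, int((t2 - t1).total_seconds() // 60))
        for (t1, g1), (t2, g2) in zip(parsed, parsed[1:])
    ]


# Gantry passages as listed on the slide: (gantry ID, passage time).
passages = [
    ("05F0055N", "13:33"),
    ("05F0287N", "13:15"),
    ("05F0309N", "13:13"),
    ("05F0438N", "13:06"),
    ("05F0528N", "13:00"),
]

intervals = travel_intervals(passages)
for src, dst, minutes in intervals:
    print(f"{src} -> {dst}: {minutes} min")
```

Sorting by time rather than by list order keeps the sketch independent of how the passages happen to be listed.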

  37. Significant Time Intervals of Vehicles. Significant time intervals: 05F0528N_13_M1_00 05F0438N_13_M1_06 05F0309N_13_M1_13 05F0287N_13_M1_15 05F0055N_13_M1_33 ##4 ##5 ## Class Frequency Distribution: (2016-11-15_Mon_41#1#1) (2016-11-29_Mon_41#1#1) (2016-12-09_Thu_31#1#1) (2016-12-20_Mon_31#1#1)

  38. Weekday vs. 24 Hours per Day

  39. Vehicle Types vs. 24 Hours per Day

  40. Significant Patterns of Travel Time Intervals of Vehicles

  41. Outline • Introduction – What is “Sequential Data”? – A scalable approach of Maximal Repeat Extraction • Applications with Tagged Sequential Data – Analyzing Text Archaeology. – Extracting Significant Travel Time Interval – Mining for Biomarker. – Improving Quality Control. • Future Works

  42. 1+5 cluster nodes

  43. 2+8 cluster nodes

  44. Cloud Computing Environment

  45. Artificial Intelligence: Big Data, Cloud Computing, Machine Learning

  46. The science of ancient Greece: the lever principle (槓桿原理). The fulcrum with which Archimedes lifts the Earth (Maximal Repeat Extraction with Class Frequency Distribution). Domain Expert Knowledge? Labels (Tags)? Relationship? Sequential Data? Infrastructure (Cloud Computing) From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810

  47. Illustration: 紀玲玉

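  48. Acknowledgements (Precision Medicine) • Jeffrey J.P. Tsai (蔡進發), President of Asia University. Project title: Precision Cancer Medicine Research Based on Biomedical Big Data Analysis (2/3). Project number: MOST 106-2632-E-468-002. Project period: 106/08/01~107/07/31 (ROC calendar, i.e. 2017/08/01~2018/07/31)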

  49. Acknowledgements (Bioinformatics) • Charles C.N. Wang • Tsung-Chi Chen • Wen-Ling Chan • Rouh-Mei Hu • Jan-Gowth Chang • Yi-Chun Wang

  50. Acknowledgements (Traffic Information Analysis) • Director 黃銘崇 • Professor 連耀南 • Professor 潘信宏 • Professor 何承遠

  51. Acknowledgements (Big Data: Hadoop Computing) • Jazz Wang (王耀聰) • Philip Lin (林奇暻) • Wei-Chiu Chuang (莊偉赳), Apache Hadoop Committer/PMC member

  52. Acknowledgements • Hadoop Cluster Setup and Consulting – SYSTEX (精誠資訊), 2017 • Herb Hsu (徐啟超) – Athemaster (炬識科技股份有限公司), 2018 • Ferrari • Mr. 黃仁德, Office of Information Development (資訊發展處), Asia University

  53. 『老王』賣瓜,自賣自誇: Lao Wang, selling melons, praises his own goods. http://www.pxmart.com.tw/px/ingredients.px?id=2592 http://www.9ht.com/xue/44228.html

  54. Thanks for listening! http://www.pptschool.com/250.html www.flickr.com www.slideshare.net
