Welcome
THE UNIVERSITY OF TEXAS AT ARLINGTON
DATA-DRIVEN DISCOVERY
Teik C. Lim, PhD
Provost and Vice President for Academic Affairs
Completed/Underway (select cases):
Planning (select items):
NEW T/TT (TENURED/TENURE-TRACK) FACULTY
(Since Fall ’17)
CAPPA
Jiwon Suh
COED
Catherine Robert, Daniel Robinson
COB
Wayne Crawford, Yun Fan, Alison Hall, Alper Nakkas, David Rosser, Vaghefi
COE
William Beksi, Ye Cao, Animesh Chakravarthy, Kyung “Kate” Hyun, Mohammad Islam, Chen Kan, Won Hwa Kim, Caroline Krejci, Ming “Kate” Li, Shirin Nilizadeh, Deok Gun Park, Samantha Sabatino, Dajiang Zhu
COS
Yujie Chi, Souvik Roy, Amir Shahmoradi, Leili Shahriyari, Li Wang
COLA
Cynthia Laborde, Seungmug “Zech” Lee, Joshua Wilson
SSW
Ryon Cobb, Genevieve Graaf
CONHI
Ziyad Ben Taleb, Jing Wang
POSSIBLE NEW/ENHANCED ACADEMIC PROGRAMS OR UNITS
COB
Business Analytics
COE
Data Science & Analytics, SMART Cities, Bioinformatics, Cybersecurity
CAPPA
SMART Cities
CONHI
Health Care Informatics, Healthcare Simulation, Bioinformatics, Biotech
COS
Data Science and Analytics, Bioinformatics
COLA
Digital Humanities, Media/Digital Comm.
THE UNIVERSITY OF TEXAS AT ARLINGTON
DATA-DRIVEN DISCOVERY SYMPOSIUM
Harry Dombroski
Dean, College of Business
Teresa Taber Doughty, PhD
Dean, College of Education
What are the characteristics of Big Data?
Velocity: how fast data are changing
Volume: the amount of available data
Variety: the different kinds and sources of data
(Gartner, 2012)
What does Big Data mean in Education? (From: Desire2Learn)
Real-Time Data: dashboards, statistical analysis, machine learning, data modeling, data mining
Meaningful Insight: students, instructors, programs, institutions
Transforming Education: grades, outcomes, evidence, continuous learning
Personalized Learning
Higher Education
Student Retention, Degree Completion, Time to Degree
degree
academically prepared
Bachelor’s within 8 years
bachelor’s in 8 years
needed
From: Complete College America, Time is the Enemy Summary
predicting future outcomes (graduation rates, post-school employment, success in STEM fields, etc.)?
complete high school? A 4-year degree? A graduate degree? How does type of disability affect employment rates?
traditional, face-to-face (F2F) program versus a blended or online post-secondary program?
Current Projects in the COEd
– Examining a large longitudinal data set on Texas public schools
– https://www.utdallas.edu/research/tsp-erc/data-holdings.html
– Davis, B.W., & Bowers, A.J. (2018). Stepping stones and pathways from school district leadership certification to the superintendency: An event history analysis of all Texas districts 2000-01 to 2014-15. Educational Administration Quarterly, 15(1), 3-41.
– Restricted and unrestricted data sets from the National Center for Education Statistics (under an IES/NCES license) and NSF
– Answering questions about gender differences in labor market outcomes, the employment and earnings of international STEM graduates of U.S. universities, and transfer student success
– Visual Accent Trainer
– Uses large language samples to identify “statistically normal language” in order to better recognize abnormal language and language disorders
– Predictive modeling (a subcategory of learning analytics) within a virtual educational system that may impact educational policy and instructional strategies
Education
Cost/benefit analysis, relationship to healthy communities, link between zip code and education, school violence
Gautam Das, PhD
Professor, Department of Computer Science & Engineering
Data-Driven Discovery: the exponential growth and availability of big data present opportunities to fuel decisions at every level of society
Analysis: increased availability of data yields more accurate analysis in fields ranging from health care to genomics and from business to physics
Decision Making: insights from big data lead to more confident decision making
Efficiency: better decisions lead to greater efficiency, lower costs, and reduced risk
management, analysis
Complex, Fast-Changing, Heterogeneous, High-Volume
Data Collection | Cleaning & Storage | Management & Query | Analysis & Applications
Big Data Storage and Management
Data collection technologies such as sensors & cyber-physical systems (e.g., IoT)
Challenges in storing, managing & querying big data
High-performance computing systems that rely on fast networks & distributed and multi-core systems
Cloud computing platforms
Data collection technologies (sensors & cyber-physical systems, e.g., IoT): Chris Ding (CSE), Yan Wan (EEE), Chengkai Li (CSE), Ming Li (CSE)
High-performance computing systems that rely on fast networks & distributed and multi-core systems: Hong Jiang (CSE), Ishfaq Ahmad (CSE), Jia Rao (CSE)
Cloud computing platforms: Mohammad Islam (CSE), William Beksi (CSE), Jia Rao (CSE)
Advances in Big Data Mining and Machine Learning
Scaling existing data mining and machine learning algorithms in big data environments (e.g., scalable clustering and classification)
Developing new machine learning algorithms that take advantage of advances in hardware, sophisticated algorithms, and scalable software frameworks (e.g., deep learning)
Scaling existing data mining and machine learning algorithms in big data environments: Sharma Chakravarthy (CSE), Leonidas Fegaras (CSE), Song Jiang (CSE), Hao Che (CSE), Ishfaq Ahmad (CSE), Mohammad Islam (CSE), Jia Rao (CSE)
Developing machine learning algorithms that take advantage of advances in hardware and scalable software frameworks: Gautam Das (CSE), Sharma Chakravarthy (CSE), Won Hwa Kim (CSE), Vassilis Athitsos (CSE), Jia Rao (CSE), Junzhou Huang (CSE), Fillia Makedon (CSE), Ramez Elmasri (CSE)
Computer vision, natural language processing, autonomous cars, robotics, genomics, law, business, healthcare, engineering, physical and social sciences
Engineering: Andrew Makeev (MAE), Yan Wan (EEE), Stephen Mattingly (Civil Eng), Shouyi Wang (Industrial Eng), Seyedali Abolmaali (Civil Eng), Anand Puppala (Civil Eng), Jay Rosenberger (Industrial Eng), Kate Hyun (Civil Eng)
Healthcare: Gautam Das (CSE), Shouyi Wang (Industrial Eng), Won Hwa Kim (CSE), Ishfaq Ahmad (CSE), Dajiang Zhu (CSE), Chris Ding (CSE), Jean Gao (CSE), Junzhou Huang (CSE), Fillia Makedon (CSE), Vassilis Athitsos (CSE)
Andrew Makeev (MAE), Vassilis Athitsos (CSE), Won Hwa Kim (CSE), William Beksi (CSE), Fillia Makedon (CSE), William Beksi (CSE), Mohammad Islam (CSE), Ming Li (CSE), Sridhar Nerur (Information Systems), Fillia Makedon (CSE), Yan Wan (EEE), Kaushik De (Physics)
Robotics
Computer Vision, Business, Physics and Social Sciences
Jean Gao (CSE), Chengkai Li (CSE), Deokgun Park (CSE), Shirin Nilizadeh (CSE), Ming Li (CSE), Jiang Ming (CSE), Mohammad Islam (CSE), Jeff Lei (CSE), Jay Rosenberger (Industrial Eng)
NLP
Security, Genomics
49
Director: Gautam Das
Current Industry Collaborations
Members and Collaborators
including several industrial collaborators
Research Focus
Mission
as external collaborators to engage on BDA issues
understand their BDA needs, and partner with them to develop new research projects and educational programs
technologies into the real world by encouraging entrepreneurship and partnerships with industry and
Kaushik De, PhD
Professor, Department of Physics and Director, High-Energy Physics Center of Excellence
ATLAS: the world's largest experimental apparatus, located 100 m underground at the Large Hadron Collider (LHC)
The LHC is a high-energy proton smasher near Lake Geneva, in a 27 km tunnel snaking under France and Switzerland
UTA in ATLAS since 1995
Study the fundamental properties of particles and forces
Continuing study of the Higgs boson, search for dark matter, study of the Standard Model … through 2035 and beyond
Around 750 journal publications
Parts were built by students at UTA
Worldwide LHC Computing Grid (WLCG)
Computing centers distributed worldwide: ~150 clusters, heterogeneous and independently maintained
Provides ~300k CPU cores on average and ~350 PB of storage
Dozens of applications, dozens of workflows, worldwide community
Infrastructure and software used globally, driven by innovations
– PanDA software was started by UTA and Brookhaven National Laboratory (BNL) a dozen years ago
– Allows us to use computers at data centers around the world to solve data-driven science problems
– A precursor to cloud computing
– At the cutting edge of US innovation
Total Per Month
storage and processing
understand the data
Future challenges
– LHC data are expected to grow by a factor of 5-10 over the next decade, so new innovations are needed.