Welcome: Data-Driven Discovery. Teik C. Lim, PhD, Provost and Vice President for Academic Affairs (PowerPoint presentation)

SLIDE 1

Welcome

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY

SLIDE 2

Data-Driven Discovery

Teik C. Lim, PhD

Provost and Vice President for Academic Affairs

SLIDE 3

Completed/Underway (select cases):

  • New degrees/certs in data science & analytics in CoB, CoE and CoS to meet workforce needs
  • DataCAVE – data analytics lab in Library, for software access, collaboration and training needs
  • Mavs Dataverse – repository for citable research data, to increase reach/impact of UTA research
  • Big Data Analytics Research Center housed in CoE

Planning (select items):

  • Cybersecurity in its human and societal dimensions
  • Data Science Clinic
  • Industry-university collaboration in SMART megacity analytics
  • Interdisciplinary thrust in digital humanities
  • Artificial intelligence and machine learning

DATA-DRIVEN DISCOVERY

SLIDE 4

NEW T/TT FACULTY (since Fall ’17)

CAPPA: Jiwon Suh

COED: Catherine Robert, Daniel Robinson

COB: Wayne Crawford, Yun Fan, Alison Hall, Alper Nakkas, David Rosser, Jayarajan Samuel, Mahyar Sharif Vaghefi

COE: William Beksi, Ye Cao, Animesh Chakravarthy, Kyung “Kate” Hyun, Mohammad Islam, Chen Kan, Won Hwa Kim, Caroline Krejci, Ming “Kate” Li, Shirin Nilizadeh, Deok Gun Park, Samantha Sabatino, Dajiang Zhu

COS: Yujie Chi, Souvik Roy, Amir Shahmoradi, Leili Shahriyari, Li Wang, Daniel Welling

COLA: Cynthia Laborde, Seungmug “Zech” Lee, Joshua Wilson

SSW: Ryon Cobb, Genevieve Graaf

CONHI: Ziyad Ben Taleb, Jing Wang

SLIDE 5

POSSIBLE NEW/ENHANCED ACADEMIC PROGRAMS OR UNITS

COB: Business Analytics

COE: Data Science & Analytics, SMART Cities, Bioinformatics, Cybersecurity

CAPPA: SMART Cities

CONHI: Health Care Informatics, Healthcare Simulation, Bioinformatics, Biotech

COS: Data Science and Analytics, Bioinformatics

COLA: Digital Humanities, Media/Digital Comm.

SLIDE 6

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY SYMPOSIUM

SLIDE 7

Data-Driven Discovery

Harry Dombroski

Dean, College of Business

SLIDE 8

Data-Driven Discovery

The CoB is positioning itself to be a leader in Business Analytics and is committed to:

  • Preparing students for a successful career in business analytics
  • Pursuing research excellence in data-driven discovery
  • Engaging industry partners

SLIDE 9

Data-Driven Discovery - Programs

Graduate programs:

  • MS Business Analytics
  • MS Economic Data Analytics
  • MS in Marketing Research

Undergraduate programs:

  • BS Business Analytics (Fall ’19)
  • Undergraduate minors in various business disciplines

SLIDE 10
MS Business Analytics
  • Offers technical and business perspectives on data
  • “Hands-on” experience for students, allowing them to apply advanced technical skills to solve real-world business problems
  • Excellent growth prospects
  • Recent student placements: Amazon, Capsule8, Deloitte, GM Financial, PwC, Pier 1 Imports

SLIDE 11
MS Economic Data Analytics
  • Provides students with the quantitative skills to formulate pricing strategies and to address cost management, process flow, demand forecasting and customer acquisition valuation
  • Recent student placements: Cottonwood Financial, Mary Kay, GM Financial, 2M Research, Targetbase, Buxton

SLIDE 12
MS in Marketing Research
  • Hands-on program designed to prepare students for careers in marketing research
  • Students meld logic with creativity, quantitative data with qualitative insights, and intelligence to solve marketing problems and create business opportunities
  • Recent student internships: Enterprise, Northwestern Mutual, Hilti, State Farm, Vivint and Sellmark

SLIDE 13

Student Analytics Research in Action

Undergraduate marketing students analyzed customer feedback data and recommended a new marketing strategy for a local popcorn business.

SLIDE 14

Engaging the Business Community

4th Annual Analytics Symposium

  • Held on March 29, 2019; attracted 138 attendees, including 80 professionals from 30 companies
  • In conjunction with the symposium, 38 student teams from UTA, UTD, UNT and SMU participated in an industry-sponsored Student Analytics Competition

SLIDE 15
Executive Certificate in Business Analytics
  • Custom certificate program developed in cooperation with a Global Fortune 500 company
  • Designed to reskill executives and promote a data-driven decision-making culture

SLIDE 16

Faculty Analytics Research Excellence

Sample of research topics:

  • Twitter data analysis: understanding health-related lifestyle behavior
  • Performance of crowdfunding projects
  • Using deep learning to predict supply chain performance
  • Understanding the human brain engaged in economic decision making

SLIDE 17

Working Together

Examples of cooperative data-driven discovery between colleges:

  • At UTA, CSE students take several MS BA courses as electives, and COB faculty collaborate with faculty in other colleges on various projects
  • Researchers from other colleges can use the Bloomberg terminals in the new Financial Market Labs
  • Researchers can use the financial market databases that COB/UTA subscribes to, such as Compustat, CRSP and OptionMetrics

SLIDE 18

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY SYMPOSIUM

SLIDE 19

Big Data in Education

Teresa Taber Doughty, PhD

Dean, College of Education

SLIDE 20

What are the characteristics of Big Data?

  • Velocity: how fast data are changing
  • Volume: amount of available data
  • Variety: different kinds and sources of data

Gartner, 2012

SLIDE 21

What does Big Data mean in Education?

  • Real-time data
  • Dashboards, statistical analysis, machine learning, data modeling, data mining
  • Meaningful insight for students, instructors, programs and institutions
  • Transforming education: grades, outcomes, evidence, continuous learning
  • Personalized learning

From: Desire2Learn

SLIDE 22

Higher Education: Student Retention, Degree Completion, Time to Degree

  • First-year attrition rates exceed 25%
  • Some states reach 40%
  • Only 1 in 2 students ever completes a degree
  • 75% of students are non-traditional
  • 40% of students are not academically prepared
  • 40% are part time
  • 60% of full-time students complete a 4-year bachelor’s degree within 8 years
  • 24% of part-time students complete a bachelor’s degree within 8 years
  • 20% take more courses than needed

From: Complete College America, Time is the Enemy (summary)

SLIDE 23
How Are Big Data Sets Used in Education?
  • Personalized instruction
  • Responsive formative assessment
  • Actively engaged pedagogy
  • Collaborative learning

SLIDE 24
SLIDE 25
Questions we might answer in Education using Big Data:
  • How effective are standardized annual K-12 state assessments in predicting future outcomes (graduation rates, post-school employment, success in STEM fields, etc.)?
  • What are the employment rates of individuals with disabilities who complete high school? A 4-year degree? A graduate degree? How does the type of disability affect employment rates?
  • Is there a time-to-degree difference between students who complete a traditional face-to-face (F2F) program and those who complete a blended or online post-secondary program?

SLIDE 26
Current Projects in the COEd
  • Dr. Bradley Davis
    – Examining a big longitudinal data set relating to Texas public schools
    – https://www.utdallas.edu/research/tsp-erc/data-holdings.html
    – Davis, B.W., & Bowers, A.J. (2018). Stepping stones and pathways from school district leadership certification to the superintendency: An event history analysis of all Texas districts 2000-01 to 2014-15. Educational Administration Quarterly, 15(1), 3-41.

SLIDE 27
SLIDE 28
Current Projects in the COEd
  • Dr. Maria Trache
    – Restricted and unrestricted data sets: National Center for Education Statistics (under an IES/NCES license) and NSF
    – Answers questions about gender differences in labor market outcomes, the employment and earnings of international STEM graduates of U.S. universities, and transfer student success

SLIDE 29
Current Projects in the COEd
  • Dr. Jodi Tommerdahl
    – Visual Accent Trainer
    – Used large language samples to look for “statistically normal language” in order to better identify abnormal language and language disorders
    – Predictive modeling (a subcategory of learning analytics) within a virtual educational system that may impact educational policy and instructional strategies

SLIDE 30
SLIDE 31

Potential Collaborations

Education:

  • Cost/benefit analysis
  • Relationship to healthy communities
  • Link between zip code and education outcomes
  • School violence

SLIDE 32

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY SYMPOSIUM

SLIDE 33

Big Data Research - Opportunities and Challenges

Gautam Das, PhD

Professor, Department of Computer Science & Engineering

SLIDE 34

Big Data - Opportunities

  • Analysis: Increased availability of data yields more accurate analysis in fields ranging from health care to genomics and from business to physics
  • Decision making: Insights from big data lead to more confident decision making
  • Efficiency: Better decisions lead to greater efficiencies, cost reductions and reduced risks

Data-Driven Discovery: The exponential growth and availability of big data present opportunities to fuel decisions at every level of society.

SLIDE 35

Big Data - Challenges

  • Challenges in data collection, cleaning, storage, management and analysis
  • Big data is complex, fast-changing, heterogeneous and high-volume

SLIDE 36

The Big Data Science Pipeline

Data Collection → Cleaning & Storage → Management & Query → Analysis & Applications

SLIDE 37

Core Big Data Research Opportunities

Big Data Storage and Management:

  • Data collection technologies such as sensors & cyber-physical systems (e.g., IoT)
  • Challenges in storing, managing & querying big data
  • High-performance computing systems that rely on fast networks & distributed and multi-core systems
  • Cloud computing platforms

SLIDE 38

Big Data Storage and Management

Data collection technologies: sensors & cyber-physical systems (e.g., IoT)

Chris Ding (CSE), Yan Wan (EEE), Chengkai Li (CSE), Ming Li (CSE)

SLIDE 39

Big Data Storage and Management

High-performance computing systems that rely on fast networks & distributed and multi-core systems

Hong Jiang (CSE), Ishfaq Ahmad (CSE), Jia Rao (CSE)

SLIDE 40

Big Data Storage and Management

Cloud computing platforms

Mohammad Islam (CSE), William Beksi (CSE), Jia Rao (CSE)

SLIDE 41

Core Big Data Research Opportunities

Advances in Big Data Mining and Machine Learning:

  • Scaling existing data mining and machine learning algorithms in big data environments (e.g., scalable clustering and classification)
  • Developing new machine learning algorithms that take advantage of advances in hardware, sophisticated algorithms and scalable software frameworks (e.g., deep learning)

SLIDE 42

Big Data Mining and Machine Learning

Scaling existing data mining and machine learning algorithms in big data environments

Sharma Chakravarthy (CSE), Leonidas Fegaras (CSE), Song Jiang (CSE), Hao Che (CSE), Ishfaq Ahmad (CSE), Mohammad Islam (CSE), Jia Rao (CSE)

SLIDE 43

Big Data Mining and Machine Learning

Developing machine learning algorithms that take advantage of advances in hardware and scalable software frameworks

Gautam Das (CSE), Sharma Chakravarthy (CSE), Won Hwa Kim (CSE), Vassilis Athitsos (CSE), Jia Rao (CSE), Junzhou Huang (CSE), Fillia Makedon (CSE), Ramez Elmasri (CSE)

SLIDE 44

Big Data Applications Research

Computer vision, natural language processing, autonomous cars, robotics, genomics, law, business, healthcare, engineering, physical and social sciences

SLIDE 45

Big Data Applications Research: Engineering

Andrew Makeev (MAE), Yan Wan (EEE), Stephen Mattingly (Civil Eng), Shouyi Wang (Industrial Eng), Seyedali Abolmaali (Civil Eng), Anand Puppala (Civil Eng), Jay Rosenberger (Industrial Eng), Kate Hyun (Civil Eng)

SLIDE 46

Big Data Applications Research: Healthcare

Gautam Das (CSE), Shouyi Wang (Industrial Eng), Won Hwa Kim (CSE), Ishfaq Ahmad (CSE), Dajiang Zhu (CSE), Chris Ding (CSE), Jean Gao (CSE), Junzhou Huang (CSE), Fillia Makedon (CSE), Vassilis Athitsos (CSE)

SLIDE 47

Big Data Applications Research: Robotics, Computer Vision, Business, Physics and Social Sciences

Andrew Makeev (MAE), Vassilis Athitsos (CSE), Won Hwa Kim (CSE), William Beksi (CSE), Fillia Makedon (CSE), Mohammad Islam (CSE), Ming Li (CSE), Sridhar Nerur (Information Systems), Yan Wan (EEE), Kaushik De (Physics)

SLIDE 48

Big Data Applications Research: NLP, Security, Genomics

Jean Gao (CSE), Chengkai Li (CSE), Deokgun Park (CSE), Shirin Nilizadeh (CSE), Ming Li (CSE), Jiang Ming (CSE), Mohammad Islam (CSE), Jeff Lei (CSE), Jay Rosenberger (Industrial Eng)

SLIDE 49

Proposal for BigDAC: Big Data Analytics (BDA) Center

Director: Gautam Das

Current Industry Collaborations

Members and Collaborators

  • 15 UTA faculty and 11 external researchers, including several industrial collaborators

Research Focus

  • Big Data Storage and Management
  • Big Data Mining and Machine Learning
  • Big Data Analytics Applications

Mission

  • Develop “grand-challenge” research projects in BDA
  • Deliver world-class research via papers, patents and products
  • Serve as a hub for UTA faculty, researchers and students, as well as external collaborators, to engage on BDA issues
  • Reach out to industry, government and nonprofits to understand their BDA needs, and partner with them to develop new research projects and educational programs
  • Create innovative BDA technologies and help transfer them into the real world by encouraging entrepreneurship and partnerships with industry and organizations

SLIDE 50

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY SYMPOSIUM

SLIDE 51

Data-Driven Physics Discovery

Kaushik De, PhD

Professor, Department of Physics and Director, High-Energy Physics Center of Excellence

SLIDE 52

Introduction

  • UTA High Energy Physics has been at the forefront of data-driven discoveries for 20 years
    – Scientific data drives scale and complexity
    – Computational advances drive discoveries
    – Infrastructure and software are equally important
  • A look at the past, present and future

SLIDE 53

The ATLAS Experiment

The world’s largest experimental apparatus, located 100 m underground at the Large Hadron Collider (LHC). The LHC is a high-energy proton smasher near Lake Geneva, in a 27 km tunnel snaking under France and Switzerland.

UTA in ATLAS since 1995

SLIDE 54

Why build ATLAS?

Study the fundamental properties of our universe: particles and forces

Continuing study of the Higgs boson, search for dark matter, study of the Standard Model … until beyond 2035

Around 750 journal publications

SLIDE 55
  • The largest “camera” ever built
  • And the fastest: a few hundred million pictures every second
  • Each picture contains ten million bits of complex information
  • Scientific data-driven discoveries: scale and complexity

Parts were built by students at UTA

ATLAS.CH

SLIDE 56

ATLAS Computing Ecosystem

  • Computing centers distributed worldwide: ~150 clusters, heterogeneous and independently maintained
  • Provides ~300k CPU cores on average, one million at burst
  • ~350 PB storage
  • Worldwide LHC Computing Grid (WLCG)
  • Dozens of applications, dozens of workflows
  • Worldwide community of thousands of users
  • Infrastructure and software used globally, driven by innovations

SLIDE 57

Inventing PanDA

  – PanDA software was started by UTA and Brookhaven National Laboratory (BNL) a dozen years ago
  – Allows us to use computers at data centers around the world to solve data-driven science problems
  – Precursor to cloud computing
  – At the cutting edge of US innovation

SLIDE 58

PanDA Scale – Jobs Completed

[Chart: total jobs completed per month]

SLIDE 59
  • Truly worldwide data movement, storage and processing
  • And massive simulations to understand the data

SLIDE 60

UTA Leading the Way

  • LHC computing center at UTA
  • ~12,000 CPU cores
  • ~7,000 TB disk space
  • Global computing center for ATLAS
  • PanDA, AI, deep learning…

Future challenges: LHC data will grow by a factor of 5-10 over the next decade, requiring new innovations.

SLIDE 61

THE UNIVERSITY OF TEXAS AT ARLINGTON

DATA-DRIVEN DISCOVERY SYMPOSIUM

SLIDE 62
Registrant Speakers
  • Ishfaq Ahmad
  • David Arditi
  • William Beksi
  • Sharma Chakravarthy
  • Morgan Chivers
  • Rebekah Chojnacki
  • Muhammad Huda
  • Ashley Lemke
  • Hanli Liu
  • Peace Ossom Williamson
  • Kenneth Roemer
  • Leili Shahriyari
  • Jianzhong Su
  • Charles Travis
  • Ahoura Zandiatashbar