UW eScience Institute Initiatives
Cecilia Aragon University of Washington Seattle, WA, USA aragon@uw.edu
(slides courtesy Bill Howe, Anissa Tanweer, Carole Goble)
Dagstuhl EAS plenary talk, Jun 23, 2016
“All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data… In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields.”
2005-2008 In other words:
Then: spreadsheets, notebooks; local, lost; cherry-picked results
Now: high-throughput experimental methods; industrial scale; commons-based production; publicly available data sets; preserved
Examples: PDB, GenBank, UniProt, Pfam, CATH, SCOP (Protein Structure Classification), ChemSpider
[src: Carole Goble]
How much data do you work with?
Wright 2013
Data Science Kickoff Session: 137 posters from 30+ departments and units
PIs on Moore/Sloan effort + eScience Institute Steering Committee + UW participants in February 7 Data Science poster session
Broad collaborations
Impact Graphic by Ray Hong and eScience Institute, UW
Recruited / recruiting data scientists
emphasis on taking responsibility for core activities (e.g., incubator projects)
Recruited / recruiting research scientists
with emphasis on specific science goals
Designated 33 faculty and staff as Data Science Fellows
Recruited 6 “Provost’s Initiative” faculty members
science methodology and to applying it at the forefront of a specific field
Mathematics, Statistics + Computer Science & Engineering
Recruited 2 cohorts of 6 Data Science Postdoctoral Fellows
UW flagship activity: Establish two new roles on campus: “Data Science Fellows” and “Data Scientists”
Science & Engineering, Genome Sciences, Oceanography, and Statistics
UW flagship activity: Establish new graduate program tracks in data science
“transcriptable option”
– Multiple Software Carpentry Bootcamps (Python, R, etc.)
– AstroData Hack Week
– Many others
Interdisciplinary
Biostats, iSchool, Applied Math)
Innovative
science
science and society, ‘big data’ user experience, visualization
Designed for working professionals
“Incubator” program
to provide the labor; you provide the guidance
commitments
Data Scientists
UW flagship activity: Establish an “incubator” seed grant program
Research Computing Team, Network Design & Architecture Team
Consulting Service
UW campus-wide monthly meetings
May 2014 national workshop
Research Center, Allen Institute for Brain Science, Sage Bionetworks, Google, …
Draft guidelines for reproducible research
Weekly tutorials on “research hygiene” topics
UW flagship activity: Establish a campus-wide community around reproducible research
Washington Research Foundation Data Science Studio
UW flagship activity: Establish a “Data Science Studio”
Ethnography and evaluation integrated into a wide range of Data Science Environment activities
with participants from grad students through faculty)
Developed ethnography research questions
forms of social interaction and organization, intellectual groupings, career reward structures, collaborative tool use in scientific workflows, data science values and ethics, etc.
Established baseline for evaluation, and determined evaluation questions
UW flagship activity: Establish a research program in “the data science of data science”
Qualitative field-based technique originally from anthropology
sociotechnical system
ecological validity
Ethnographers immerse themselves in a community to discern
Ethnography involves
artifacts from the field
Analysis
Ethnographic insights emerge as patterns and themes are detected
Ethnographers work with members of the community to interpret observations
“Applied ethnography”
what doesn’t
Engagement type            Years          # of engagements (per FTE)
Office Hours               2015-present   50+ annually
Door-to-Door; Lab Visits   2011-present   25-30 annually
Incubator; DSSG            2014-present   1-2 annually
Embedded                   2010-present   0-2 annually
Joint Research             2010-present   0-2 annually
Axes on the slide: duration of engagement; "go to them" versus "come to us"
Goal: Identify high-impact data-intensive science projects that will benefit from quarter-long sprints of expertise
Protocol: ~1-2-page proposals, in-studio collaboration two days per week
Best projects: “I have the questions, I have the data, I need help getting the answers”
among cohort beyond 1:1 interactions
eScience FTE
ensures progress (and an exit strategy)
today, so I can’t go do XXXX”
Spring 2014, Fall 2014, Winter 2016
http://data.uw.edu/incubator/
“I talked with Alicia a bit yesterday, and she showed me that her earthquake-repeater-searching implementation is more general, and more powerful than I had thought, and closer to trial by others (and I have a particular use in mind in the ongoing iMUSH experiment on Mount St Helens) <snip>
“So I'm encouraging her to continue to work on it a day per week or so for the foreseeable future, assuming you have the facilities to continue the incubation.”
The project outlives the incubator…
Publications in the works on both the software and the science, from three months of half-time work
Assessing Community Well-Being
Third-Place Technologies
Optimization of King County Metro Paratransit
Computer Science & Engineering
Predictors of Permanent Housing for Homeless Families
Bill and Melinda Gates Foundation
Open Sidewalk Graph for Accessible Trip Planning
Computer Science & Engineering
Inaugural 2015 program: 16 spots, 140 applicants …from 20+ departments
Predictors of Permanent Housing for Homeless Families
Project Leads: Neil Roche & Anjana Sundaram, Gates Foundation
DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak
ALVA High School Students: Cameron Holt, Xilalit Sanchez
eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton
When homeless families engage in services and programs, what factors are most likely to lead to a successful exit?
The DSSG team
‘families’ and to identify ‘episodes’ of homelessness including back-to-back,
individual programs
and analyze the ways families transition between programs
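One way to picture the "episodes" step above is as interval merging: program stays that overlap or follow one another back-to-back are joined into a single episode. The sketch below is only an illustration under that assumption. The function name `merge_into_episodes`, the `max_gap_days` threshold, and the sample dates are hypothetical and are not taken from the DSSG team's actual pipeline or data.

```python
from datetime import date, timedelta


def merge_into_episodes(stays, max_gap_days=1):
    """Merge (entry, exit) date pairs into episodes.

    A stay is joined to the previous episode when it starts no later than
    max_gap_days after that episode ends (hypothetical threshold), which
    covers both overlapping and back-to-back stays.
    """
    episodes = []
    for entry, exit_ in sorted(stays):
        if episodes and entry <= episodes[-1][1] + timedelta(days=max_gap_days):
            first_entry, last_exit = episodes[-1]
            # Extend the current episode; keep the later of the two exit dates.
            episodes[-1] = (first_entry, max(last_exit, exit_))
        else:
            # Gap is too large: this stay starts a new episode.
            episodes.append((entry, exit_))
    return episodes


if __name__ == "__main__":
    # Made-up stays for one family: two back-to-back stays, then a later one.
    sample = [
        (date(2014, 1, 1), date(2014, 1, 31)),
        (date(2014, 2, 1), date(2014, 3, 15)),
        (date(2014, 6, 1), date(2014, 6, 30)),
    ]
    for start, end in merge_into_episodes(sample):
        print(start, "to", end)
```

Exposing the gap threshold as a parameter makes the definition of "back-to-back" an explicit, adjustable choice rather than something buried in the logic.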
The Gates Foundation, together with Building Changes, has partnered with King, Pierce and Snohomish counties to make homelessness in these counties rare, brief and one-time.
Conduct analysis to identify predictors of permanent housing
Correlation with successful outcome, by family characteristics
Correlation with successful outcome, by homelessness program
Emergency Shelter use tends to be associated with unsuccessful
Homelessness Prevention programs more strongly associated with positive
transitional housing
Substance abuse strongly associated with unsuccessful outcomes
Parent employment strongest predictor of successful outcomes
Open Sidewalks – Sidewalk maps for low-mobility citizens
Project Leads: Nick Bolten, Anat Caspi – Taskar Center, CSE
DSSG Fellows: Amir Amini, Yun Hao, Vaishnavi Ravichandran, Andre Stephens
ALVA High School Students: Nick Krasnoselsky, Doris Layman
eScience Data Scientist Mentors: Anthony Arendt, Jake Vanderplas
“30 million Americans over 15 years old experience limited mobility, including difficulty walking, climbing stairs, using wheelchairs, crutches, walkers” while “24 million more persons experience difficulty walking a quarter mile”
Picture: US Federal Highway Administration http://www.fhwa.dot.gov/environment/bicycle_pedestrian/publications/sidewalk2/sidewalks204.cfm
Automated cleaning of sidewalk data through computational geometry
Powered by data from: SDOT/Socrata, Google API
Step                    Runtime   Solved (All)      Percent
Connecting T-Gaps       ~3.9s     3,837 (4,352)     88.2
Intersection Cleaning   ~23.6s    38,844 (44,700)   86.9
Polygon Cleaning        ~10min    7,283 (8,035)     90.6
Subgraphs               ~23.2s    39,913 (45,265)   88.1
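The Percent column in the cleaning results above appears to be the solved count divided by the total in parentheses. The short check below recomputes it from the counts on the slide; the function name `percent_solved` and the rounding to one decimal are assumptions made for illustration. Under that rounding the first three rows match the slide, while the Subgraphs row comes out at 88.2 where the slide shows 88.1, so the slide may have truncated or rounded differently.

```python
# Counts transcribed from the slide: (step, solved, all).
ROWS = [
    ("Connecting T-Gaps", 3837, 4352),
    ("Intersection Cleaning", 38844, 44700),
    ("Polygon Cleaning", 7283, 8035),
    ("Subgraphs", 39913, 45265),
]


def percent_solved(solved: int, total: int) -> float:
    """Share of cases fixed automatically, as a percentage rounded to one decimal."""
    return round(100.0 * solved / total, 1)


if __name__ == "__main__":
    for step, solved, total in ROWS:
        pct = percent_solved(solved, total)
        print(f"{step:<22} {solved:>6,} of {total:>6,}  {pct:.1f}%")
```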
"Impediment to insight to innovation: understanding data assemblages through the breakdown-repair process." Anissa Tanweer, Brittany Fiore-Gartland, Cecilia Aragon. Information, Communication & Society, March 2016
Staff
Program
Cecilia Aragon
eScience Institute University of Washington Seattle, WA, USA aragon@uw.edu
escience.uw.edu data.uw.edu faculty.uw.edu/aragon depts.washington.edu/hdsl/