CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS
Dr. Paola Oliva-Altamirano, Innovation Lab, Our Community, May 2019
Who am I? A foreigner: from Honduras to the US to Australia, and from galaxies to taxonomies.
Our Community - Innovation Lab 2
Our Community is a social enterprise and B Corp that provides advice, connections, training and easy-to-use tech tools for community builders.
Donation platform
Grants database
Training and networking
Software for grant applications
Australia lacked a unified taxonomy to classify subjects, beneficiaries and
In 2016, Our Community (OC) introduced CLASSIE, the classification system for Australian social sector initiatives and entities. CLASSIE opens the door to standard classification.
A social sector dictionary to answer two questions: Where is the money going? How is the Australian social sector working?
Example branches of the taxonomy:
Social Sciences: Anthropology, Archeology, Biological anthropology, Interdisciplinary studies, Ethnic studies, Indigenous studies, Asian studies
Sport and recreation: Community recreation, Parks, Camps, Sport, Outdoor sport, Mountain and rock climbing, Hiking and walking, Paralympics
Level 1: 17 categories
Level 2: 132 categories
Level 3: 492 categories
Level 4: 243 categories
choosing the correct category?
800,000 grant applications since 2010. Now we have the dictionary: how do we apply it?
Source: “One model to rule them all” by Christoph Molnar
1. To give automatic suggestions to grant applicants
2. To classify historical data
Seems like you are applying for:
☐ Sports and recreation
☐ Art and culture
☐ Community and development
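The suggestion box above can be sketched as picking the highest-scoring Level 1 categories from a classifier's per-category probabilities. This is a minimal sketch, assuming the model outputs one probability per category; the function `suggest_categories`, the cut-off `min_probability=0.2` and the probability values are hypothetical, not taken from CLASSIEfier.

```python
from typing import Dict, List


def suggest_categories(
    probabilities: Dict[str, float],
    top_k: int = 3,
    min_probability: float = 0.2,  # hypothetical cut-off
) -> List[str]:
    """Return up to `top_k` categories, most probable first, dropping any
    whose probability is below `min_probability`."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, p in ranked[:top_k] if p >= min_probability]


# Hypothetical Level 1 probabilities for one application.
scores = {
    "Sports and recreation": 0.81,
    "Art and culture": 0.47,
    "Community and development": 0.33,
    "Health": 0.05,
}

print("Seems like you are applying for:")
for category in suggest_categories(scores):
    print(f"[ ] {category}")
```

Capping the list at a few categories keeps the prompt short for applicants, while the probability floor avoids suggesting categories the model barely supports.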
At least 2000 applications per category
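Checking which categories meet the 2000-application minimum means counting how many applications carry each label. A minimal sketch, assuming each labelled application is stored as a set of category names; the helper `underpopulated_categories` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

MIN_LABELS_PER_CATEGORY = 2000  # minimum stated on the slide


def underpopulated_categories(
    labelled_applications: Iterable[Set[str]],
    all_categories: Iterable[str],
    minimum: int = MIN_LABELS_PER_CATEGORY,
) -> List[str]:
    """Return the categories with fewer than `minimum` labelled applications.

    One application can carry several labels, so it counts once towards
    each of its categories."""
    counts: Counter = Counter()
    for labels in labelled_applications:
        counts.update(labels)
    # Counter returns 0 for categories that never appear.
    return sorted(c for c in all_categories if counts[c] < minimum)


# Hypothetical toy data: 2,500 sport-only applications and 100 sport+health ones.
applications = [{"Sport"}] * 2500 + [{"Sport", "Health"}] * 100
result = underpopulated_categories(applications, ["Sport", "Health", "Arts"])
print(result)  # ['Arts', 'Health']
```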
What do we have?
grant applications
grant applications labeled by users since CLASSIE went live
Simple keyword matching to extract more labels
Keyword matching = the process of searching for literal matches (e.g. “hospital”) in a given piece of text (e.g. a grant description) to identify groups or subjects (e.g. health sector).
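The definition above can be sketched in a few lines: a category is assigned whenever one of its keywords appears literally in the text. This is a minimal sketch under stated assumptions; the keyword lists and the function `keyword_match` are hypothetical, since the real CLASSIE keyword sets are not shown in the slides.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword lists; the real CLASSIE keyword sets are not shown.
KEYWORDS: Dict[str, List[str]] = {
    "Health": ["hospital", "mental health", "clinic"],
    "People with hearing impediment": ["deaf", "hearing impairment", "auslan"],
    "Religion": ["church", "christian", "religion"],
}


def keyword_match(text: str, keywords: Dict[str, List[str]] = KEYWORDS) -> Set[str]:
    """Return every category with at least one literal, case-insensitive,
    whole-word match in `text`."""
    found: Set[str] = set()
    for category, terms in keywords.items():
        for term in terms:
            # \b word boundaries stop "deaf" from matching inside "deafening".
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(category)
                break  # one hit is enough for this category
    return found


description = (
    "This project will raise awareness and empower deaf people by providing "
    "key mental health information in their primary language "
    "(Australian Sign Language)."
)
print(sorted(keyword_match(description)))
```

Literal matching is transparent and cheap, but it cannot read context: a sentence such as "A fete at the Catholic school, next to the church" is tagged as Religion by this sketch.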
Stages:
We found that:
Example:
“This project will raise awareness and empower deaf people by providing key mental health information in their primary language (Australian Sign Language).”
Category: People with hearing impediment
For example, “orphans” is a confusing category, while “wildlife welfare” is a straightforward one.
Training the Machine Learning model
Training dataset: grant applications classified by keyword matching
DIFFICULTY #1: Multilabel
DIFFICULTY #2: Hierarchy
DIFFICULTY #3: Number of labels per category
Example: a grant application aimed at helping teenagers with autism.
Beneficiaries:
And also,
Multilabels and Hierarchy
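One common way to turn multilabel, hierarchical data into training targets is to expand each label with its ancestors and then encode one 0/1 column per category. A minimal sketch, assuming the hierarchy is a tree stored as a child-to-parent map; the category names in `PARENT` and the helpers `with_ancestors` and `binarize` are hypothetical, not the real CLASSIE hierarchy.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fragment of a category tree (child -> parent).
PARENT: Dict[str, str] = {
    "Teenagers": "Youth",
    "Youth": "Age groups",
    "People with autism": "People with disabilities",
}


def with_ancestors(labels: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Add every ancestor of each label: a deeper label implies its parents.
    Assumes the map has no cycles."""
    expanded: Set[str] = set()
    for label in labels:
        current: Optional[str] = label
        while current is not None:
            expanded.add(current)
            current = parent.get(current)
    return expanded


def binarize(labels: Set[str], vocabulary: List[str]) -> List[int]:
    """Multilabel target: one 0/1 column per category in `vocabulary`."""
    return [1 if category in labels else 0 for category in vocabulary]


# A grant aimed at helping teenagers with autism gets two labels...
labels = with_ancestors({"Teenagers", "People with autism"}, PARENT)
# ...and, through the hierarchy, their parent categories as well.
vocabulary = ["Age groups", "People with autism", "People with disabilities",
              "Teenagers", "Women", "Youth"]
print(binarize(labels, vocabulary))  # [1, 1, 1, 1, 0, 1]
```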
DIFFICULTY #3: Number of labels per category
Niche classifications or “black holes”: categories with fewer labels than the 2000 minimum required
The model reads the application.
Classification Level 1: machine learning
Sports and recreation:
Classification Level 2: we have enough labels, so we use another ML model
Classification Level 3: keyword matching
Information and communications:
Classification Level 2: we do not have enough labels, so we use keyword matching
Classification Level 3: keyword matching
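The routing rule above (ML at Level 1, ML at Level 2 only where a branch has enough labels, keyword matching otherwise) can be sketched as follows. This is a minimal sketch, assuming each model or keyword matcher is a function from text to category names; `classify_hierarchically`, the stub lambdas, the label counts and the categories "Media" and "Radio" are hypothetical.

```python
from typing import Callable, Dict, List

MIN_LABELS = 2000  # minimum labelled applications per category, from the slides

# Any function mapping application text to a list of category names.
Classifier = Callable[[str], List[str]]


def _no_match(text: str) -> List[str]:
    return []


def classify_hierarchically(
    text: str,
    level1_model: Classifier,
    level2_models: Dict[str, Classifier],
    level2_keywords: Dict[str, Classifier],
    level3_keywords: Dict[str, Classifier],
    label_counts: Dict[str, int],
    minimum: int = MIN_LABELS,
) -> List[Dict[str, object]]:
    results: List[Dict[str, object]] = []
    for level1 in level1_model(text):  # Level 1: always machine learning
        # Level 2: ML only where this branch has enough labels and a trained model.
        if label_counts.get(level1, 0) >= minimum and level1 in level2_models:
            method = "machine learning"
            level2 = level2_models[level1](text)
        else:
            method = "keyword matching"
            level2 = level2_keywords.get(level1, _no_match)(text)
        # Level 3: always keyword matching, within each Level 2 category found.
        level3 = [c for cat in level2 for c in level3_keywords.get(cat, _no_match)(text)]
        results.append({"level1": level1, "level2_method": method,
                        "level2": level2, "level3": level3})
    return results


# Hypothetical stand-ins for the trained models and keyword matchers.
results = classify_hierarchically(
    "A community hiking club that also runs a local radio show.",
    level1_model=lambda t: ["Sports and recreation", "Information and communications"],
    level2_models={"Sports and recreation": lambda t: ["Outdoor sport"]},
    level2_keywords={"Information and communications":
                     lambda t: ["Media"] if "radio" in t else []},
    level3_keywords={"Outdoor sport": lambda t: ["Hiking and walking"] if "hiking" in t else [],
                     "Media": lambda t: ["Radio"] if "radio" in t else []},
    label_counts={"Sports and recreation": 20000, "Information and communications": 500},
)
for row in results:
    print(row)
```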
Stages:
checking for biases
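Recall: $\text{Recall} = \frac{TP}{TP + FN}$, an indication of missing labels
Precision: $\text{Precision} = \frac{TP}{TP + FP}$, an indication of bad predictions
(TP = true positives, FP = false positives, FN = false negatives)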
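For multilabel data, the two formulas above can be computed by pooling label decisions across all applications (micro-averaging). Micro-averaging is an assumption here; the slides do not say how counts were aggregated. The function `micro_precision_recall` and the example labels are hypothetical.

```python
from typing import List, Set, Tuple


def micro_precision_recall(
    true_labels: List[Set[str]], predicted_labels: List[Set[str]]
) -> Tuple[float, float]:
    """Pool label decisions over all applications (micro-averaging)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("need one prediction set per application")
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)  # correct labels
        fp += len(pred - truth)  # bad predictions
        fn += len(truth - pred)  # missing labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = [{"Sport", "Health"}, {"Religion"}]
pred = [{"Sport"}, {"Religion", "Education", "Arts"}]
precision, recall = micro_precision_recall(truth, pred)
print(precision, recall)  # 0.5 0.6666666666666666
```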
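Based on the fact that each application has several categories, each prediction is scored on two scales:
Score | Labels found (recall) | Bad predictions (precision)
0 | None | All
1 | <45% | >55%
2 | >45% | <55%
3 | Perfect match | None (perfect match)
Scale: from useless model to perfect model!! CLASSIEfier scores ~4-5.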
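The scoring table above can be sketched as two 0-3 scores per application. This is a minimal sketch under loudly stated assumptions: that the first column measures the share of true labels found and the second the share of wrong predictions, that the two are summed into a 0 (useless) to 6 (perfect) scale, and that values exactly at 45% or 55% fall into score 1. The function names are hypothetical.

```python
from typing import Set


def recall_points(truth: Set[str], pred: Set[str]) -> int:
    """0 = no true label found, 3 = all found (45% boundary handling assumed)."""
    if not truth:
        raise ValueError("each application needs at least one true label")
    found = len(truth & pred) / len(truth)
    if found == 0:
        return 0
    if found == 1:
        return 3
    return 2 if found > 0.45 else 1


def precision_points(truth: Set[str], pred: Set[str]) -> int:
    """0 = every prediction wrong, 3 = none wrong (55% boundary handling assumed)."""
    if not pred:
        raise ValueError("the model must predict at least one label")
    wrong = len(pred - truth) / len(pred)
    if wrong == 1:
        return 0
    if wrong == 0:
        return 3
    return 2 if wrong < 0.55 else 1


def application_score(truth: Set[str], pred: Set[str]) -> int:
    """Assumed combined score: 0 (useless model) to 6 (perfect model)."""
    return recall_points(truth, pred) + precision_points(truth, pred)


print(application_score({"Sport", "Health", "Youth"}, {"Sport", "Health", "Arts"}))  # 4
```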
minorities that are already overlooked
“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.”
(Jeff Hammerbacher, Cloudera and Facebook data leader)
If the training data is already biased or insufficient, the algorithm will predict biased classifications.
Sport people
xkcd.com/1838/
SHAP (SHapley Additive exPlanations)
WEAT (Word Embedding Association Test), proposed in Caliskan et al. 2017
AI Fairness 360
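The WEAT effect size from Caliskan et al. 2017 compares how strongly two sets of target words (X, Y) associate with two sets of attribute words (A, B) in an embedding space. A minimal sketch of the effect size only (the paper also uses a permutation test for significance), assuming word vectors are plain lists of floats and using the sample standard deviation; the 2-D vectors are hypothetical stand-ins for trained embeddings.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    """s(w, A, B): mean cosine similarity of w to set A minus that to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X: List[Vector], Y: List[Vector],
                     A: List[Vector], B: List[Vector]) -> float:
    """Difference in mean association of targets X and Y with attributes
    A and B, divided by the standard deviation over X and Y together."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical 2-D "embeddings"; real tests use trained word vectors.
X = [[1.0, 0.1], [0.9, 0.2]]
Y = [[0.1, 1.0], [0.2, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
print(weat_effect_size(X, Y, A, B))  # positive: X leans towards A, Y towards B
```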
It is not feasible to classify natural-language text with 100% accuracy.
Keywords: Church, Religion, Christian
Model: Religion
Reality: a fete in a Catholic school
Church, Religion, Christian
Out of 200 applications classified by users, we found that:
Grant applications 85% accuracy
Grant applications 75% accuracy
1. Data preprocessing
2. Writing and testing the algorithm
3. Production: back-end and front-end product
4. Maintenance
LinkedIn: paola-oliva-altamirano
Email: paolao@ourcommunity.com.au
Innovation Lab: https://www.ourcommunity.com.au/innovationlab