CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS

SLIDE 1

CLASSIEFIER: USING MACHINE LEARNING TO PAINT A PICTURE OF SOCIAL TRENDS

  • Dr. Paola Oliva-Altamirano, Innovation Lab, Our Community, May 2019

SLIDE 2

Who am I?

Our Community - Innovation Lab 2

A foreigner

  • From Honduras to the US to Australia
  • From Galaxies to Taxonomies

SLIDE 3

Outline:

  • Introducing Our Community’s data initiatives
  • Background: CLASSIE, a social dictionary
  • How did we scope CLASSIEfier?
  • How did CLASSIEfier evolve as a project?
  • The Data Science for Social Good concept
  • Results and conclusions

SLIDE 4

Our Community is a social enterprise and B Corp that provides advice, connections, training and easy-to-use tech tools for community-builders.

  • Donation platform
  • Grants database
  • Training and networking
  • Software for grant applications

SLIDE 5

SLIDE 6

From CLASSIE to CLASSIEfier

SLIDE 7

Main objective – Classification of grants

Australia lacked a unified taxonomy to classify subjects, beneficiaries and organization types.

In 2016, Our Community (OC) introduced CLASSIE: the classification system for Australian social sector initiatives and entities.

CLASSIE opens the door to standard classification.

SLIDE 8

CLASSIE

A social sector dictionary

  • Subjects
  • Populations
  • Organisation type

Where is the money going? How is the Australian social sector working?

SLIDE 9

Hierarchical Classification – e.g. Subjects

  • Level 1: 17 categories
  • Level 2: 132 categories
  • Level 3: 492 categories
  • Level 4: 243 categories

Example branches:

  • Social Sciences: Anthropology, Archeology, Biological anthropology, Interdisciplinary studies, Ethnic studies, Indigenous studies, Asian studies
  • Sport and recreation: Community recreation, Parks, Camps, Sport, Outdoor sport, Mountain and rock climbing, Hiking and walking, Paralympics

SLIDE 10

Now we have the dictionary – How do we apply it?

Questions

  • How do we ensure that users are choosing the correct category?
  • How do we classify historical data?

800,000 grant applications since 2010

SLIDE 11

CLASSIEfier is a tool that will automatically classify grants

SLIDE 12

How did we scope CLASSIEfier?

SLIDE 13

Source: “One model to rule them all” by Christoph Molnar

SLIDE 14

CLASSIEfier – Two different models

1. To give automatic suggestions to grant applicants
2. To classify historical data

Seems like you are applying for:

  • Sports and recreation
  • Art and culture
  • Community and development

SLIDE 15

CLASSIEfier: How does it work?

SLIDE 16

How did CLASSIEfier evolve?

SLIDE 17

CLASSIEfier – The Algorithm

How do we generate more labels?

We need at least 2,000 applications per category.

What do we have?

  • 800,000 grant applications
  • 4,000 grant applications labeled by users since CLASSIE went live

SLIDE 18

CLASSIEfier – The Algorithm

First phase: simple keyword matching to extract more labels

Keyword matching = the process of searching for literal matches (e.g. “hospital”) in a given piece of text (e.g. a grant description) to identify groups or subjects (e.g. health sector).

Stages:

  • Identify keywords for CLASSIE
  • Extract applications that exhibit a strong match
  • Score the classification done by users

Example:

This project will raise awareness and empower deaf people by providing key mental health information in their primary language (Australian Sign Language).

  • Highlighted keywords: deaf, Sign Language
  • Category: People with hearing impediment

We found that:

  • Keyword matching accuracy differs from one category to another. For example, “orphans” is a confusing category, while “wildlife welfare” is a straightforward one.
  • On average, it is around 80%.
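The keyword matching phase described above can be sketched in a few lines of Python. This is only an illustration: the keyword lists, the function name `match_categories` and the `min_hits` threshold for a "strong match" are assumptions, since the slides do not give the real CLASSIE keywords or matching rules.

```python
import re

# Hypothetical keyword lists; the real CLASSIE keywords are not given in the slides.
KEYWORDS = {
    "Health": ["hospital", "mental health"],
    "People with hearing impediment": ["deaf", "sign language"],
}


def match_categories(text, keywords=KEYWORDS, min_hits=1):
    """Return {category: literal keyword hits} for categories with at least min_hits."""
    lowered = text.lower()
    hits = {}
    for category, words in keywords.items():
        # Count whole-word, case-insensitive literal matches for every keyword.
        count = sum(
            len(re.findall(r"\b" + re.escape(word) + r"\b", lowered))
            for word in words
        )
        if count >= min_hits:
            hits[category] = count
    return hits


description = (
    "This project will raise awareness and empower deaf people by providing "
    "key mental health information in their primary language "
    "(Australian Sign Language)."
)

labels = match_categories(description)
strong = match_categories(description, min_hits=2)
```

Raising `min_hits` is one simple way to keep only applications that "exhibit a strong match"; the actual criterion used by CLASSIEfier may differ.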

SLIDE 19

CLASSIEfier – The Algorithm

Second phase: Training the machine learning model

Training dataset: 128,000 grant applications classified by keyword matching

  • DIFFICULTY #1: Multilabel
  • DIFFICULTY #2: Hierarchy
  • DIFFICULTY #3: Number of labels per category

SLIDE 20

Multilabels and Hierarchy

Example: A grant application that is aimed at helping teenagers with autism.

Beneficiaries:

  • “Children and youth” at level 1
  • “Adolescents” at level 2

And also,

  • “People with disabilities” at level 1
  • “People with intellectual disabilities” at level 2
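To make the multilabel and hierarchy difficulties concrete, here is a minimal sketch of how one application can carry several labels at several levels. The `PARENT` table holds only the four beneficiary categories named in this example; the helper names (`level`, `expand`, `multi_hot`) are hypothetical and not taken from CLASSIEfier.

```python
# Parent links for the four beneficiary categories named in the example.
PARENT = {
    "Children and youth": None,
    "Adolescents": "Children and youth",
    "People with disabilities": None,
    "People with intellectual disabilities": "People with disabilities",
}


def level(category):
    """Level 1 is a root category; each step down the tree adds one."""
    depth = 1
    while PARENT[category] is not None:
        category = PARENT[category]
        depth += 1
    return depth


def expand(labels):
    """Add every ancestor of each label, so the label set covers all levels."""
    full = set()
    for category in labels:
        while category is not None:
            full.add(category)
            category = PARENT[category]
    return full


def multi_hot(labels, vocabulary):
    """Encode a label set as a 0/1 vector, a common target format for multilabel models."""
    return [1 if category in labels else 0 for category in vocabulary]


VOCAB = sorted(PARENT)
teen_autism = expand({"Adolescents", "People with intellectual disabilities"})
teen_only = multi_hot(expand({"Adolescents"}), VOCAB)
```

Here `teen_autism` contains all four categories, two at level 1 and two at level 2, which is exactly the situation a single-label classifier cannot represent.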

SLIDE 21

DIFFICULTY #3: Number of labels per category

Niche classification or “black holes”

  • Categories such as Confucius, North American people and Nomadic people, among others, will have fewer than 100 grant applications: 20x fewer than the 2,000 minimum required.

SLIDE 22

How do we solve it? – Separate training

  • CLASSIEfier reads the application
  • Classification Level 1 – Machine learning

Sports and recreation:

  • Classification Level 2: we have enough labels, so we use another ML model
  • Classification Level 3: keyword matching

Information and communications:

  • Classification Level 2: we do not have enough labels, so we use keyword matching
  • Classification Level 3: keyword matching
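The separate-training idea can be read as a routing rule: machine learning at level 1, machine learning at level 2 only where there are enough labels, and keyword matching everywhere else. The sketch below assumes the 2,000-application minimum quoted earlier in the talk as the cut-off; the label counts and the names `choose_method` and `plan` are invented for illustration.

```python
MIN_LABELS = 2000  # minimum applications per category quoted in the talk


def choose_method(level, labelled_count, min_labels=MIN_LABELS):
    """Pick the classifier type for one level of one branch."""
    if level == 1:
        return "ml"  # level 1 is always handled by the ML model
    if level == 2 and labelled_count >= min_labels:
        return "ml"  # enough labels: train another ML model for this branch
    return "keyword"  # too few labels, or level 3: fall back to keyword matching


def plan(branch_counts):
    """Map each level-1 branch to the method used at levels 1, 2 and 3."""
    return {
        branch: {lvl: choose_method(lvl, count) for lvl in (1, 2, 3)}
        for branch, count in branch_counts.items()
    }


# Hypothetical label counts, chosen only to reproduce the two branches on the slide.
counts = {"Sports and recreation": 12000, "Information and communications": 150}
routing = plan(counts)
```

With these made-up counts, `routing` reproduces the two paths shown on the slide: ML, ML, keyword for "Sports and recreation" and ML, keyword, keyword for "Information and communications".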

SLIDE 23

CLASSIEfier – The Algorithm

Third phase: Model interpretation: scoring and checking for biases

Stages:

  • Choose the best model – k-nearest neighbours (k-NN)
  • Choose the best parameters
  • Choose the best scoring
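For readers unfamiliar with k-nearest neighbours, the following is a toy multilabel k-NN text classifier in plain Python. It is a sketch under stated assumptions, not the CLASSIEfier implementation: the bag-of-words features, cosine similarity, the vote `threshold` and the training sentences are all invented for illustration, and `k` stands in for the kind of parameter the second stage would tune.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term counts (a stand-in for whatever features a real model uses)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(text, training, k=3, threshold=0.5):
    """Return labels carried by at least `threshold` of the k most similar training texts."""
    query = vectorise(text)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, vectorise(item[0])),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n / len(neighbours) >= threshold}


# Toy training data, invented for illustration.
TRAINING = [
    ("new nets for the junior football club", {"Sport and recreation"}),
    ("football coaching for teenagers", {"Sport and recreation", "Children and youth"}),
    ("swimming lessons for teenagers", {"Sport and recreation", "Children and youth"}),
    ("mural painting workshop", {"Art and culture"}),
    ("community choir concert", {"Art and culture"}),
]

predicted = knn_predict("football training for teenagers", TRAINING, k=3)
```

Because each neighbour can vote for several labels, the prediction is naturally multilabel, which is one reason a neighbour-based model can suit this kind of data.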

SLIDE 24

Scoring

Recall: an indication of missing labels

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

Precision: an indication of bad predictions

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
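Recall and precision can be computed per application from the true positives (TP, correctly predicted categories), false negatives (FN, missing labels) and false positives (FP, bad predictions). The sketch below treats each application's categories as a set; the function name and the example labels are assumptions made for illustration.

```python
def precision_recall(true_labels, predicted_labels):
    """Precision and recall for the label sets of a single application."""
    true_labels, predicted_labels = set(true_labels), set(predicted_labels)
    tp = len(true_labels & predicted_labels)  # correct predictions
    fn = len(true_labels - predicted_labels)  # missing labels
    fp = len(predicted_labels - true_labels)  # bad predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, recall


truth = {"Children and youth", "Adolescents", "People with disabilities"}
guess = {"Children and youth", "Adolescents", "Sport and recreation", "Art and culture"}
precision, recall = precision_recall(truth, guess)
```

In this example two of the three true categories are found (recall 2/3) and two of the four predictions are correct (precision 1/2).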

SLIDE 25

Scoring

Based on the fact that each application has several categories

Recall: How many categories got picked per application

  • 0: None
  • 1: <45%
  • 2: >45%
  • 3: Perfect match

Precision: How many categories are wrong per application

  • 0: All
  • 1: >55%
  • 2: <55%
  • 3: None – Perfect match

Overall scale from 0 to 6:

  • 0: Useless model
  • 6: Perfect model
  • CLASSIEfier: ~4-5
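One way to read this scoring scheme is as two 0 to 3 scores per application that are added to give a value between 0 and 6. The sketch below follows that reading; adding the two scores, the handling of the exact 45% and 55% boundaries, and the treatment of empty label sets are all assumptions, since the slide does not specify them.

```python
def recall_score(true_labels, predicted_labels):
    """0 = none picked, 1 = under 45% picked, 2 = 45% or more, 3 = all picked."""
    true_labels, predicted_labels = set(true_labels), set(predicted_labels)
    hits = len(true_labels & predicted_labels)
    if not true_labels or hits == 0:
        return 0  # assumption: an application with no true labels scores 0
    if hits == len(true_labels):
        return 3
    # Assumption: exactly 45% is scored as 2 (the slide leaves this case open).
    return 1 if hits / len(true_labels) < 0.45 else 2


def precision_score(true_labels, predicted_labels):
    """0 = all wrong, 1 = over 55% wrong, 2 = 55% or less wrong, 3 = none wrong."""
    true_labels, predicted_labels = set(true_labels), set(predicted_labels)
    wrong = len(predicted_labels - true_labels)
    if not predicted_labels or wrong == len(predicted_labels):
        return 0  # assumption: predicting nothing scores 0
    if wrong == 0:
        return 3
    # Assumption: exactly 55% wrong is scored as 2.
    return 1 if wrong / len(predicted_labels) > 0.55 else 2


def total_score(true_labels, predicted_labels):
    """Sum of the two scores: 0 is a useless prediction, 6 a perfect one."""
    return recall_score(true_labels, predicted_labels) + precision_score(
        true_labels, predicted_labels
    )


truth = {"Children and youth", "Adolescents", "People with disabilities"}
guess = {"Children and youth", "Adolescents", "Sport and recreation", "Art and culture"}
example = total_score(truth, guess)
```

For the example above, two of three true categories are picked (score 2) and half of the predictions are wrong (score 2), giving a total of 4, at the lower end of the ~4-5 range reported for CLASSIEfier.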

SLIDE 26

Misclassifications and black holes will lead to underfunding of minorities that are already overlooked

SLIDE 27

The Data Science for Social Good Movement

“The best minds of my generation are thinking about how to make people click ads,” he says. “That sucks.”

  • Jeff Hammerbacher (Cloudera and Facebook data leader)

SLIDE 28

Algorithmic bias

  • This will happen if you feed the algorithm data that is already biased, or insufficient data: the algorithm will predict biased classifications.
  • Algorithms are mirrors

Sport people

SLIDE 29

Know your Model!

xkcd.com/1838/

SLIDE 30

  • SHAP (SHapley Additive exPlanations)
  • WEAT tests proposed in Caliskan et al. 2017
  • AI Fairness 360

SLIDE 31

Document everything! – this is how we tackle biases

Choose transparency

SLIDE 32

Results and conclusions

It is not feasible to classify natural human language with 100% accuracy

Example: Church, Religion, Christian

  • Model = Religion
  • Reality – A fete in a Catholic school

SLIDE 33

Results and conclusions

  • CLASSIEfier works similarly to humans, no better and no worse: ~70-80% accuracy

Out of 200 applications classified by users, we found that:

  • 63% right
  • 19% half right
  • 18% wrong

SLIDE 34

Results and conclusions

  • The model is also discriminating between good and bad applications

  • Approved grant applications: 85% accuracy
  • Declined grant applications: 75% accuracy

SLIDE 35

Results and conclusions

CLASSIEfier is now feeding back into CLASSIE

Seems like you are applying for:

  • Sports and recreation
  • Art and culture
  • Community and development

SLIDE 36

CLASSIEfier – More than just an algorithm

  • Data preprocessing
  • Writing and testing the algorithm
  • Production – back and front end product
  • Maintenance

SLIDE 37

DO YOU WANT TO LEARN MORE?

  • LinkedIn: paola-oliva-altamirano
  • Email: paolao@ourcommunity.com.au
  • Innovation Lab: https://www.ourcommunity.com.au/innovationlab