Mining photo-sharing websites to study ecological phenomena



slide-1
SLIDE 1

Mining photo-sharing websites to study ecological phenomena

Haipeng Zhang, Mohammed Korayem, David Crandall
School of Informatics and Computing
Indiana University, Bloomington, USA

Gretchen LeBuhn
Department of Biology
San Francisco State University, San Francisco, USA

slide-2
SLIDE 2

Social photo sharing websites

100+ billion photos
6+ billion photos

slide-3
SLIDE 3

Snow, Flowers, Cloud cover, Wildlife, Foliage

slide-4
SLIDE 4
slide-5
SLIDE 5

Need for ecological data

  • How is nature changing due to global warming?

    – Plot-based studies: Fine-grained information but only at a few locations, and labor-intensive
    – Aerial surveillance: Continental-scale information, but only useful for some phenomena

[IPCC2007]

slide-6
SLIDE 6

Our paper

  • Can we observe nature by mining photo websites?
  • We study two phenomena: snow and vegetation cover

    – Estimate geo-temporal distributions at continental scale, using ~150 million photos from Flickr (via public API)
    – Analyze geo-tags, timestamps, text tags, visual content
    – Evaluate techniques for estimation in crowd-sourced data
    – Compare to data from weather stations and satellites

slide-7
SLIDE 7

Related Work

  • Crowd-sourced observational data, e.g.:

    – Estimating public mood from Twitter [Bollen11]
    – Predicting product sales from Flickr tags [Jin10]
    – Estimating spread of flu from search queries [Ginsberg09]
    – Monitoring forest fires from Twitter [DeLongueville09]

  • Volunteer-based citizen science

The Great Sunflower Project

slide-8
SLIDE 8

Challenges

  • Incorrect geotags and timestamps
  • Difficult to recognize image content automatically
  • Text tags helpful but noisy

    – Some tags are completely incorrect, others are misleading

  • Dataset biases

    – Many more photos in cities than rural areas
    – People more likely to take photos of the unusual

  • Misleading image data

    – e.g. zoos, ski slopes, synthetic images, etc.

slide-9
SLIDE 9

Combining evidence

  • Photos by different people are (almost) independent observations, with uncorrelated noise

[Plot: probability of actual snow versus number of users tagging a photo with “snow”]

slide-10
SLIDE 10

A simple model

  • Suppose we’re interested in some object X (e.g. snow)

    – Specifically, whether X was present at a given time and place
    – Let s denote the event that a given user takes a picture of X
    – Assume s depends on presence of X:

P(s | X) = probability of taking picture of X, given X was present
    Could be factored into: probability of seeing X, probability of taking photo, probability of uploading to Flickr, …
P(s | not X) = probability of taking picture of X, given X was not present
    Bad timestamps or geotags, misleading image content, …

slide-11
SLIDE 11

A simple model

  • Suppose m users took photos of X, and n users did not

    – Using Bayes’ law,
    – Assuming each user acts independently (conditioned on X),

    – High or low ratio means high or low probability of X; ratio near 1 means low confidence either way
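The ratio this slide refers to can be written out from its stated assumptions. The block below is a reconstruction, not the slide's own notation: the symbols p = P(s | X), q = P(s | X̄), and the shorthand s^m, s̄^n (m users took a photo of X, n did not) are assumptions introduced here. Bayes' law gives the first equality; independence of users conditioned on X gives the second, where the counting factors for which users took photos cancel between numerator and denominator.

```latex
\[
\frac{P(X \mid s^{m}, \bar{s}^{n})}{P(\bar{X} \mid s^{m}, \bar{s}^{n})}
= \frac{P(s^{m}, \bar{s}^{n} \mid X)\, P(X)}{P(s^{m}, \bar{s}^{n} \mid \bar{X})\, P(\bar{X})}
= \frac{P(X)}{P(\bar{X})}
  \left(\frac{p}{q}\right)^{m}
  \left(\frac{1-p}{1-q}\right)^{n}
\]
```

Each user who photographs X multiplies the odds by p/q, and each user who does not multiplies them by (1-p)/(1-q). With equal priors and no evidence, the ratio is 1.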

slide-12
SLIDE 12

Snow estimation in cities

  • Estimate daily snow cover (presence or absence)

    – Predict using Flickr photo tags, compare to ground truth from National Weather Service historical data
    – Estimate parameters on 2007-2008, test on 2009-2010.

[Plot: true positive rate versus false positive rate]

Tag set (hand-selected):
{snow, snowy, snowing, snowstorm}

Model parameters (estimated from training data):
P(s | snow) = 17.12%
P(s | no snow) = 0.14%
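As a rough illustration of how these two parameters could be turned into a snow estimate, here is a minimal Python sketch of the independence model from the earlier slides. The function names and the uniform prior of 0.5 are assumptions made for this example, not taken from the slides; the computation is done in log space to avoid overflow when many users are involved.

```python
import math

# Parameters reported on the slide (estimated from training data).
P_TAG_GIVEN_SNOW = 0.1712      # P(s | snow)
P_TAG_GIVEN_NO_SNOW = 0.0014   # P(s | no snow)


def log_posterior_odds(m, n, p=P_TAG_GIVEN_SNOW, q=P_TAG_GIVEN_NO_SNOW,
                       prior=0.5):
    """Log of P(snow | evidence) / P(no snow | evidence).

    m: number of users who posted a snow-tagged photo
    n: number of users who posted photos without a snow tag
    prior: assumed prior probability of snow (hypothetical default)
    """
    return (math.log(prior) - math.log(1.0 - prior)
            + m * (math.log(p) - math.log(q))
            + n * (math.log(1.0 - p) - math.log(1.0 - q)))


def probability_of_snow(m, n, **kwargs):
    """Convert the log odds into a probability with a stable sigmoid."""
    log_odds = log_posterior_odds(m, n, **kwargs)
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    for m, n in [(0, 0), (1, 0), (0, 10), (2, 10)]:
        print(m, n, round(probability_of_snow(m, n), 4))
```

Under this sketch a single snow-tagged user outweighs many untagged users, because p/q is large while (1-p)/(1-q) is only slightly below 1.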

slide-13
SLIDE 13

Learning relevant tags

  • Find tags that correlate well with snow cover in ground truth (GT)

    – Feature vector for each day is histogram of number of people that used each tag; labels are snow/no snow from GT
    – Train on 2007-2008 data, test on 2009-2010 data
    – Increases classification accuracies significantly:

[Plots: true positive rate versus false positive rate, for hand-selected tags and for learned tags (via SVM)]
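The per-day feature vector described on this slide could be built roughly as follows. This is a sketch only: the record layout `(day, user_id, tags)`, the function name, and the sample data are hypothetical, and the classifier training step is omitted. Each user is counted at most once per tag per day, matching "number of people that used each tag".

```python
from collections import defaultdict


def daily_tag_histograms(photos, vocabulary):
    """Map each day to a vector of distinct-user counts, one per tag.

    photos: list of (day, user_id, tags) records (hypothetical layout)
    vocabulary: ordered list of tags defining the vector dimensions
    """
    known = set(vocabulary)
    users = defaultdict(set)  # (day, tag) -> set of user ids
    days = set()
    for day, user_id, tags in photos:
        days.add(day)
        for tag in tags:
            if tag in known:
                users[(day, tag)].add(user_id)
    return {
        day: [len(users.get((day, tag), ())) for tag in vocabulary]
        for day in days
    }


# Illustrative data only; not from the Flickr dataset.
sample_photos = [
    ("2009-12-21", "u1", ["snow", "nyc"]),
    ("2009-12-21", "u1", ["snow"]),          # same user, counted once
    ("2009-12-21", "u2", ["snow", "ice"]),
    ("2009-12-22", "u3", ["beach"]),
]
features = daily_tag_histograms(sample_photos, ["snow", "ice", "beach"])
```

The resulting vectors, paired with snow/no snow labels from the ground truth, would then be the input to a classifier such as an SVM.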

slide-14
SLIDE 14

Continental-scale observation

  • Estimate snow cover on each day at each place in North America

    – For each geographic bin of size 1° x 1°
    – Use ground truth data from Terra satellite

[Map: NASA Terra satellite data. Legend: snow cover (green), no snow cover (blue), missing data and cloud cover (black)]
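Assigning a geotagged photo to a 1° x 1° bin can be sketched as below. The function name, the choice to identify a bin by the integer coordinates of its south-west corner, and the example coordinates are assumptions for illustration.

```python
import math


def geo_bin(lat, lon, bin_size_deg=1.0):
    """Return the (lat_index, lon_index) bin containing a coordinate.

    The bin is identified by floor(lat / size) and floor(lon / size),
    i.e. its south-west corner in units of the bin size. Points exactly
    at lat = 90 or lon = 180 fall into an extra edge bin.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    return (math.floor(lat / bin_size_deg), math.floor(lon / bin_size_deg))


# Approximate, illustrative coordinates.
bloomington_bin = geo_bin(39.17, -86.52)
san_francisco_bin = geo_bin(37.77, -122.42)
```

A per-day estimate would then be keyed by `(day, geo_bin(lat, lon))`, so that all photos taken in the same bin on the same day contribute to one decision.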

slide-15
SLIDE 15

[Maps for Dec 21, 2009: satellite map (1 degree geo bins) and map estimated by Flickr photo analysis. Legend: snow cover (green), no snow cover (blue), missing data (black/gray)]

slide-16
SLIDE 16

Continental-scale estimation

  • Predict presence of snow on each day for each geo bin

    – ~35 million total decisions

[Plot: precision versus recall, learned tags]
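Precision and recall over per-bin, per-day decisions can be computed as in this sketch. The function name is hypothetical, and skipping decisions whose ground truth is missing (represented here as `None`) is an assumption suggested by the missing-data bins in the satellite maps, not something the slides specify.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) for boolean snow predictions.

    predicted: list of bools, one per (day, bin) decision
    actual: list of bools or None (None = ground truth unavailable)
    """
    tp = fp = fn = 0
    for pred, truth in zip(predicted, actual):
        if truth is None:
            continue  # assumed: no ground truth, so not scored
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative decisions only.
predicted = [True, True, True, True, False, False, True]
actual = [True, True, True, False, True, True, None]
precision, recall = precision_recall(predicted, actual)
```

Sweeping the decision threshold of the classifier and recomputing these two numbers at each setting traces out a precision-recall curve.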

slide-17
SLIDE 17

Visual features

  • Color and texture features similar to GIST [Torralba03]

    – Divide image into array of 4x4 cells; in each cell compute mean color value (in CIELab space) and mean gradient energy

[Figure panels: image, color channels, gradient magnitude]

slide-18
SLIDE 18

Visual features

  • Color and texture features similar to GIST [Torralba03]

    – Divide image into array of 4x4 cells; in each cell compute mean color value (in CIELab space) and mean gradient energy

[Figure panels: image, color channels, gradient magnitude]

L = (L11, L12, …, L44)
A = (a11, a12, …, a44)
B = (b11, b12, …, b44)
G = (G11, G12, …, G44)

Image descriptor is concatenation of L, A, B, and G (64 dimensions); then learn SVM classifier
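The 64-dimensional descriptor can be sketched in plain Python as below. Several things here are assumptions rather than the authors' implementation: the image is taken to be already converted to CIELab and supplied as three 2-D lists, the gradient is computed on the L channel with finite differences, "gradient energy" is approximated by gradient magnitude, and all function names are hypothetical.

```python
import math


def cell_means(channel, grid=4):
    """Mean value in each cell of a grid x grid partition (row-major)."""
    h, w = len(channel), len(channel[0])
    if h < grid or w < grid:
        raise ValueError("image smaller than the cell grid")
    means = []
    for gy in range(grid):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            vals = [channel[y][x] for y in range(y0, y1)
                    for x in range(x0, x1)]
            means.append(sum(vals) / len(vals))
    return means


def gradient_magnitude(channel):
    """Per-pixel gradient magnitude using finite differences.

    Central differences in the interior, one-sided at the borders.
    Assumes the image is at least 2 pixels in each dimension.
    """
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        ya, yb = max(y - 1, 0), min(y + 1, h - 1)
        for x in range(w):
            xa, xb = max(x - 1, 0), min(x + 1, w - 1)
            dx = (channel[y][xb] - channel[y][xa]) / (xb - xa)
            dy = (channel[yb][x] - channel[ya][x]) / (yb - ya)
            out[y][x] = math.hypot(dx, dy)
    return out


def image_descriptor(L, A, B, grid=4):
    """Concatenate cell means of L, A, B and gradient magnitude G."""
    G = gradient_magnitude(L)  # assumed: gradient taken on lightness
    return (cell_means(L, grid) + cell_means(A, grid)
            + cell_means(B, grid) + cell_means(G, grid))


# Illustrative 8x8 image: horizontal lightness ramp, neutral color.
ramp_L = [[x for x in range(8)] for _ in range(8)]
zeros = [[0 for _ in range(8)] for _ in range(8)]
descriptor = image_descriptor(ramp_L, zeros, zeros)
```

With the default 4x4 grid, each of the four channels contributes 16 values, giving the 64 dimensions mentioned on the slide.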

slide-19
SLIDE 19

Classification with visual features

  • Vision yields modest (~3%) improvement in precision

Correctly classified as non-snow:
Incorrectly classified as snow:

slide-20
SLIDE 20

Estimating vegetation cover

  • We also estimate vegetation cover (greenery index) on a continental scale

    – Again using ground truth data from Terra satellite

slide-21
SLIDE 21

Butterfly

slide-22
SLIDE 22

Leaves

slide-23
SLIDE 23

Conclusion

  • We propose to observe the natural world through mining public photos from online social sharing sites

    – Hundreds of billions of images available
    – But noise, bias, content extraction are challenges

  • We study two phenomena, snow cover and vegetation

    – Using geo-tags, time stamps, text tags, and visual features
    – Use ground truth from satellites to measure estimation accuracy

  • Future work

    – More sophisticated computer vision techniques
    – Combine our noisy, sparse data with biologists’ noisy, sparse data
    – Study other phenomena, like migration patterns of wildlife, distributions of blooming flowers, etc.

slide-24
SLIDE 24

Thank you!

slide-25
SLIDE 25

False positives

    – Visible snow, i.e. bad ground truth, timestamps, geotags: 16%
    – No visible snow, i.e. incorrect or misleading tags: 42%
    – Trace or distant snow: 33%
    – Man-made snow: 9%

(Total of N=1,855 photos)