SLIDE 1
AUTOMATIC IMPROVEMENT OF POINT-OF-INTEREST TAGS FOR OPENSTREETMAP DATA
Stefan Funke and Sabine Storandt
<node id="2955673661"> <tag k="amenity" v="restaurant"/> <tag k="name" v="
SLIDE 2
SLIDE 21
A LITTLE TAGGING GAME
<tag k="amenity" v="restaurant"/> <tag k="name" v="..."/> <tag k="cuisine" v="???"/>
izumi sushi bar: sushi; japanese
toro blanco tapas bar: tapas; spanish
pizzaria bella italia: pizza; italian
50’s diner: burger; american
subway: sandwich; american
chau’s wok: chinese
fresh fish buns: seafood
mykonos restaurant: greek
taj mahal: indian
⇒ Use machine learning to deduce new tags from the name tag automatically.
SLIDE 25
EXTRAPOLATABLE TAGS
OSM Wiki provides an overview of reasonable tags.
We only consider tags which occur at least 200 times in our data set.
... 25 out of over 1500 cuisine classes remained ...
Not considered, e.g.:
home made cake (too specific)
german-bohemian (home-brewed)
bürgerliche küche (not in English)
music (wrong usage)
israelian (indeed rare)
chineese (wrong spelling)
SLIDE 26
FEATURE EXTRACTION
punjab moghul mahal indian palace simran shezan maharani taj palace rama aanjal namaskar bombay tandoori ganesha flavours of india satyam palace of india satluj safran saaz shaan taj mahal anmol carmen mehefil india haus maharani chai ji amaltas krishna (indisch) amrit surya maharadscha taste of india kashmir yogi haus jaipur ghandi badsha maharadscha shahi rama sitar maharaja indian place el sol badshah india house indian mango shalimar shivalik goa dhaba indira gandhi krishna delhi palace express shere punjab jai pur kashmirhaus dehli palace swagat shiva zum ratskeller indian garden the rambagh palace sher e punjab namaste shalimar bombay maharadscha maharani natraj radha shere punjab himalaya goa indian palace swagatam bella punjabi shanti shop curry king swagat shiva’s thali gasthaus adler shan lahori
SLIDE 27
FEATURE EXTRACTION
Good indicator phrases? indian palace mahal taj bombay mahara
SLIDE 30
FEATURE EXTRACTION
N = the list of names above.
For every name in N we construct all k-grams with k = 3, ..., 10.
k-gram: substring of length k
Example 'taj mahal', k=4: "taj ", "aj m", "j ma", " mah", "maha", "ahal"
For all k-grams we count their occurrences in N.
Example: taj 2, maha 9
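A minimal Python sketch of the k-gram construction and counting described above. The function names kgrams and count_kgrams are hypothetical, and the sketch assumes that a k-gram is counted once per name that contains it, which is only one possible reading of "occurrences in N".

```python
from collections import Counter


def kgrams(name, k_min=3, k_max=10):
    """Return the set of all substrings of `name` with length k_min..k_max."""
    grams = set()
    for k in range(k_min, k_max + 1):
        # A name shorter than k yields an empty range and contributes nothing.
        for i in range(len(name) - k + 1):
            grams.add(name[i:i + k])
    return grams


def count_kgrams(names, k_min=3, k_max=10):
    """Count, for every k-gram, how many names in `names` contain it.

    Assumption: each name contributes at most 1 to a k-gram's count.
    """
    counts = Counter()
    for name in names:
        counts.update(kgrams(name, k_min, k_max))
    return counts


if __name__ == "__main__":
    print(sorted(kgrams("taj mahal", 4, 4)))
    print(count_kgrams(["taj mahal", "taj palace", "maharani"])["maha"])
```

Note that spaces are kept as ordinary characters, so "taj " and " mah" are valid 4-grams of 'taj mahal'.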
Significant k-grams are contained in at least 2% of the names in N.
Prune k-grams that are substrings of other significant k-grams if the frequency is the same.
Example:
onald 753
mc donald's 753
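The significance filter and the pruning step could be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the input is a plain dict mapping each k-gram to its count, and the pruning uses a simple quadratic comparison that would need an index for large inputs.

```python
def significant_kgrams(counts, num_names, min_share=0.02):
    """Keep the k-grams that occur in at least `min_share` of all names."""
    threshold = min_share * num_names
    return {gram: c for gram, c in counts.items() if c >= threshold}


def prune_substrings(significant):
    """Drop a k-gram if another significant k-gram contains it
    as a substring and has exactly the same count."""
    kept = {}
    for gram, count in significant.items():
        dominated = any(
            gram != other and gram in other and count == other_count
            for other, other_count in significant.items()
        )
        if not dominated:
            kept[gram] = count
    return kept


if __name__ == "__main__":
    # Counts taken from the example above; "donald" with 800 is made up
    # to show that a differing count prevents pruning.
    example = {"onald": 753, "mc donald's": 753, "donald": 800}
    print(prune_substrings(example))
```

With the example counts, "onald" is dropped because "mc donald's" contains it with the same frequency, so the shorter phrase carries no extra information.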
SLIDE 34
MACHINE LEARNING
For each class, we have a list of indicator phrases with percentages.

class     indicator phrases with percentages
indian    indian 14.35, maha 12.69, palace 6.50
italian   ria 25.50, pizz 21.19
greek     gri 11.95, tavern 9.02, akropolis 4.56
chinese   china 26.30, asia 10.98, ing 8.49, ang 7.79

For each name we construct a feature vector:
⇒ 0 if the phrase is not contained in the name.
⇒ Otherwise, phrase length multiplied by percentage.

indian mango: indian 86.1, ang 23.4
pizzaria trulli: ria 76.5, pizz 84.8
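The feature-vector rule can be checked with a few lines of Python. This is a sketch: the function name feature_vector and the constant INDICATORS are hypothetical, and only a subset of the indicator phrases from the table is used.

```python
# (phrase, percentage) pairs, a subset of the indicator phrases listed above.
INDICATORS = [
    ("indian", 14.35),
    ("ang", 7.79),
    ("ria", 25.50),
    ("pizz", 21.19),
    ("china", 26.30),
    ("tavern", 9.02),
]


def feature_vector(name, indicators=INDICATORS):
    """One entry per indicator phrase:
    0 if the phrase is not contained in the name,
    otherwise phrase length multiplied by the phrase's percentage."""
    return [
        len(phrase) * percentage if phrase in name else 0.0
        for phrase, percentage in indicators
    ]


if __name__ == "__main__":
    for name in ("indian mango", "pizzaria trulli"):
        print(name, [round(v, 1) for v in feature_vector(name)])
```

For 'indian mango' the nonzero entries come from "indian" (6 × 14.35 = 86.1) and "ang" (3 × 7.79 ≈ 23.4); for 'pizzaria trulli' they come from "ria" (3 × 25.50 = 76.5) and "pizz" (4 × 21.19 ≈ 84.8), matching the values on the slide.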
Machine Learning on tagged data.
⇒ Random Forest
⇒ returns probability distribution over all possible classes

name              indian  italian  greek  chinese
indian mango      90%     0%       0%     10%
pizzaria trulli   0%      100%     0%     0%
SLIDE 38
EXPERIMENTAL RESULTS
28,128 restaurants without cuisine tag.
Assigned ethnicity tag when probability > 75%, food type when probability is 100%.
19,671 new ethnicity cuisine tags and 1,460 new food type cuisine tags.
Manual checks (500) showed an accuracy of 98%.

fischerklause c = seafood
la stella c = pizza
eiscafé rialto c = ice_cream
pizzeria italia c = pizza
50’s diner c = burger
block house c = steak_house
calimero c = ice_cream
nordsee c = seafood
pizzeria marino c = pizza
rosenburger hof c = burger
nazar kebap stube c = kebab
chilli peppers rock cafe c = coffee_shop
eiscafé dolce vita c = ice_cream
baguetterie filou c = sandwich
classic western steakhouse c = steak_house
shaki sushi c = sushi
cafe kamps c = coffee_shop
sakura sushi & grill c = sushi
speisekammer c = ice_cream
döner haus c = kebab

schwaben-bräu c = german
ginnheimer wirtshaus c = german
china imbiss drache c = chinese
gameiro pizza-express c = italian
taverna ilios c = greek
zur feurigen bratwurst c = german
pizzeria italia c = italian
kartoffelhaus c = german
pizzeria venezia c = italian
einkehr c = german
sushi for friends c = japanese
il capriccio c = italian
deutscher hof c = german
mykonos c = greek
sausalitos c = mexican
my thai c = thai
- mr. kebab c = turkish
dschingis khan c = chinese
el paso c = mexican
café mallorca c = spanish
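The two probability thresholds stated above could be applied to a classifier's output roughly as follows. This is a hedged sketch: the function name confident_tags, the representation of the distribution as a dict, and the way classes are split into ethnicity and food type via a set are all assumptions made for illustration.

```python
def confident_tags(probabilities, ethnicity_classes,
                   ethnicity_threshold=0.75, food_type_threshold=1.0):
    """Return the cuisine classes that are confident enough to be assigned.

    probabilities: dict mapping class name to predicted probability (0..1).
    ethnicity_classes: set of class names treated as ethnicity tags;
        every other class is treated as a food type (assumption).
    Ethnicity tags need a probability strictly above 75%,
    food type tags need a probability of 100%.
    """
    tags = []
    for cuisine, p in probabilities.items():
        if cuisine in ethnicity_classes:
            if p > ethnicity_threshold:
                tags.append(cuisine)
        elif p >= food_type_threshold:
            tags.append(cuisine)
    return tags


if __name__ == "__main__":
    ethnicities = {"indian", "italian", "greek", "chinese"}
    print(confident_tags({"indian": 0.9, "chinese": 0.1}, ethnicities))
    print(confident_tags({"pizza": 0.99, "kebab": 0.01}, ethnicities))
```

The stricter threshold for food types means a name such as one with a 99% pizza prediction would still be left untagged in this sketch.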
SLIDE 40
OTHER RESULTS
Consider POIs which only have a name tag.
461 new food-related tags (restaurant, bar, biergarten, cafe)
4,212 new amenity and shop tags (supermarket, bakery, hairdresser, etc.)
3,452 new tourism and leisure tags (hotel, playground, sports centre)
Overall accuracy 85%.
Could be integrated into a dialogue system.
<node> <tag k="name" v="Walmart"/> </node>
Would you like to add a shop=supermarket tag? yes / no
<node> <tag k="name" v="Walmart"/> <tag k="shop" v="supermarket"/> </node>
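One way the dialogue step might look in code, using Python's standard xml.etree.ElementTree module. This is only a sketch: the function name apply_suggestion and the boolean accepted parameter (standing in for the user's yes/no answer) are hypothetical, and a real editor integration would work on full OSM nodes with ids and coordinates.

```python
import xml.etree.ElementTree as ET


def apply_suggestion(node_xml, key, value, accepted):
    """Add <tag k=key v=value/> to the node if the user accepted the
    suggestion and the node does not already carry a tag with that key."""
    node = ET.fromstring(node_xml)
    already_tagged = any(tag.get("k") == key for tag in node.findall("tag"))
    if accepted and not already_tagged:
        ET.SubElement(node, "tag", {"k": key, "v": value})
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    original = '<node><tag k="name" v="Walmart"/></node>'
    # "yes" answer to: Would you like to add a shop=supermarket tag?
    print(apply_suggestion(original, "shop", "supermarket", accepted=True))
```

Keeping the user's confirmation as an explicit input reflects the slide's idea that the predicted tag is a suggestion, not an automatic edit.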
SLIDE 42
CONCLUSIONS AND FUTURE WORK
A large portion of POI names already contains information about, e.g., the amenity or cuisine.
Future work:
Consider additional tags besides the name tag for extrapolation, e.g. the opening hours tag, the brand tag, or free-text tags such as note or description tags.
Perform experiments on other countries.
SLIDE 43