SLIDE 1

RetailNet: A Deep Learning approach for people counting and hot spot detection in retail stores

Valério Nogueira Jr., Hugo Oliveira, José Augusto Silva, Thales Vieira (presenter): Institute of Computing
Krerley Oliveira: Institute of Mathematics

Federal University of Alagoas (UFAL)

SLIDE 4

Innovation project: Mathematics & Industry

A retail company with stores in several state capitals of Brazil's Northeast region

Problem: how can we better understand customer behavior to optimize store management?

SLIDE 5

Customer behavior analysis

✓The retail sector accounts for a major fraction of the world's developed economies
✓Understanding customer attitudes and behavior is crucial to maximize profit and increase the competitiveness of retail stores
✓Effective sales staff scheduling is of critical importance to profitable operations
✓Managing these aspects efficiently has been the focus of research for decades

When do customers go shopping? (customer flow)
Where are the hot spots of the store?

SLIDE 8

Actually, these are Computer Vision problems!

Customer flow analysis (people counting)
Hot spot detection

[Figure: people count over time (s), ground truth vs. predicted]

SLIDE 9

Current Scenario for Computer Vision

✓Remarkable advances in hardware and software: the deep learning revolution
✓Outstanding solutions for Computer Vision problems: object recognition and localization, autonomous cars…

Not much attention has been given to more specific problems, such as accurate people counting

SLIDE 11

Related Work: counting by detection

Dalal and Triggs (2005): HOG descriptors
Sabzmeydani and Mori (2007): Shapelet features
Deep learning (R-CNN, Yolo…)?

Detect each individual in the image

SLIDE 12

Why not counting by detection/deep learning?

✓Low-resolution images from low-cost surveillance cameras
✓Extreme poses / occlusion

SLIDE 13

Related Work: clustering-based methods

Brostow and Cipolla (2006): space-time Bayesian clustering of local features

✓Relies on spatiotemporal coherence
✓People count is affected by each individual detection failure (individual/local detection)

SLIDE 14

Related Work: regression methods

Conte et al. (2010): SVR regression
Boominathan et al. (2016): deep learning

✓Globally estimates the crowd density
✓Most extensively used approach
✓Mainly employed for outdoor crowd analysis: sporting events, political rallies, etc.
✓Crowd density estimation vs. accurate people counting

SLIDE 16

Challenges

✓Accurate people counting
✓Severe occlusion
✓Extreme poses
✓Low-resolution images from low-cost surveillance cameras

SLIDE 17

Our contributions

A deep learning approach for people counting:
✓A foreground detection method to recognize people in low-resolution RGB videos (adapted to our problem)
✓An input image format named RGBP to provide color and foreground (or people) information
✓A CNN regression model to accurately count people

A method to generate heat maps for hot spot detection

SLIDE 19

Outline

  • 1. Overview
  • 2. Foreground detection & RGBP images
  • 3. CNN based regression for people counting
  • 4. Heat map generation for hot spot detection
  • 5. Experiments
  • 6. Conclusion & Future work

[Figure: RetailNet pipeline. Stages: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction, RGBP image composition, CNN (people count), training set annotation, hot spots heat map accumulation and visualization; inset plot of people count over time (s), ground truth vs. predicted]

SLIDE 20

Overview: training phase

[Figure: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction, RGBP image composition, training set annotation, CNN]

SLIDE 21

Overview: real-time prediction

[Figure: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction, RGBP image composition, CNN, people count]

SLIDE 22

Overview: hot spots detection

[Figure: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction, hot spots heat map accumulation and visualization]

SLIDE 24

Foreground detection & RGBP images

[Figure: RetailNet pipeline diagram]


SLIDE 27

Foreground detection & RGBP images

Strategy I: acquire a static background image (empty store)

  • Background is not static (furniture, products…)
  • Illumination changes/shadows

Strategy II: analyze motion features (detect static/moving objects)

  • People often remain static

Our strategy:

✓Image preprocessing to improve invariance
✓Background initialization
✓Dynamic background updates

[Figure: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction]

SLIDE 28

RGB image preprocessing

[Figure: camera RGB image and quantized RGB image]

✓Image resampling: 400x225 pixels (CNN input size, chosen empirically)
✓Image quantization: uniform quantization with 64 levels (4 per channel)

Note: the quantized image is used for background generation only
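
A minimal sketch of this preprocessing, assuming 8-bit RGB frames stored as NumPy arrays. Resampling uses plain nearest-neighbour indexing to stay dependency-free (the slides do not say which interpolation was used), and uniform quantization maps each channel to 4 levels, so the three channels combine into 4 × 4 × 4 = 64 colour indices. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; the interpolation method is our assumption.

def resample_nearest(img, width=400, height=225):
    """Nearest-neighbour resampling of an (H, W, 3) image to (height, width, 3)."""
    h, w = img.shape[:2]
    rows = (np.arange(height) * h) // height   # source row for each output row
    cols = (np.arange(width) * w) // width     # source column for each output column
    return img[rows[:, None], cols[None, :]]

def quantize_uniform(img, levels_per_channel=4):
    """Map each 8-bit channel to `levels_per_channel` bins and merge the three
    bins into one colour index in [0, levels_per_channel ** 3)."""
    q = (img.astype(np.uint16) * levels_per_channel) // 256  # per-channel level 0..3
    r, g, b = q[..., 0], q[..., 1], q[..., 2]
    return (r * levels_per_channel + g) * levels_per_channel + b

# Usage on an arbitrary random frame
frame = np.random.default_rng(0).integers(0, 256, size=(720, 1280, 3), dtype=np.uint8)
small = resample_nearest(frame)
labels = quantize_uniform(small)
print(small.shape, labels.min(), labels.max())
```

In practice a library resize (e.g. OpenCV or Pillow) with area or bilinear interpolation would usually replace `resample_nearest`.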

SLIDE 31

Background initialization

Strategy I: collect a single background image (empty store)
Strategy II: accumulate data from a few images in pixel histograms

Reliable backgrounds are generated by sampling one frame per second for 100 seconds.
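
One way to read "accumulate data from a few images in pixel histograms" is sketched below: every sampled quantized frame votes for its colour index at each pixel, and a pixel's background colour is taken as the most frequent bin (the histogram mode). The mode-based decision and the function name are our assumptions; the slide's exact equations are not recoverable.

```python
import numpy as np

# Hypothetical helper; "background = per-pixel histogram mode" is an assumption.

def init_background(quantized_frames, n_bins=64):
    """Per-pixel histogram background model.

    quantized_frames: sequence of (H, W) integer arrays with values in [0, n_bins),
    e.g. one quantized frame per second over 100 seconds.
    Returns an (H, W) array holding the most frequent colour index of each pixel.
    """
    frames = np.stack(list(quantized_frames))          # (T, H, W)
    t, h, w = frames.shape
    hist = np.zeros((h * w, n_bins), dtype=np.int32)   # one histogram per pixel
    pixel_ids = np.arange(h * w)
    for frame in frames.reshape(t, h * w):
        # pixel_ids has no repeats, so fancy-indexed += adds exactly one vote per pixel
        hist[pixel_ids, frame] += 1
    return hist.argmax(axis=1).reshape(h, w)

# Usage: 60 samples of colour 5 and 40 of colour 9 -> background colour 5 everywhere
samples = [np.full((2, 3), 5)] * 60 + [np.full((2, 3), 9)] * 40
print(init_background(samples))
```

Taking the mode rather than the mean keeps a person who passes through a pixel for a few seconds from blending into the background colour.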

SLIDE 32

Background updates

✓Required due to dynamic changes of objects in the scene
✓Simply reusing the initialization strategy may not work, because people may remain static
✓Use the same strategy, but in a more conservative way
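
The slide does not specify what "more conservative" means. A minimal sketch under one hypothetical rule: recompute the histograms over a recent window and replace a pixel's background colour only when a single colour dominates at least a fraction `min_support` of that window, so a customer standing still for part of the window is not absorbed, while a permanently moved object eventually is. Both the rule and `min_support` are assumptions.

```python
import numpy as np

# Hypothetical update rule and parameter names; not taken from the slides.

def update_background(background, recent_frames, n_bins=64, min_support=0.9):
    """Conservatively refresh an (H, W) background of colour indices.

    recent_frames: list of (H, W) quantized frames from a recent time window.
    A pixel changes only if one colour occupies >= min_support of the window.
    """
    frames = np.stack(recent_frames)                    # (T, H, W)
    t = frames.shape[0]
    # counts[c, y, x] = number of frames in which pixel (y, x) had colour c
    counts = np.stack([(frames == c).sum(axis=0) for c in range(n_bins)])
    mode = counts.argmax(axis=0)                        # dominant colour per pixel
    support = counts.max(axis=0) / t                    # its share of the window
    return np.where(support >= min_support, mode, background)

# Usage: pixel 0 is stable at colour 2; pixel 1 flickers between 2 and 4
bg = np.array([[7, 7]])
window = [np.array([[2, 2]])] * 5 + [np.array([[2, 4]])] * 5
print(update_background(bg, window))
```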

SLIDE 33

Foreground detection (P image generation)

Absolute difference image between the current frame and the background

Binarization of the difference image by thresholding

[Figure: camera RGB image, foreground/background detection, quantized RGB image, binary P image extraction, RGBP image composition, hot spots heat map accumulation and visualization]
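
A minimal sketch of P-image extraction and RGBP composition. It assumes the absolute difference is taken per channel between the current RGB frame and an RGB rendering of the background, summed over channels and compared with a hypothetical threshold `tau`; the slide's exact formulas are not recoverable. The RGBP image stacks the binary P mask as a fourth channel next to R, G and B (scaling P to 0/255 is also our choice).

```python
import numpy as np

# Hypothetical helpers: the difference measure, `tau` and the 0/255 scaling of P
# are assumptions, not values taken from the slides.

def extract_p(frame_rgb, background_rgb, tau=40):
    """Binary foreground (P) image: 1 where the frame differs enough from the background."""
    # int16 avoids uint8 wrap-around when subtracting
    diff = np.abs(frame_rgb.astype(np.int16) - background_rgb.astype(np.int16))
    d = diff.sum(axis=2)                  # absolute difference image, summed over R, G, B
    return (d > tau).astype(np.uint8)     # binarization by thresholding

def compose_rgbp(frame_rgb, p):
    """RGBP image: the three colour channels plus P as a fourth channel."""
    return np.dstack([frame_rgb, p * 255]).astype(np.uint8)

frame = np.array([[[10, 10, 10], [200, 200, 200]]], dtype=np.uint8)
background = np.array([[[12, 10, 10], [10, 10, 10]]], dtype=np.uint8)
p = extract_p(frame, background)
rgbp = compose_rgbp(frame, p)
print(p, rgbp.shape)
```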

SLIDE 35

CNN based regression for people counting

[Figure: RetailNet pipeline diagram]

SLIDE 37

What is a CNN?

SLIDE 38

A brief history of Convolutional Neural Networks

✓Introduced by LeCun in the 1980s
✓Deep learning revolution in 2012

SLIDE 39

Convolutional Neural Networks

✓Employed mainly for images
✓But also for video, text, geometry, etc.
✓Consist of a number of convolutional and subsampling layers, optionally followed by fully connected layers

f(image) = ?
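
To make "convolutional and subsampling layers" concrete, here is a tiny NumPy forward pass through one layer: a "valid" convolution, a ReLU and 2×2 max pooling. It is a didactic sketch with a hand-picked kernel, not the RetailNet architecture, whose configuration the slides describe only through the hyper-parameters C, K, F, L, U. Function names are hypothetical.

```python
import numpy as np

# Hypothetical didactic helpers; real CNNs learn their kernels during training.

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Subsampling: keep the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]                                   # drop an odd last row/column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Usage: one convolution -> ReLU -> pooling layer on a 4x4 toy image
image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[0.0, 0.0], [0.0, 1.0]])
features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

Stacking several such layers, then fully connected layers with a single linear output, gives the kind of regression network f that maps an image to a number.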

SLIDE 41

Supervised learning

Training a Convolutional Neural Network for regression

✓Input: RGBP image
✓Output: people count (a real number rounded to the nearest integer)
✓Several hyper-parameters were explored: C, K, F, L, U
✓Activation function: ReLU (except for the last layer)
✓A large number of RGBP images collected and manually annotated with the people count

SLIDE 42

Outline

  • 4. Heat map generation for hot spot detection (bonus!)

SLIDE 43

Heat map generation for hot spot detection

[Figure: RetailNet pipeline diagram]

SLIDE 45

Heat map generation for hot spot detection

Hot spots: high-traffic areas within retail stores

✓Accumulate P over time
✓Perform the usual histogram equalization
✓Color-code using a conventional colormap
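
A minimal sketch of the three steps, assuming binary P images as NumPy arrays: sum them over time, then apply classic histogram equalization (map each accumulated value through the normalized cumulative histogram) to an 8-bit image. The function name is hypothetical; the final color-coding could then be done with, e.g., matplotlib's `plt.imshow(hm, cmap="jet")`, where the choice of `jet` is ours.

```python
import numpy as np

# Hypothetical helper; the slides only name the steps, not their implementation.

def heat_map(p_images, n_levels=256):
    """Accumulate binary P images over time and histogram-equalize the result.

    p_images: iterable of (H, W) arrays with values in {0, 1}.
    Returns an (H, W) uint8 image ready for a colormap.
    """
    acc = np.sum(np.stack(list(p_images)), axis=0)       # frames in which each pixel was foreground
    values, counts = np.unique(acc, return_counts=True)  # histogram of accumulated values
    cdf = np.cumsum(counts).astype(np.float64)
    if cdf[-1] == cdf[0]:                                # constant image: nothing to equalize
        return np.zeros(acc.shape, dtype=np.uint8)
    # classic equalization: normalized CDF stretched to [0, n_levels - 1]
    lut = np.round((cdf - cdf[0]) / (cdf[-1] - cdf[0]) * (n_levels - 1))
    return lut[np.searchsorted(values, acc)].astype(np.uint8)

# Usage: the top-left pixel is occupied in both frames, the bottom row never
p_frames = [np.array([[1, 0], [0, 0]]), np.array([[1, 1], [0, 0]])]
hm = heat_map(p_frames)
print(hm)
```

Equalization spreads the accumulated counts over the full intensity range, so moderately visited areas stay visible next to a dominant spot such as the entrance.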

SLIDE 49

Training set

A large number of RGBP images were collected and manually annotated with the people count in a real shoe store

✓1-megapixel surveillance camera
✓153 minutes of video
✓Images ranging from 0 (empty) up to 30 people in the store
✓4 out of 5 consecutive images discarded due to similarity
✓37,768 manually annotated images
✓Annotation tool to ease labeling: play the video and press buttons to increase/decrease the count by one

[Figure: histogram of the number of images per number of people in the store]
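
The annotation tool can be pictured as a running counter: while the video plays, each button press adds or removes one person, and every frame inherits the current count. The sketch below uses a hypothetical event format of (frame index, +1/-1) to turn presses into per-frame labels, and then keeps one of every five consecutive frames, as on the slide. All names are illustrative.

```python
from itertools import accumulate

# Hypothetical event format and helper names, for illustration only.

def labels_from_presses(n_frames, presses, start_count=0):
    """Per-frame people counts from button presses.

    presses: list of (frame_index, delta) with delta = +1 or -1.
    A press at frame k affects frames k, k+1, ...
    """
    delta_per_frame = [0] * n_frames
    for frame_index, delta in presses:
        delta_per_frame[frame_index] += delta
    return [start_count + c for c in accumulate(delta_per_frame)]

def keep_one_in_five(frames, labels):
    """Discard 4 out of every 5 consecutive frames (they are near-duplicates)."""
    return frames[::5], labels[::5]

# Usage: someone enters at frame 1, another at frame 3, one leaves at frame 4
print(labels_from_presses(6, [(1, 1), (3, 1), (4, -1)]))
```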

SLIDE 50

Hyper-parameter optimization & validation

✓Grid search over the hyper-parameter space
✓Cross-validation experiments: 75%/25% split, so that similar images from short periods never appear in both subsets (each subset uses samples from distinct recordings)
✓Validation accuracy measure:
✓Adam optimizer of the Keras library, with a learning rate of 0.001
✓Loss function: Mean Absolute Error (MAE)
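
A minimal sketch of the validation protocol: split by whole recordings so near-identical frames cannot leak between subsets, and enumerate a hyper-parameter grid for the search. The sample tuple format, the seed and all grid values are hypothetical; the slides name the hyper-parameters C, K, F, L, U but not their ranges. Here the 75/25 ratio is applied to recordings, so the image-level ratio is only approximate.

```python
import random
from itertools import product

# Hypothetical sample format and grid values, for illustration only.

def split_by_recording(samples, train_fraction=0.75, seed=0):
    """75/25 split where all images of a recording go to the same subset.

    samples: list of (recording_id, image, count) tuples.
    """
    recordings = sorted({rec for rec, _, _ in samples})
    random.Random(seed).shuffle(recordings)
    n_train = round(train_fraction * len(recordings))
    train_ids = set(recordings[:n_train])
    train = [s for s in samples if s[0] in train_ids]
    test = [s for s in samples if s[0] not in train_ids]
    return train, test

# Hypothetical grid over the hyper-parameters named on the slides
GRID = {"C": [2, 3], "K": [3, 5], "F": [16, 32], "L": [1, 2], "U": [64, 128]}
configs = [dict(zip(GRID, values)) for values in product(*GRID.values())]

# Usage: four recordings of three images each -> three recordings train, one tests
samples = [(rec, None, 0) for rec in "abcd" for _ in range(3)]
train, test = split_by_recording(samples)
print(len(train), len(test), len(configs))
```

Each configuration would then be trained (with Adam and an MAE loss, as on the slide) and scored on the held-out recordings.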

SLIDE 51

Quantitative Results

[Figure: histogram of the absolute error (A) on the test set, frequency (%)]

✓41.8% of the test images were predicted correctly
✓Fewer than 8% of the test images resulted in an error of more than 2 people
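
The two figures on this slide (exact-hit rate and share of errors above 2 people) can be computed as below, assuming each real-valued CNN output is rounded to the nearest integer before comparison. The function name and toy numbers are illustrative only. Note that Python's `round` sends exact .5 ties to the nearest even integer, and that the MAE here is measured on rounded counts, unlike the training loss on raw outputs.

```python
# Hypothetical evaluation helper; toy data below is not from the experiments.

def count_metrics(predicted, ground_truth):
    """Evaluate a count regressor whose real-valued outputs are rounded.

    Returns (MAE, fraction exactly correct, fraction with error > 2 people).
    """
    errors = [abs(round(p) - g) for p, g in zip(predicted, ground_truth)]
    n = len(errors)
    mae = sum(errors) / n
    exact = sum(e == 0 for e in errors) / n
    over_two = sum(e > 2 for e in errors) / n
    return mae, exact, over_two

# Usage: rounded predictions 3, 5, 10, 0 against counts 3, 4, 6, 0
print(count_metrics([3.2, 4.6, 10.0, 0.4], [3, 4, 6, 0]))
```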

SLIDE 52

Continuous prediction experiment

✓250 seconds of continuous video
✓The gray polygon represents the 10% relative error tolerance region

[Figure: people count over time (s), ground truth vs. predicted]
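
Reading the gray polygon as the band ground_truth × (1 ± 0.10), the share of time steps whose prediction falls inside it can be computed as below. This interpretation, and whether predictions are rounded first, are our assumptions; the function name is hypothetical. With an empty store (ground truth 0) the band collapses to exactly 0.

```python
# Hypothetical helper; the band definition is our reading of the slide.

def within_relative_tolerance(predicted, ground_truth, tol=0.10):
    """Fraction of time steps with |prediction - truth| <= tol * truth."""
    inside = [abs(p - g) <= tol * g for p, g in zip(predicted, ground_truth)]
    return sum(inside) / len(inside)

# Usage: exact hit, 20% over, empty store hit, empty store miss
print(within_relative_tolerance([10, 12, 0, 1], [10, 10, 0, 0]))
```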

SLIDE 53

Comparison with other image representations

"Is the RGBP representation indeed the best?"

1) Is the CNN recognizing people in RGB images (i.e., from color information)?
2) Is foreground detection a relevant step to improve people count accuracy?
3) Is the CNN capable of learning to count people from the P image?
4) Is the background color information relevant?

SLIDE 54

Case study on hot spots visualization

✓One-hour recording experiment
✓The peak of people flow is at the store entrance, as expected
✓Because many people remained at the counter for a long time, that place was considered a hot spot
✓There is a hot spot around the chairs used to try on shoes
✓There is not much movement in the central area of the store, suggesting a repositioning of that furniture
✓The corridors on the right side of the image are also rarely visited by customers

SLIDE 55

Demo

SLIDE 57

Conclusion & Future Work

✓Robust results: the method could potentially be employed in real-world situations
✓RGBP improves accuracy by combining color and foreground information

  • Training is limited to a specific camera/store
  • Extrapolation is not supported

★More results/comparisons (Yolo?)
★Investigate adaptations to detect/exclude salespeople
★Experiment with other network architectures (exploit temporal coherence, end-to-end network…)
★Analyze other aspects of retail stores…

SLIDE 58

Ex-future Work: Yolo comparisons

  • Ours
  • Yolo v3 (Darknet-53 architecture) [1]
  • Pretrained on the COCO dataset [2]

[1] Redmon, Joseph, and Ali Farhadi. "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
[2] Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. "Microsoft COCO: Common objects in context." In European Conference on Computer Vision, pp. 740-755. Springer, Cham, 2014.

SLIDE 59

Yolo: You Only Look Once

  • End-to-end network for object detection
  • Regression problem: returns spatially separated bounding boxes and associated class probabilities

SLIDE 60

Yolo: You Only Look Once

  • VIDEO

SLIDE 62

Yolo: You Only Look Once

  • Divide the input image into an S × S grid
  • Each grid cell predicts B bounding boxes and confidence scores for those boxes
  • Each bounding box consists of 5 predictions: x, y, w, h, and confidence
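
A small sketch of the grid bookkeeping described above, plus counting by detection. In the original single-grid YOLO formulation, the cell containing an object's centre is responsible for it, and the prediction tensor has S × S × (B·5 + C) entries (7 × 7 × 30 for S = 7, B = 2, C = 20 classes). YOLOv3 predicts at three scales with anchor boxes, so its layout differs. The detection tuple format and the 0.5 confidence threshold are hypothetical.

```python
# Hypothetical helpers; detection format and threshold are illustrative.

def responsible_cell(x_center, y_center, s=7):
    """Grid cell (row, col) responsible for an object whose centre has
    normalized image coordinates (x_center, y_center) in [0, 1]."""
    col = min(int(x_center * s), s - 1)
    row = min(int(y_center * s), s - 1)
    return row, col

def yolo_v1_output_size(s=7, b=2, c=20):
    """Number of outputs in a single-grid prediction tensor S x S x (B*5 + C)."""
    return s * s * (b * 5 + c)

def count_people(detections, threshold=0.5):
    """Counting by detection: 'person' boxes at or above a confidence threshold.

    detections: list of (class_name, confidence) pairs.
    """
    return sum(1 for cls, conf in detections if cls == "person" and conf >= threshold)

# Usage
print(responsible_cell(0.5, 0.5), yolo_v1_output_size())
print(count_people([("person", 0.9), ("person", 0.3), ("chair", 0.95)]))
```

Every missed or merged box changes the count directly, which is one reason occlusion and low resolution hurt detection-based counting in this setting.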

SLIDE 63

Ex-future Work: Yolo comparisons

  • Ours
  • Yolo MAE: 6.24
  • Yolo ε: 51.48%

[Figure: histogram of the absolute error (A), frequency (%), Ours vs. Yolo]

SLIDE 64

Bad Yolo results

  • Yolo v3 (Darknet-53 architecture) [1]
  • Pretrained on the COCO dataset [2]
  • Temporal coherence is not exploited by Yolo

SLIDE 65

Thank you for your attention!

Questions?