Adaptive Power Management for Energy Harvesting Sensor Nodes using Reinforcement Learning
Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura (The University of Tokyo)
CONTEXT: Energy Harvesting Sensor Nodes
A sensor node capable of varying its duty cycle, paired with an energy harvester, is theoretically capable of perpetual operation.
8 March 2017 NAKAMURA LABORATORY
Say your battery is at 75% and there is plenty of sunshine. Do you use the harvested energy to recharge the battery or to power the sensor node? If both, then in what proportion?
[Figure: energy harvested and battery level over time, comparing Policy 1, Policy 2, and Policy 3]
MOVING SENSORS, DIFFERENT ENVIRONMENTS, DIFFERENT SENSORS
BILLIONS AND TRILLIONS OF NODES
When dealing with trillions of sensor nodes, customizing each node by hand is impractical, if not impossible.
ENERGY HARVESTING NODES NEED TO BE ADAPTIVE
GOAL: To demonstrate how to overcome these challenges by using Reinforcement Learning (RL)
Energy Neutral Operation (ENO)
Maximize Performance
Minimize Battery Downtime
Minimize Energy Waste
Energy Waste = Energy Harvested − Energy Consumed by Node − Energy Used to Charge the Battery
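The energy-waste definition above is simple bookkeeping; a minimal sketch (function and variable names are illustrative, not from the slides):

```python
def energy_waste(e_harvested, e_node, e_charge):
    """Energy Waste = Energy Harvested - Energy consumed by the node
    - Energy used to charge the battery (all in the same units, e.g. mJ)."""
    return e_harvested - e_node - e_charge

# Example: 1000 mJ harvested, 400 mJ consumed by the node,
# 500 mJ stored in the battery -> 100 mJ wasted
print(energy_waste(1000, 400, 500))  # 100
```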
[System diagram (ideal): solar energy charges the battery; the Adaptive Power Manager observes the battery reserve level and the energy being harvested, and sets the sensor node's duty cycle]
A BRIEF INTRODUCTION
A type of machine learning that learns by interacting with the environment. Suited for sequential decision-making tasks: map situations (states) into actions so as to receive as much reward as possible. Based on an iterative process of trial and error (search and memory), similar to how humans learn.
By using RL, it is possible for a node to learn its power-management policy on its own, with minimal human input.
Agent (Power Manager) ↔ Environment
OBSERVATIONS: battery level, energy harvested
ACTION: choose duty cycle
FEEDBACK: reward, new state
"What action should I take to accumulate the maximum total reward?"
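The agent-environment interaction can be sketched as a loop: observe battery and harvest, pick a duty cycle, receive a reward and a new state. The toy environment dynamics and reward below are placeholders, not the talk's actual simulation model:

```python
import random

def env_step(state, duty_cycle_pct):
    """Toy stand-in for the node/battery environment: returns (reward, next_state).
    Battery changes by harvest minus consumption; the dynamics are assumed."""
    battery, harvest = state
    consumed = duty_cycle_pct * 0.5                 # assumed consumption model
    battery = min(100.0, max(0.0, battery + harvest - consumed))
    harvest = random.uniform(0.0, 50.0)             # random solar intake
    reward = -abs(harvest - consumed)               # placeholder reward signal
    return reward, (battery, harvest)

state = (75.0, 30.0)          # (battery %, energy harvested)
total_reward = 0.0
for _ in range(100):          # interaction loop: observe -> act -> reward
    duty_cycle_pct = random.choice(range(10, 101, 10))  # random policy for now
    reward, state = env_step(state, duty_cycle_pct)
    total_reward += reward
```

The RL agent's job is to replace the random policy in this loop with one that maximizes the accumulated reward.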
The question is: WHICH ACTION TO TAKE WHEN YOU ARE IN A GIVEN STATE?
EXAMPLE: Lots of sunlight | Battery at 60%. Do you run the node at a high duty cycle while recharging, or devote the surplus energy to charging?
Assign every state-action pair a Q-value, Q(s, a).
[Diagram: from state X, Actions 1, 2, and 3 map to Q-values Q(X,1), Q(X,2), and Q(X,3)]
Q(s, a) estimates the total reward the agent can expect if it takes action a in state s. The higher the Q-value, the better the action for that particular state.
Challenge: determining the Q-values for all state-action pairs. The Q-table contains the Q-values of all possible state-action pairs, and it is filled in by the Q-learning algorithm.
Q-Learning Algorithm
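The update rule itself did not survive in the transcript; the textbook tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max over a' of Q(s',a') − Q(s,a)], can be sketched as follows (the action set and hyperparameter values are assumptions):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step. Q is a dict mapping (state, action) -> value;
    actions here are the duty cycles 10%..100% in steps of 10 (assumed)."""
    actions = range(10, 101, 10)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]

Q = {}
q_update(Q, s=('b150', 'h3'), a=50, r=1.0, s_next=('b151', 'h3'))
```

Missing entries default to 0.0, so the table can start empty and be filled as the agent explores.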
State is defined by: battery level and harvested energy.
Total possible states: 200 × 5 = 1000
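With 200 × 5 = 1000 discrete states, the continuous observations must be binned. The bin counts come from the slide; the uniform bin edges below are an assumption:

```python
def discretize_state(battery_pct, harvest_pct):
    """Map continuous observations to one of 200 x 5 = 1000 discrete states.
    200 battery bins and 5 harvest bins match the slide's state count;
    uniform binning over 0-100% is assumed."""
    b_bin = min(int(battery_pct / 100.0 * 200), 199)   # 0..199
    h_bin = min(int(harvest_pct / 100.0 * 5), 4)       # 0..4
    return b_bin, h_bin

print(discretize_state(75.0, 60.0))  # (150, 3)
```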
Action: choose the duty cycle of the sensor node, a(tₖ) ∈ {10%, 20%, 30%, …, 100%}.
Duty cycle → node power: 10% → 50 mW | 50% → 250 mW | 100% → 500 mW
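The three sample points on the slide (10% → 50 mW, 50% → 250 mW, 100% → 500 mW) are consistent with a linear scaling of node power with duty cycle, which suggests this sketch (the linearity between the listed points is an inference, not stated on the slide):

```python
ACTIONS = [d / 100 for d in range(10, 101, 10)]  # duty cycles 0.1 .. 1.0
PEAK_POWER_MW = 500.0  # 100% duty cycle -> 500 mW, per the slide

def node_power_mw(duty_cycle):
    """Node power assumed linear in duty cycle:
    10% -> 50 mW, 50% -> 250 mW, 100% -> 500 mW."""
    return PEAK_POWER_MW * duty_cycle

print([node_power_mw(d) for d in (0.1, 0.5, 1.0)])  # [50.0, 250.0, 500.0]
```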
The reward depends on the deviation ΔE from energy-neutral operation:
E_neutral(tₖ) = E_harvest(tₖ) − E_node(tₖ)
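A reward built on the energy-neutrality deviation E_harvest − E_node can be sketched as below. The exact reward shaping was not legible in the transcript, so the negative absolute deviation is an assumption; any reward that peaks at E_neutral = 0 would serve the same purpose:

```python
def eno_reward(e_harvest, e_node):
    """Reward the agent for staying close to energy-neutral operation (ENO).
    E_neutral = E_harvest - E_node; penalizing |E_neutral| is an assumed
    shaping, maximized when harvest exactly matches consumption."""
    e_neutral = e_harvest - e_node
    return -abs(e_neutral)

print(eno_reward(300.0, 250.0))  # -50.0 (consuming less than harvested)
```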
Training: Tokyo (2000–2009) | Testing: Tokyo (2010)
[Bar chart: Efficiency (%) and Energy Wasted (%) for the Naïve policy, Kansal's method, and our method using RL]
Naïve: duty cycle is proportional to battery level. Kansal: fixes the duty cycle for the present day by predicting the next day's total energy. Our RL method: higher efficiency, lower waste.
Efficiency = Actual Duty Cycle / Achievable Maximum Duty Cycle
Energy Wasted(%) = Total Energy Wasted / Total Energy Harvested
Energy Waste = Energy Harvested − Node Energy − Charging Energy
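Assuming efficiency is the actual duty cycle over the achievable maximum, and waste is total energy wasted over total energy harvested, the two comparison metrics can be sketched as:

```python
def efficiency_pct(actual_duty, achievable_max_duty):
    """Efficiency(%) = actual duty cycle / achievable maximum duty cycle."""
    return 100.0 * actual_duty / achievable_max_duty

def waste_pct(total_wasted, total_harvested):
    """Energy Wasted(%) = total energy wasted / total energy harvested."""
    return 100.0 * total_wasted / total_harvested

# Illustrative numbers only, not the paper's results
print(efficiency_pct(45.0, 50.0), waste_pct(120.0, 1000.0))  # 90.0 12.0
```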
High Duty Cycle even during the night
[Time-series plot, epochs 1360–1480: Duty Cycle (%), Harvested Energy (%), Battery (%)] Lower duty cycle during the night.
Perfect Q-convergence takes too long. Instead, use an ε-greedy approach with a non-converged Q-table. ε-greedy approach: with probability ε take a random exploratory action; otherwise take the action with the highest Q-value for the current state.
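A minimal sketch of ε-greedy action selection over a duty-cycle Q-table (the dict layout and default value of 0.0 for unseen pairs are assumptions carried over from the tabular setup):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon, explore with a random action; otherwise
    exploit the best-known action from the (possibly non-converged) Q-table."""
    if random.random() < epsilon:
        return random.choice(actions)                              # explore
    return max(actions, key=lambda a: Q.get((state, a), 0.0))      # exploit

Q = {("sunny_b60", 50): 2.0, ("sunny_b60", 10): 1.0}
action = epsilon_greedy(Q, "sunny_b60", actions=list(range(10, 101, 10)))
```

Because exploration never fully stops, the agent keeps probing alternative duty cycles and can track a changing environment online.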
[Plot: Average Duty Cycle (%) per year, 2010–2015, comparing Wakkanai Offline vs Wakkanai Online]
With the ε-greedy implementation, the agent adapts to the environment and minimizes instances of battery exhaustion.
[Table: total number of times the battery was completely exhausted, Greedy (non-adaptive) vs ε-greedy (adaptive)]
ANY COMMENTS OR QUESTIONS ARE WELCOME