Data Collection through Device-to-Device Communications for Mobile Big Data Sensing
Hanshang Li, Ting Li, Xinghua Shi and Yu Wang
College of Computing and Informatics University of North Carolina at Charlotte
May 17, 2016 @ The First Workshop of Mission-Critical Big Data Analytics (MCBDA 2016)
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusions
MOBILE DEVICES
➤ Smart mobile devices are increasingly used as people's primary personal devices, with built-in computing, sensing, and communication functions.
MOBILE DEVICES AND USERS
An Introduction to Mobile Marketing: The Past, Present, and Future, Marketo, 2015
Cisco VNI Global Mobile Data Traffic Forecast, 2015 - 2020, Cisco, 2016
Source: Cisco VNI Mobile, 2016
MOBILE DATA EXPLOSION
➤ Mobile data traffic grows!
grew 74% in 2015, reaching 3.7 exabytes/month, 4,000 times the 2005 level
will surpass 30.6 exabytes per month in 2020
➤ Mainly came from smart devices
though smart devices only represent 36% of devices/connections, they account for 89% of all mobile traffic
Cisco VNI Global Mobile Data Traffic Forecast, 2015 - 2020, Cisco, 2016
MOBILE CROWD SENSING — “POWER OF THE CROWD”
➤ Individuals with sensing and computing devices collectively share data and extract information to measure and map phenomena of common interest
➤ Widely used in many applications - humans as sensors
ADVANTAGES OF MOBILE CROWD SENSING
➤ Leverages existing sensing and communication infrastructure at little additional cost;
➤ Provides unprecedented spatial-temporal coverage, especially for observing unpredictable events;
➤ Integrates human intelligence into sensing and data processing.
GENERAL FRAMEWORK OF MOBILE CROWD SENSING
➤ A large number of mobile participants
➤ A set of crowd sensing tasks
➤ Participant selection mechanism - the focus of most current works
[Figure: framework diagram; labels: sensing tasks, selection mechanism, participants, coverage, cost, incentive, reward, sensing data, user traces, task assignment, tasks]
CHALLENGE TO CURRENT NETWORK INFRASTRUCTURE
➤ Current cellular networks do not have enough capacity to support all of the fast-growing mobile big data from smart devices and mobile sensing
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusions
DATA COLLECTION IN MOBILE CROWD SENSING
➤ How to transfer sensing data back?
cellular network (piggyback)
WiFi or femtocell offloading
D2D/DTN relays
+ low cost and easy to deploy
- longer delay and low delivery ratio
D2D: Device-to-Device; DTN: Delay Tolerant Networks
[Figure: mobile crowd sensing framework; labels: sensing tasks, selection mechanism, participants, coverage, cost, incentive, rewards, sensing data, user traces, task assignment, tasks]
MOBILE DATA COLLECTION VIA D2D RELAYS
➤ Leverage user mobility to deliver the sensing data from the source to the sink(s)
RELATED WORKS
➤ Data Collection in Mobile Sensing
Wang et al. [UbiComp 2013] consider Bluetooth/WiFi offloading (one-hop) to reduce the energy consumption and data cost of data-plan users
Karaliopoulos et al. [INFOCOM 2015] consider joint user recruitment with D2D data collection (multi-hop); however, the time complexity of the proposed greedy algorithm is high because it searches over all space-time paths
➤ DTN/D2D Routing
Focus on point-to-point delivery over D2D relays, selecting relay nodes on the fly
➤ Data Offloading
WiFi [Lee et al. 2010, Dimatteo et al. 2011], femtocell [Chandrasekhar et al. 2008]
D2D [Han et al. 2012, Li et al. 2014, Zhu et al. 2013], broadcasting or point-to-point
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusions
MODEL AND ASSUMPTIONS
➤ n mobile users, User = {u_1, u_2, …, u_n}
➤ m locations, Location = {l_1, l_2, …, l_m}
➤ T, time period for delivery
➤ Known probability p(i, j, t) that mobile user u_i visits location l_j at time t (learned from historical data)
➤ Two devices can transfer sensing data if they visit the same location within the same time slot
➤ Collection task: sending the data from a source node s to a sink node d (a mobile device or a location)
➤ Restricted flooding (epidemic routing) is used among the selected relay nodes U(s, d)
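The slides do not say how p(i, j, t) is learned from historical data. One simple possibility is a frequency estimate: the fraction of observed days on which a user was seen at a location during a time slot. The sketch below assumes a hypothetical record format of (day, user, location, slot) tuples; the function name and this format are illustrative and not taken from the talk.

```python
from collections import defaultdict


def estimate_visit_probability(records, num_days):
    """Estimate p(i, j, t) as the share of days on which user i was
    observed at location j during time slot t.

    records:  iterable of (day, user, location, slot) tuples (assumed format)
    num_days: number of days covered by the history
    """
    seen = defaultdict(set)  # (user, location, slot) -> set of days observed
    for day, user, location, slot in records:
        seen[(user, location, slot)].add(day)
    # Unseen (user, location, slot) triples are simply absent (probability 0).
    return {key: len(days) / num_days for key, days in seen.items()}


history = [
    (0, "u1", "l1", 1), (1, "u1", "l1", 1), (2, "u1", "l2", 1),
    (0, "u2", "l1", 1), (0, "u2", "l1", 1),  # repeated record on the same day
]
p = estimate_visit_probability(history, num_days=4)
print(p[("u1", "l1", 1)])  # 0.5
print(p[("u2", "l1", 1)])  # 0.25
```

Counting distinct days rather than raw records keeps a burst of records on one day from inflating the estimate.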
RELAY SELECTION PROBLEM
➤ Goal: minimize the number of relay nodes in U(s, d) while maximizing data delivery
➤ Two versions of the optimization problem
Minimum Relay Problem
K Relay Problem
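One way to write the two versions down, using the delivery probability p(U(s,d), s, d), the budget K, and the threshold γ that appear in the relay selection algorithm of this talk. The exact constraint forms (for example "at most K" rather than "exactly K") are an assumption, since the slide only names the two problems.

```latex
% K relay problem: best delivery probability with a relay budget of K
\[
\max_{U(s,d) \subseteq User} \; p(U(s,d), s, d)
\quad \text{s.t.} \quad |U(s,d)| \le K
\]

% Minimum relay problem: fewest relays that reach delivery probability \gamma
\[
\min_{U(s,d) \subseteq User} \; |U(s,d)|
\quad \text{s.t.} \quad p(U(s,d), s, d) \ge \gamma
\]
```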
TWO CHALLENGES
➤ How to model the time-evolving D2D network and estimate the delivery probability?
weighted space-time graph and reliability calculation
➤ How to identify a small set of relay nodes from a huge candidate pool to guarantee a certain level of data delivery?
greedy algorithm
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusions
SPACE-TIME GRAPH
➤ A space-time graph captures the characteristics of the selected relay nodes in both the spatial and temporal dimensions
[Figure: example space-time graph with copies of each node at time slots t = 1 to t = 5, from source s to sink d]
DELIVERY PROBABILITY OVER SPACE-TIME GRAPH
➤ Each spatial link has a delivery probability, where r(·) is the link reliability:
\[
p\left(\overrightarrow{u_j^{t-1} u_k^{t}}\right) = \left(1 - \prod_{i=1}^{m} \bigl(1 - p(j,i,t)\, p(k,i,t)\bigr)\right) \cdot r\left(\overrightarrow{u_j^{t-1} u_k^{t}}\right)
\]
➤ With flooding, the delivery probability can be calculated via dynamic programming over the space-time graph G. Thus, the delivery probability of the relay set is
\[
p(U(s,d), s, d) = p_G(s^0, d^T)
\]
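The link formula and a forward dynamic-programming pass can be sketched as follows. The recurrence itself did not survive in these slides, so the update rule here is an assumption: a node holds the data after slot t unless it had none before and every incoming transfer in that slot failed, with all events treated as independent. Function names, the dictionary layout of p, and treating the sink as a node with its own visit probabilities are illustrative choices.

```python
import math


def link_probability(p, j, k, t, locations, r=0.5):
    """Spatial link weight: (1 - prod_i (1 - p(j,i,t) * p(k,i,t))) * r."""
    never_meet = math.prod(
        1.0 - p.get((j, loc, t), 0.0) * p.get((k, loc, t), 0.0)
        for loc in locations
    )
    return (1.0 - never_meet) * r


def delivery_probability(p, relays, source, sink, locations, T, r=0.5):
    """Estimate p_G(s^0, d^T) by a forward pass over time slots 1..T.

    Assumed recurrence (independence approximation, not from the slides).
    """
    nodes = set(relays) | {source, sink}
    has_data = {u: 0.0 for u in nodes}
    has_data[source] = 1.0
    for t in range(1, T + 1):
        updated = {}
        for k in nodes:
            # k stays empty only if it was empty and every transfer to it fails.
            stay_empty = 1.0 - has_data[k]
            for j in nodes:
                if j != k:
                    w = link_probability(p, j, k, t, locations, r)
                    stay_empty *= 1.0 - has_data[j] * w
            updated[k] = 1.0 - stay_empty
        has_data = updated
    return has_data[sink]


# Toy trace: s meets u1 at l1 in slot 1, and u1 meets d at l2 in slot 2.
p = {("s", "l1", 1): 1.0, ("u1", "l1", 1): 1.0,
     ("u1", "l2", 2): 1.0, ("d", "l2", 2): 1.0}
locs = ["l1", "l2"]
print(delivery_probability(p, {"u1"}, "s", "d", locs, T=2))  # 0.25
print(delivery_probability(p, set(), "s", "d", locs, T=2))   # 0.0
```

With link reliability 0.5, the two-hop path succeeds with probability 0.5 × 0.5 = 0.25, and without the relay the source never meets the sink.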
RELAY SELECTION ALGORITHM
➤ Greedy Algorithm
in each step, greedily selects the user u whose addition to U(s, d) leads to the maximal improvement of p(U(s, d), s, d)
➤ Cold Start Problem
initially, the space-time graph is not connected at all, and adding a single user cannot fix this
solution: simply pick the most active user
Algorithm 1 Relay Selection Algorithm
Input: potential user set User, call probability p(i, j, t) for each user in User, the source s and the sink d.
Output: selected relay nodes U(s, d).
1: U(s, d) = ∅
2: while G_{U(s,d)} is not connected do
3:     Choose the most active user and add it into U(s, d)
4: while |U(s, d)| < K or p(U(s, d), s, d) < γ (for K relay problem or minimum relay problem, respectively) do
5:     for all u_i ∈ User and u_i ∉ U(s, d) do
6:         Calculate the improvement of p(U(s, d), s, d) by adding u_i into U(s, d)
7:     Select the user u_i with the largest reliability improvement and add it into U(s, d)
8: return U(s, d)
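A minimal Python sketch of Algorithm 1, under stated assumptions: the delivery-probability estimator is passed in as a callable, "not connected" is approximated by an estimated delivery probability of zero, and ties are broken by user id. The names greedy_relay_selection, delivery_prob, and activity are hypothetical, and the toy estimator at the bottom is only a stand-in for the space-time graph calculation.

```python
def greedy_relay_selection(candidates, delivery_prob, activity, k=None, gamma=None):
    """Select relays greedily for the K relay problem (k) or the
    minimum relay problem (gamma). Exactly one of k / gamma is expected.

    delivery_prob: callable, frozenset of relays -> estimated probability
    activity:      dict, user -> activity score (used for the cold start)
    """
    if (k is None) == (gamma is None):
        raise ValueError("give exactly one of k or gamma")
    remaining = set(candidates)
    selected = set()

    def done():
        if k is not None:
            return len(selected) >= k
        return delivery_prob(frozenset(selected)) >= gamma

    # Cold start: while no path exists, add the most active remaining user.
    while remaining and not done() and delivery_prob(frozenset(selected)) == 0.0:
        best = max(remaining, key=lambda u: (activity[u], u))
        selected.add(best)
        remaining.discard(best)

    # Greedy phase: add the user giving the largest estimated delivery probability.
    while remaining and not done():
        best = max(remaining,
                   key=lambda u: (delivery_prob(frozenset(selected | {u})), u))
        selected.add(best)
        remaining.discard(best)
    return selected


# Toy estimator (hypothetical): nothing is delivered until user "c" joins.
quality = {"a": 0.5, "b": 0.4, "c": 0.1}
activity = {"a": 1, "b": 2, "c": 3}

def toy_prob(relays):
    if "c" not in relays:
        return 0.0
    fail = 1.0
    for u in relays:
        fail *= 1.0 - quality[u]
    return 1.0 - fail

print(sorted(greedy_relay_selection(quality, toy_prob, activity, k=2)))        # ['a', 'c']
print(sorted(greedy_relay_selection(quality, toy_prob, activity, gamma=0.7)))  # ['a', 'b', 'c']
```

Passing the estimator as a callable keeps the selection loop independent of how the delivery probability is computed.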
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusions
D4D DATASET
➤ Cellular tracing data (anonymized call records) from Orange
50,000 mobile users in Ivory Coast for half a year
contains access records of each mobile user over every two-week period
46,254 active users and 1,097 cellular towers
released for the Data for Development (D4D) Challenge in 2013
EXPERIMENT SETTING
➤ 20 most popular towers, i.e., those with the most associated records
➤ Relay nodes are chosen from a candidate set of 100 users
➤ For simplicity, link reliability is set to 0.5, i.e., a transfer between a pair of nodes succeeds with probability 50% during an encounter
➤ For each data collection task, we randomly select a mobile user as the data source and one location as the sink
➤ For each set of experiments, we test 15 tasks and 100 rounds per task. The average performance over 1,500 rounds is reported.
TESTED ALGORITHMS
➤ Three algorithms
Our Method: greedily choose the user with the largest improvement in delivery ratio
Active: choose the most active user (the one visiting the most locations)
Random: randomly choose a user at each step until K users are selected or the delivery ratio >= γ
RESULTS
[Figure: K relay problem where K = 10, 15 or 20. Delivery ratio versus K for Random, Activity, and Our Method; a second panel shows Est. DR and DR versus K.]
[Figure: minimum relay problem where γ = 0.6, 0.75 or 0.9. Delivery ratio versus γ and relay set size |U(s,d)| versus γ for Random, Activity, and Our Method.]
OUTLINE
➤ Introduction ➤ Mobile Data Collection ➤ Relay Selection Problem ➤ Our Solutions ➤ Simulations ➤ Conclusion
CONCLUSION
➤ Big data from mobile sensing brings new challenges in mobile data collection
➤ Consider a relay selection problem for mobile data collection via D2D relays
aim to use a small relay set to guarantee a certain level of data delivery via D2D flooding
formulate the problem as two optimization problems on relay set selection (K relay selection or minimum relay selection)
propose a greedy solution, which uses historical records and the space-time graph to estimate the expected delivery ratio
tested on the real-life D4D dataset
➤ Future work
hybrid data collection scheme
ACKNOWLEDGEMENT
Tracing data provided by: [logo]
Funded by: [logo]
Joint works with: [logos]
Contact: yu.wang@uncc.edu
PhD Students: Hanshang Li, Ting Li
Collaborators: Xinghua Shi (UNCC)
TIME COMPLEXITY
➤ Given the space-time graph G defined by r relay nodes, starting from a source node, the dynamic programming algorithm can compute the delivery ratio of all other nodes in O(rT(log(rT) + r)) time
➤ The greedy algorithm runs at most K or n iterations (for the K relay problem or the minimum relay problem, respectively), so the total time complexity is at most O(KrT(log(rT) + r)) or O(nrT(log(rT) + r)) in the worst case