Content Based Architectures for Networking, Aaditeshwar Seth



SLIDE 1

Content Based Architectures for Networking

Aaditeshwar Seth
Department of Computer Science, IIT Delhi

Joint work with A. Ruhela, R. Tripathy, A. Mahla, D. Martin, I. Ahuja, Q. Niyaz, A. Dubey, S. Brahmi, A. Subramaniam, Z. Koradia, A. Singh and A. Mahanti, S. Ardon, S. Triukose, H. Saran, A. Bagchi

November 2011

SLIDE 2

Gen 1: Phone numbers carry path information

SLIDE 3

Gen 2: Endpoints have addresses, nodes switch packets

[Baran, 1964]

SLIDE 4

Today

SLIDE 5

Internet ~ Content transfer

Cisco, 2009; Moon et al., IMC 2007

SLIDE 6

Content delivery networks, flattening Internet

Internet Atlas, NANOG, 2009

SLIDE 7

Content sharing via social networking websites

Pew Internet, 2008

SLIDE 8

Gen 3: Semantic content based networks

  • Users care about content, not where it is available
  • Treat content objects as first class entities in the network
      • Push/pull content objects
      • Content lookup servers
      • Routers can cache content
      • Semantic cache replacement, pre-fetching policies
  • Utilize content metadata
  • Utilize OSN signals about content metadata
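
The idea of requesting content by name, with routers caching what passes through them, can be illustrated with a minimal sketch. The class names `LookupServer` and `CachingRouter`, the content names, and the unbounded cache are all assumptions made for illustration; the slides do not specify an API or a replacement policy here.

```python
class LookupServer:
    """Hypothetical directory that resolves a content name to its data."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.requests = 0  # number of fetches that reached the server

    def fetch(self, name):
        self.requests += 1
        return self._objects[name]


class CachingRouter:
    """Toy router: answers from its cache, otherwise asks upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}  # content name -> data (no eviction in this sketch)

    def get(self, name):
        if name not in self.cache:
            self.cache[name] = self.upstream.fetch(name)
        return self.cache[name]


server = LookupServer({"news/monsoon": b"n1", "video/crop-rotation": b"v1"})
router = CachingRouter(server)

# Clients ask for content by name, not by host address.
for name in ["news/monsoon", "video/crop-rotation", "news/monsoon"]:
    router.get(name)

print(server.requests)  # only the two distinct names reached the server
```

The repeated request for `news/monsoon` is served from the router, which is the behaviour the "routers can cache content" bullet describes.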

SLIDE 9

OSN aided content distribution

SLIDE 10

Network architectures

  • a. CDN guided by data from online social networking websites
  • b. P2P gossip on social network overlay

SLIDE 11

Dataset

  • 7M users
  • 196M tweets
  • Duration: June 11, 2009 to Sept 1, 2009
  • OpenCalais to identify tweet topics
      • 6M topics, reduced to 0.9M topics having at least 15 users
      • Sampled 4K topics for detailed analysis
  • Yahoo geocoding API to identify user locations
      • 4M users with locations
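
The topic reduction step (keep topics with at least 15 users, then sample a subset) might look like the sketch below. The `(user_id, topic)` tuple layout, the function names, and the fixed random seed are assumptions for illustration; the actual pipeline and the OpenCalais output format are not described in the slides.

```python
import random
from collections import defaultdict


def topics_with_min_users(tweets, min_users=15):
    """Return topics mentioned by at least `min_users` distinct users."""
    users_by_topic = defaultdict(set)
    for user_id, topic in tweets:
        users_by_topic[topic].add(user_id)
    return {t for t, users in users_by_topic.items() if len(users) >= min_users}


def sample_topics(topics, k, seed=0):
    """Draw up to k topics uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(sorted(topics), min(k, len(topics)))


# Toy input: 20 users tweet about "cricket", 3 about "obscure".
tweets = [(u, "cricket") for u in range(20)] + [(u, "obscure") for u in range(3)]
tweets += [(0, "cricket")]  # repeat tweets by the same user count once

kept = topics_with_min_users(tweets)
sampled = sample_topics(kept, k=4000)
print(kept, sampled)
```

Counting distinct users rather than tweets keeps one prolific account from making a topic look widespread.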

SLIDE 12

Topic spread across geographies

  • Can use traffic spikes in originating region to predict spikes in other regions
  • LDA for topic identification, CF and follower-count for country similarity

SLIDE 13

Popular topics have a large spread, unpopular topics confined to few countries

  • High degree of spatial locality can be useful for content placement and caching
  • Explore at city/region level too

SLIDE 14

Does initiator popularity predict topic popularity?

SLIDE 15

Tracking giant component growth can help

  • Dominant giant component in popular topics, not as dominant in less popular topics
  • But growth of giant component seems to always coincide with popularity growth
  • Methods to track giant component growth dynamically?
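
One possible way to track the giant component as a topic spreads is an incremental union-find structure. This is a sketch under assumptions, not the method used in the study: it assumes users and follower links among users discussing a topic arrive one at a time and are never removed, and the class name `GiantComponentTracker` is hypothetical.

```python
class GiantComponentTracker:
    """Maintains the largest connected component as nodes and edges arrive."""

    def __init__(self):
        self.parent = {}
        self.size = {}
        self.largest = 0

    def add_user(self, u):
        if u not in self.parent:
            self.parent[u] = u
            self.size[u] = 1
            self.largest = max(self.largest, 1)

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path halving
            u = self.parent[u]
        return u

    def add_edge(self, u, v):
        self.add_user(u)
        self.add_user(v)
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru  # attach the smaller tree under the larger
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.largest = max(self.largest, self.size[ru])

    def giant_fraction(self):
        """Share of all seen users that sit in the largest component."""
        return self.largest / len(self.parent) if self.parent else 0.0


tracker = GiantComponentTracker()
for u, v in [("a", "b"), ("c", "d"), ("b", "c")]:
    tracker.add_edge(u, v)
tracker.add_user("e")  # a user with no links to the others
print(tracker.largest, tracker.giant_fraction())
```

Sampling `giant_fraction()` over time gives a curve that can be compared against the topic's tweet volume. Handling edge removals would need a different structure.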

SLIDE 16

Other interesting observations

  • Periodic topics
  • Ephemeral vs. stable
  • Sharp/slow growth and decay

SLIDE 17

Next steps

  • Online event detection algorithms
  • Predictors for geographic spread of topics
  • Simulations to evaluate CDN vs. P2P content distribution architectures
      • Cache replacement policies
      • Pre-fetching
      • Centralized and distributed algorithms

SLIDE 18

Content based networks for rural areas

SLIDE 19

Community media in rural areas

  • Variety of mechanisms
      • Community radio
      • Community video
      • Wall newspapers
      • …

SLIDE 20
  • Digital Green: 1500+ videos (5 states)
  • Community radio: 5 GB new content per month
  • Rural news: 40,000+ calls per month per state

SLIDE 21

Ideas and awareness for creating relevant programs

Produce impactful programs

  • Civic activism
  • Political change

Topic of the month

  • Employment
  • Right to Food
  • Water and sanitation
  • Maternal and child health
SLIDE 22

Social networking and content sharing

SLIDE 23

Digital Green dataset analysis

SLIDE 24

A content distribution network for rural areas

Constraints / Design principles

  • Application use-cases: publish-subscribe, broadcast, multicast, browsing and content download
  • Content-based network. Content objects are first class entities; routers can cache content, examine metadata
  • Local content production and consumption. Metadata can reveal access patterns
  • Content transfer capabilities to/from local rendezvous points in villages
  • Applications are tolerant of delays
  • Delay tolerant data transfer. Always-on content channel for route initializations and content download/upload requests
  • 2G coverage is not sufficient for large content transfers, but is ubiquitously available now

SLIDE 25

Network stack

SLIDE 26

Simulation analysis

  • Topology layout
      • Block-block, block-district roads
      • Villages clustered around blocks
      • Village-village, village-block
  • Movement schedules
      • Village-block by ad hoc means of transport, once a day
      • Block-block, block to district, by bus, a few times a day
  • Algorithms
      • Unicast with caching, multicast, multicast with pre-fetching, optimal multicast
      • Cache replacement: LRU, seasonal preference
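
To make the two cache replacement options concrete, here is a toy comparison of plain LRU against one possible reading of "seasonal preference": evict the least recently used video that is out of season, and fall back to LRU if every cached video is in season. The relevance table, video names, and request trace are invented for illustration and are not taken from the Digital Green dataset; the simulation's actual policy may differ.

```python
from collections import OrderedDict

# Hypothetical relevance periods (months of the year) per video.
RELEVANCE = {
    "sowing": {6, 7},
    "harvest": {10, 11},
    "health": set(range(1, 13)),
}


def in_season(name, month):
    return month in RELEVANCE[name]


class VideoCache:
    def __init__(self, capacity, seasonal=False):
        self.capacity = capacity
        self.seasonal = seasonal
        self.items = OrderedDict()  # least recently used first
        self.hits = 0

    def request(self, name, month):
        if name in self.items:
            self.hits += 1
            self.items.move_to_end(name)  # mark as most recently used
            return
        if len(self.items) >= self.capacity:
            del self.items[self._pick_victim(month)]
        self.items[name] = True

    def _pick_victim(self, month):
        if self.seasonal:
            for name in self.items:  # oldest out-of-season video first
                if not in_season(name, month):
                    return name
        return next(iter(self.items))  # plain LRU choice


def count_hits(cache, requests, month):
    for name in requests:
        cache.request(name, month)
    return cache.hits


requests = ["sowing", "harvest", "health", "sowing"]
lru_hits = count_hits(VideoCache(2), requests, month=6)
seasonal_hits = count_hits(VideoCache(2, seasonal=True), requests, month=6)
print(lru_hits, seasonal_hits)
```

In this contrived June trace the seasonal policy keeps the in-season video and scores one more hit than LRU. Whether that advantage appears in practice depends on how closely real screenings follow the indicated relevance periods.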

SLIDE 27

Download requirements at gateway

SLIDE 28

Effects of network topology

Short circuiting across villages helps in mesh-like topologies

SLIDE 29

Effects of consumption patterns

  • Not much improvement with seasonal preference according to indicated relevance periods
      • Digital Green (DG) screens videos throughout the year to sustain community interest
  • Not much improvement with cache sizes beyond 1 GB
      • DG makes rounds of villages screening the same set of videos, then moves on to other videos
  • Next steps
      • More rigorous analysis of cache occupancy
      • Dataset and topology modeling to design generic policies
      • Small-scale field deployment

SLIDE 30

Application framework for mobile devices with flaky Internet connections

SLIDE 31

Mobile traffic

Cisco, 2011

SLIDE 32

App server

Telcos are already putting caching proxies in their access networks

SLIDE 33

Offline application development

  • Applications run offline from a local cache
      • Key-value get/put API to data-store
      • Data-store synchronization provided by the middleware itself
  • Optimized transport layer
  • Control-data separation
  • Other features
      • Data summarization
      • Namespace subscriptions
      • Security & access control
      • Transactions
      • Consistency
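
A key-value get/put store that works offline and synchronizes when a connection appears could be sketched as follows. The class `OfflineStore`, its `sync` method, the plain dict standing in for the server, and the "local writes win" rule are all assumptions for illustration; the middleware's real API and consistency model are not given in the slides.

```python
class OfflineStore:
    """Toy local data-store: reads and writes never need the network."""

    def __init__(self):
        self.local = {}    # what the application sees
        self.pending = []  # writes made offline, not yet pushed

    def put(self, key, value):
        self.local[key] = value
        self.pending.append((key, value))

    def get(self, key, default=None):
        return self.local.get(key, default)

    def sync(self, remote):
        """Push queued writes, then pull remote state.

        `remote` is a dict standing in for the server-side store.
        Local writes are pushed first, so they overwrite remote values.
        """
        pushed = len(self.pending)
        for key, value in self.pending:
            remote[key] = value
        self.pending.clear()
        self.local.update(remote)
        return pushed


remote = {"news/1": "old headline"}
store = OfflineStore()

store.put("draft/1", "hello")     # works with no connectivity
before = store.get("news/1")      # not cached yet, so None
pushed = store.sync(remote)       # connectivity returns
after = store.get("news/1")
print(before, pushed, after, remote["draft/1"])
```

The application code only ever calls `get` and `put`; synchronization is left to the middleware, matching the second sub-bullet above.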

Middleware

SLIDE 34

Evidence of traffic shaping in cellular data networks?

Download on GPRS

SLIDE 35

Download on GPRS

Or, aggregate slot allocation on uplink?

  • Ack bunching at server trace
  • Client trace is clean, however

SLIDE 36

Non-uniform latencies on uplink

Upload on GPRS

[Figure annotations: 200 ms, 800 ms, 700 ms, 150 ms, 300 ms]

SLIDE 37

Next steps

  • Model traffic shaping and scheduling policies used in different cellular data networks
  • Optimize TCP for these conditions
  • Release application development framework for Android
  • Collect user data on WiFi mobility and content access patterns to determine delivery latencies and usability insights

SLIDE 38

Key messages

  • Content based network architectures can improve performance in today’s Internet usage context
      • Semantic metadata
      • Social networking websites
  • Challenges present themselves at different layers
      • Architecture appropriateness
      • Prediction algorithms for pre-fetching
      • Tracking algorithms for event detection
      • Application development framework
      • Optimized transport layers

Thanks for listening!

SLIDE 39

SLIDE 40

Topics spread to countries where the originating users have followers

LDA for topic identification, CF and follower-count for recommendation on country similarity