PIRE ExoGENI – ENVRI preparation for Big Data science



slide-1
SLIDE 1

PIRE ExoGENI – ENVRI preparation for Big Data science

Stavros Konstantaras, Ioannis Grafis February 5, 2014

System and Network Engineering MSc Research project

slide-2
SLIDE 2

Background

Big Data science

  • Huge amount of data
  • Many sources
  • Data Movement (DM) is very important
  • Described by the “5V”s (Volume, Velocity, Variety, Variability and Value)

Software Defined Networking (SDN)

  • Separate control plane from data plane
  • Single entity controls the network
  • Forwarding intelligence relies on programmers

slide-3
SLIDE 3

Research questions

The main research question is the following:

  • To what degree can the performance of the data movement protocols be optimized by using Software Defined Networking technology?

The main research question includes the following sub-questions:

  • What network-level problems limit the performance of the data movement protocols?
  • How can SDN eliminate these problems?

slide-4
SLIDE 4

Outline

  • Theory part
  • Problem analysis
  • Solution profiles
  • Experimental part
  • Prototyping HIDE (Hybrid Intelligent Data Enhancer)
  • Scenarios and Results
  • Conclusion

slide-5
SLIDE 5

Data Movement Application problems

GridFTP (Globus)

Positives:
  • Open source
  • High scalability
  • High reliability
  • Option to resume transfers that are stopped because of failures

Negatives:
  • Difficult to deploy
  • Network speed limit (13 Gbps for TCP version)

Network limits:
  • Decrease window size for every lost packet and resend the packet
  • Application is not aware of the topology and the path the data follows
  • Most of the time the transfer speed is limited by network traffic

bbFTP (NASA)

Positives:
  • Open source
  • High scalability
  • High reliability
  • Multi-stream TCP
  • Easy to deploy
  • Resume file transfer session

Negatives:
  • Transfer only files, not directories
  • Little industry adoption
  • Little documentation

FDT (CERN)

Positives:
  • Open source
  • Runs on all major platforms (Java application)
  • Multi-stream TCP
  • Resume file transfer session

Negatives:
  • Little industry adoption
  • Little documentation
  • Network speed limit (4.5 Gbps)
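
The network limit "decrease window size for every lost packet" can be illustrated with a simplified additive-increase/multiplicative-decrease model. This is only a sketch of the general TCP behaviour, not of GridFTP, bbFTP or FDT internals; the function name and all parameter values are assumptions chosen for illustration.

```python
def aimd_window(events, cwnd=10.0, increase=1.0, decrease=0.5, floor=1.0):
    """Return the congestion window after each round trip.

    events: iterable of booleans, True meaning a packet was lost in that round.
    All default values are illustrative assumptions, not measured settings.
    """
    history = []
    for lost in events:
        if lost:
            # Multiplicative decrease: the window shrinks on every loss.
            cwnd = max(floor, cwnd * decrease)
        else:
            # Additive increase: the window grows slowly while no loss occurs.
            cwnd += increase
        history.append(cwnd)
    return history


# Ten round trips with a single loss in the fifth round.
trace = aimd_window([False] * 4 + [True] + [False] * 5)
print(trace)
```

A single loss halves the window, and it takes several loss-free round trips to recover, which is why interfering traffic on a shared path hurts long transfers.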

slide-6
SLIDE 6

Performance problem? (flowchart)

Questions (each with a YES and a NO branch):
  • Does the application perform well?
  • Is it a network problem?
  • Is it a corrupted/broken link?
  • Is it a busy link?

Outcomes:
  • Do nothing
  • Examine / Improve the Application
  • Fix the link
  • Proceed to the Decision tree to select a QoS solution
  • Examine the other entities of the network
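
The troubleshooting questions on this slide can be sketched as a small function. The question and outcome labels come from the slide; the order in which the branches connect is an assumption, since the arrows of the flowchart are not preserved in this transcript. The function name `triage` is hypothetical.

```python
def triage(app_performs_well, network_problem=False,
           broken_link=False, busy_link=False):
    """Walk the troubleshooting questions and return the suggested action.

    Assumption: the branch order below is one plausible reading of the
    flowchart; only the labels themselves are taken from the slide.
    """
    if app_performs_well:
        return "Do nothing"
    if not network_problem:
        return "Examine / Improve the Application"
    if broken_link:
        return "Fix the link"
    if busy_link:
        return "Proceed to the Decision tree to select a QoS solution"
    return "Examine the other entities of the network"


# A slow transfer caused by a congested (busy) link.
print(triage(app_performs_well=False, network_problem=True, busy_link=True))
```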

slide-7
SLIDE 7

Available technologies

  • Traffic monitoring
      • Deep Packet Inspection (DPI)
      • Inspect client/server interfaces
      • Inspect flow counters
  • Flow management
      • Port level
      • Socket level (IP address and TCP port)
  • Network Controllability
      • Commands to the controller (API)
      • Commands to the switches
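
As a rough illustration of "inspect flow counters", the sketch below turns two samples of a flow's byte counter into a throughput estimate and compares it with the link capacity. It deliberately uses no controller API: how the counters are fetched is left open, and the function names and the 90% threshold are assumptions.

```python
def throughput_mbps(bytes_before, bytes_after, interval_s):
    """Estimate flow throughput in Mbps from two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (bytes_after - bytes_before) * 8 / interval_s / 1e6


def is_busy(link_mbps, capacity_mbps, threshold=0.9):
    """Flag a link as busy when utilisation reaches the (assumed) threshold."""
    return link_mbps >= threshold * capacity_mbps


# 62.5 MB counted over 5 seconds on a 100 Mbps link.
rate = throughput_mbps(0, 62_500_000, 5)
print(rate, is_busy(rate, capacity_mbps=100))
```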

slide-8
SLIDE 8

Decision tree

How can we improve the Application’s performance?

Network branch: Can we use the network for that?

  • Questions: Can we control the network to boost performance? Do we have access to the Controller? How can we make use of the Controller?
  • Access levels: Full Access level, Some Access level
  • Options: Extend the source code; DPI; Use an API to communicate; Traffic monitoring (Inspect flow counters); Flow management (Sockets, Ports); Network control (Commands to Controller, Commands to switches)
  • Other outcomes: Cannot provide a network-level solution; Cannot use SDN to solve the problem; We need to grant some access rights; Do nothing

Application branch: Can we use the Data Application for that?

  • Questions: Do we have access to the Application? Can we modify the source code? Can we monitor the traffic?
  • Access levels: Full Access level, Some Access level
  • Options: Extend the source code; Extend the source code without network input; Traffic monitoring; Traffic monitoring at Interfaces; Inspect client/server interfaces; Build a separate component to solve; Use of an API to communicate with; Replace the Application
  • Other outcomes: Cannot provide an Application-level solution; Do nothing

Legend: Excluded options, Parallel options

slide-9
SLIDE 9

Decision tree: network branch

How can we improve the Application’s performance? Can we use the Data Application for that? Can we use the network for that?

  • Questions: Can we control the network to boost performance? Do we have access to the Controller? How can we make use of the Controller?
  • Access levels: Full Access level, Some Access level
  • Options: Extend the source code; DPI; Use an API to communicate; Traffic monitoring (Inspect flow counters); Flow management (Sockets, Ports); Network control (Commands to Controller, Commands to switches)
  • Other outcomes: Cannot provide a network-level solution; Cannot use SDN to solve the problem; We need to grant some access rights; Do nothing

Legend: Excluded options, Parallel options

slide-10
SLIDE 10

Decision tree: application branch

How can we improve the Application’s performance? Can we use the Data Application for that? Can we use the network for that?

  • Questions: Do we have access to the Application? Can we modify the source code? Can we monitor the traffic?
  • Access levels: Full Access level, Some Access level
  • Options: Extend the source code; Extend the source code without network input; Traffic monitoring; Traffic monitoring at Interfaces; Inspect client/server interfaces; Build a separate component to solve; Use of an API to communicate with; Replace the Application
  • Other outcomes: Cannot provide an Application-level solution; Do nothing

Legend: Excluded options, Parallel options

slide-11
SLIDE 11

Solution development profiles

Requirements                    Application level   Network Programmer   Network Programmer   Hybrid
                                Programmer          (API)                (full)               Programming

Develop at Application level    YES                 NO                   NO                   YES
Develop at Network level        NO                  YES                  YES                  NO
Make use of SDN Technology      NO                  YES                  YES                  YES
Access to the Application       YES                 NO                   NO                   SOME
Access to the Controller        NO                  SOME                 YES                  SOME
Network topology knowledge      NO                  YES                  YES                  YES
Network status knowledge        SOME                YES                  YES                  YES
Traffic monitor using DPI       NO                  NO                   YES                  NO
Traffic monitor on flow level   NO                  YES                  YES                  YES
Traffic monitor at Interfaces   YES                 NO                   NO                   NO
Flow management                 NO                  YES                  YES                  YES
Network controllability         NO                  SOME                 YES                  YES
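
The two access rows of the profile table ("Access to the Application" and "Access to the Controller") are enough to tell the four profiles apart, which the lookup below illustrates. The dictionary and the function name `select_profile` are hypothetical helpers; combinations that do not appear in the table return `None` rather than a guess.

```python
# (Access to the Application, Access to the Controller) -> development profile,
# copied from the corresponding two rows of the profile table.
PROFILES = {
    ("YES", "NO"): "Application level Programmer",
    ("NO", "SOME"): "Network Programmer (API)",
    ("NO", "YES"): "Network Programmer (full)",
    ("SOME", "SOME"): "Hybrid Programming",
}


def select_profile(app_access, controller_access):
    """Return the matching profile name, or None if the table has no such row."""
    key = (app_access.upper(), controller_access.upper())
    return PROFILES.get(key)


# Partial access on both sides corresponds to the hybrid profile.
print(select_profile("some", "some"))
```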

slide-12
SLIDE 12

Solution tracks

The decision tree (How can we improve the Application’s performance? Can we use the Data Application for that? Can we use the network for that?) with four solution tracks marked on it:

  • Application approach
  • Our approach
  • Network approach (Full)
  • Network approach (API)

slide-13
SLIDE 13

Controller-Application relationship

Quadrant chart. Axes: Application (Dependent / Independent) and Controller (Dependent / Independent).

Plotted approaches: Our approach, Network level, Network level (API), Application level

slide-14
SLIDE 14

HIDE component

Test topology (figure):

  • Hosts: server1, client1, server2, client2
  • Switches: SW1, SW2, SW3, SW4
  • OpenFlow Controller: Floodlight
  • Links: 1 Gbps, 100 Mbps and 10 Mbps
  • Paths: Path1, Path2
  • Software: FDT, Iperf, HIDE

slide-15
SLIDE 15


slide-16
SLIDE 16

HIDE overhead

Timing diagram with two timelines, FDT and COMPONENT (HIDE):

  • FDT: time marks t0 to t4; interval labels 6s, 5s, 5s, 5s; events: new connection, 1st, 2nd, 3rd and 4th FDT output
  • COMPONENT: time marks t0, t1, t2, t2’, t3, t4; interval labels 6s, 8s, 5s, 2s; events: new connection, discover QoS problem, send commands to change path, ignored output, confirm that problem solved

Δt = FDT + HIDE = 16s
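
The component steps named on this slide (discover QoS problem, send commands to change path, ignored output, confirm that problem solved) can be read as a small state machine driven by the periodic FDT output. The sketch below is an assumption about how such a loop could look, not the actual HIDE code; the threshold, the `switch_path` callback and the function name are all hypothetical.

```python
def hide_loop(readings, threshold_mbps, switch_path):
    """Process periodic throughput readings and return a log of actions.

    readings: one throughput value (Mbps) per FDT output.
    switch_path: callback standing in for the commands sent to the controller.
    """
    log = []
    state = "monitor"
    for mbps in readings:
        if state == "ignore":
            # The first output after a path change is skipped (assumed stale).
            log.append("ignored output")
            state = "confirm"
            continue
        if mbps < threshold_mbps:
            log.append("discover QoS problem")
            switch_path()
            log.append("send commands to change path")
            state = "ignore"
        elif state == "confirm":
            log.append("confirm that problem solved")
            state = "monitor"
    return log


# Throughput drops in the second reading and recovers after the path change.
changes = []
actions = hide_loop([90, 8, 9, 85, 88], threshold_mbps=50,
                    switch_path=lambda: changes.append("Path2"))
print(actions, changes)
```

Skipping one output after the path change mirrors the "ignored output" step and is one reason the reaction time cannot drop below a few FDT reporting intervals.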

slide-17
SLIDE 17

Scenarios

  • Scenario 1
      • Transferring files via Path1 with and without interfering traffic to obtain reference points
  • Scenario 2
      • Transferring files via Path1 with interfering traffic and the component enabled
  • Scenario 3
      • Interfering traffic changes path every 30s in order to stress HIDE for a longer period

slide-18
SLIDE 18

Scenario results

Figure: FDT performance on transferring different files

  • X-axis: file sizes in megabytes (1 to 9000)
  • Y-axis: time in seconds (0 to 1800)
  • Series: FDT ideal, Scenario 1, Scenario 2 HIDE disabled, Scenario 2 HIDE enabled

slide-19
SLIDE 19

Total transfer time

Total time for transferring three different files (time in seconds)

File size   Scenario 1   Scenario 2      Scenario 2     Scenario 3      Scenario 3
                         HIDE disabled   HIDE enabled   HIDE disabled   HIDE enabled

125 MB      17           29              23             43              23
1.25 GB     117          225             123            222             138
8.75 GB     773          1569            790            1387            892
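
To read the totals on this slide as relative gains, the sketch below computes the percentage reduction in transfer time with HIDE enabled versus disabled. The numbers are the ones reported on the slide; the dictionary layout, the short keys and the function name are assumptions made for illustration.

```python
# Total transfer times in seconds, as reported on the slide.
# Tuple order: (HIDE disabled, HIDE enabled).
TIMES = {
    "Scenario 2": {"125 MB": (29, 23), "1.25 GB": (225, 123), "8.75 GB": (1569, 790)},
    "Scenario 3": {"125 MB": (43, 23), "1.25 GB": (222, 138), "8.75 GB": (1387, 892)},
}


def reduction_pct(disabled, enabled):
    """Percentage of transfer time saved by enabling HIDE, rounded to 0.1."""
    return round(100 * (disabled - enabled) / disabled, 1)


for scenario, files in TIMES.items():
    for size, (disabled, enabled) in files.items():
        print(f"{scenario}, {size}: {reduction_pct(disabled, enabled)}% less time")
```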

slide-20
SLIDE 20

Representative sample of Scenario 3

Figure: transfer speed over time, component disabled vs. component enabled

  • X-axis: time in seconds (0 to 240)
  • Y-axis: speed in Mbps (0 to 100)
  • File size 1.25 GB, data points every 5s

slide-21
SLIDE 21

Discussion

  ✓ Adequate level of abstraction and portability
  ✓ Using SDN to enhance data movement
  ✓ Intelligence based on real-time input
  - Lower bound of reaction time depends on the FDT server
  - Topology knowledge should be requested from the controller

slide-22
SLIDE 22

Conclusion

  • Data Movement Applications can gain benefits from SDN
  • Of the mentioned ways of solving the QoS problem, we explored one, and it was successful
  • ExoGENI is a well-designed environment to deploy topologies and perform experiments

slide-23
SLIDE 23

Future work

  • Reduce reaction time (highly dependent on FDT)
  • Improve intelligence (get topology knowledge from the controller)
  • Investigate a prediction algorithm (avoid network overload)

slide-24
SLIDE 24


Thank you