Air Traffic Management with Big Data Analytics Alessandro Ferreira - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Air Traffic Management with Big Data Analytics

Alessandro Ferreira Leite

slide-2
SLIDE 2
slide-3
SLIDE 3

Understanding why traffic is delayed is a difficult task

  • Historical information
  • Weather
  • Availability of airplanes
  • Concurrent flights
  • ....
slide-4
SLIDE 4

Data is growing faster than Moore’s law

Source: https://amplab.cs.berkeley.edu/for-big-data-moores-law-means-better-decisions/

slide-5
SLIDE 5

Data has always been big

slide-6
SLIDE 6

Big Data Examples

  • Facebook’s daily logs: 60 TB
  • Google web index: 10+ PB
  • Cost of 1 TB of disk: ~$35
  • Time to read 1 TB from disk: ~3 hours (at 100 MB/s)
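
The read time quoted above follows from dividing the data size by the throughput. A quick check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained sequential read rate; the function name is only illustrative:

```python
# Back-of-the-envelope check of the disk read time quoted on the slide.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6


def read_time_hours(size_bytes: int, throughput_bytes_per_s: int) -> float:
    """Return the time in hours needed to read size_bytes sequentially."""
    seconds = size_bytes / throughput_bytes_per_s
    return seconds / 3600


hours = read_time_hours(1 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # 2.78 hours, i.e. roughly 3 hours
```

Reading the same terabyte from ten disks in parallel would, under the same assumptions, take about a tenth of that time, which is one motivation for spreading data over several machines.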

slide-13
SLIDE 13

Big data V’s

  • volume
  • velocity
  • variety
  • veracity
  • value

  • not enough space to store all data
  • not enough idle time to finish proper tuning
  • unpredictable workload change
  • not enough resources to process all data

One possible solution is to distribute data over multiple machines

slide-14
SLIDE 14

How do we split work across machines?

Data analytics workflow: data access, data processing, data analytics

  • Descriptive statistics
  • Machine Learning
  • MapReduce

slide-21
SLIDE 21

How do we find the longest flight for each company?

Flight ID   Airline ID                            Distance
1503        UA           LAX    -5   -10   ...    2536
540         PS           BUR    13     5           186
1920        DL           BOS    10    32          1876
1840        DL           SFO    13                 568
272         US           BWI     4    -2           359
784         PS           SEA     7     3           176
796         PS           LAX    -2     2           237
1525        UA           SFO     3    -5          1867
632         US           SJC     2    -4           245
1610        UA           MIA    60    34          1365
2032        DL           EWR    10    16           789
2134        DL           DFW     6     6           914

{ UA: 2536, PS: 237, ... }
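
On a single machine, the question on this slide can be answered with one pass over the rows, keeping the largest distance seen so far for each airline. The sketch below is only an illustration: it copies the flight ID, airline ID, and distance values from the slide's sample rows, and the function name is hypothetical.

```python
# Sample rows from the slide: (flight ID, airline ID, distance).
# The remaining columns are left out because this query does not need them.
FLIGHTS = [
    (1503, "UA", 2536), (540, "PS", 186), (1920, "DL", 1876), (1840, "DL", 568),
    (272, "US", 359), (784, "PS", 176), (796, "PS", 237), (1525, "UA", 1867),
    (632, "US", 245), (1610, "UA", 1365), (2032, "DL", 789), (2134, "DL", 914),
]


def longest_flight_per_airline(flights):
    """Scan the rows once, keeping the largest distance seen per airline."""
    longest = {}
    for _flight_id, airline, distance in flights:
        if distance > longest.get(airline, -1):
            longest[airline] = distance
    return longest


result = longest_flight_per_airline(FLIGHTS)
print(result)  # {'UA': 2536, 'PS': 237, 'DL': 1876, 'US': 359}
```

This works as long as both the rows and the result dictionary fit in the memory of one machine.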

slide-37
SLIDE 37

And, what if the datasets are really big?

Flight ID   Airline ID                            Distance
1503        UA           LAX    -5   -10   ...    2536
540         PS           BUR    13     5           186
1920        DL           BOS    10    32          1876
1840        DL           SFO    13                 568
272         US           BWI     4    -2           359
784         PS           SEA     7     3           176
796         PS           LAX    -2     2           237
1525        UA           SFO     3    -5          1867
632         US           SJC     2    -4           245
1610        UA           MIA    60    34          1365
2032        DL           EWR    10    16           789
2134        DL           DFW     6     6           914

Machines 1 to 3:

  {UA: 2536, PS: 186, DL: 1876}
  {US: 359, PS: 237, UA: 1867}
  {US: 245, UA: 1365, DL: 914}

Machine 4:

  {UA: 2536, PS: 237, DL: 1876, US: 359}

Results must fit in one machine
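
The scheme on this slide (three machines each compute a partial result, a fourth merges them) can be simulated in a single process. This is a minimal sketch, assuming the twelve sample rows are split into three consecutive blocks of four; the helper names `split`, `local_max`, and `merge` are hypothetical.

```python
# Sample rows from the slide: (flight ID, airline ID, distance).
FLIGHTS = [
    (1503, "UA", 2536), (540, "PS", 186), (1920, "DL", 1876), (1840, "DL", 568),
    (272, "US", 359), (784, "PS", 176), (796, "PS", 237), (1525, "UA", 1867),
    (632, "US", 245), (1610, "UA", 1365), (2032, "DL", 789), (2134, "DL", 914),
]


def split(rows, n_parts):
    """Cut the rows into n_parts consecutive blocks of (almost) equal size."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def local_max(rows):
    """Work done by one of machines 1 to 3: longest flight per airline in its block."""
    partial = {}
    for _flight_id, airline, distance in rows:
        partial[airline] = max(distance, partial.get(airline, distance))
    return partial


def merge(partials):
    """Work done by machine 4: combine the partial results into the final answer."""
    merged = {}
    for partial in partials:
        for airline, distance in partial.items():
            merged[airline] = max(distance, merged.get(airline, distance))
    return merged


partials = [local_max(block) for block in split(FLIGHTS, 3)]
final = merge(partials)
print(partials[0])  # {'UA': 2536, 'PS': 186, 'DL': 1876}
print(final)        # {'UA': 2536, 'PS': 237, 'DL': 1876, 'US': 359}
```

Taking the maximum is associative and commutative, so merging partial maxima gives the same answer as a single pass over all rows. The merge step still holds every airline on one machine, which is the limitation the slide points out.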

slide-39
SLIDE 39

We can employ a divide-and-conquer strategy to deal with memory limitations

Flight ID   Airline ID                            Distance
1503        UA           LAX    -5   -10   ...    2536
540         PS           BUR    13     5           186
1920        DL           BOS    10    32          1876
1840        DL           SFO    13                 568
272         US           BWI     4    -2           359
784         PS           SEA     7     3           176
796         PS           LAX    -2     2           237
1525        UA           SFO     3    -5          1867
632         US           SJC     2    -4           245
1610        UA           MIA    60    34          1365
2032        DL           EWR    10    16           789
2134        DL           DFW     6     6           914

Map (Machines 1 to 3):

  {UA: 2536, PS: 186, DL: 1876}
  {US: 359, PS: 237, UA: 1867}
  {US: 245, UA: 1365, DL: 914}

Reduce (Machines 1 to 3):

  {UA: 2536, (...)}
  {PS: 237, DL: 1876, (...)}
  {US: 245, (...)}
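
The map and reduce stages on this slide can be simulated in plain Python. This is a sketch under stated assumptions, not the API of any particular framework: the slide does not say how airlines are assigned to reducers, so routing keys with `zlib.crc32` is an assumption made here, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict

# Sample rows from the slide: (flight ID, airline ID, distance).
FLIGHTS = [
    (1503, "UA", 2536), (540, "PS", 186), (1920, "DL", 1876), (1840, "DL", 568),
    (272, "US", 359), (784, "PS", 176), (796, "PS", 237), (1525, "UA", 1867),
    (632, "US", 245), (1610, "UA", 1365), (2032, "DL", 789), (2134, "DL", 914),
]


def map_phase(rows):
    """Map: emit one (airline, distance) pair per input row."""
    return [(airline, distance) for _flight_id, airline, distance in rows]


def shuffle(pairs, n_reducers):
    """Shuffle: send every pair to the reducer that owns its airline."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for airline, distance in pairs:
        # Assumed routing rule: crc32 is stable across runs,
        # unlike the built-in hash() of str, which is salted per process.
        target = zlib.crc32(airline.encode("utf-8")) % n_reducers
        buckets[target][airline].append(distance)
    return buckets


def reduce_phase(bucket):
    """Reduce: keep the longest distance for each airline this reducer owns."""
    return {airline: max(distances) for airline, distances in bucket.items()}


# Three mappers, each working on one block of four rows.
blocks = [FLIGHTS[i:i + 4] for i in range(0, len(FLIGHTS), 4)]
mapped = [pair for block in blocks for pair in map_phase(block)]

# Three reducers, each responsible for a disjoint set of airlines.
reduced = [reduce_phase(bucket) for bucket in shuffle(mapped, 3)]

# Gathered here only to inspect the answer; in a real job the output stays distributed.
final = {}
for part in reduced:
    final.update(part)
print(final)
```

Because each airline is owned by exactly one reducer, no single machine has to hold the full result, which addresses the "results must fit in one machine" constraint.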

slide-40
SLIDE 40

High Performance Analytics Workflow

  • Collect data
  • Extract features
  • Discover patterns & develop models
  • Relationships & Graphs
  • Simulate & Analyze
  • Recommend
  • Data Driven Model “A”, Data Driven Model “B”, Data Driven Model “C”

Big data workload, Predictive Analytics, Extract Knowledge

slide-41
SLIDE 41

Strengths & Limitations

  • Strengths
      • analytics are easy when they fit the MapReduce approach
      • enables easy design of data exploration systems
      • there are mature data and cluster processing systems (e.g., Apache Spark, Apache Flink)
      • the data processing frameworks automatically take data locations into account when distributing tasks
  • Limitations
      • considerable learning curve
      • we can still face some scalability problems