Large-scale NetFlow Information Management
Adrien Raulot, Shahrukh Zaidi
University of Amsterdam
Supervisor: Wim Biemolt (SURFnet)
February 5, 2018
Adrien Raulot, Shahrukh Zaidi (UvA) NetFlow Information Management February 5, 2018 1 / 24
1 Store NetFlow data into Parquet files on HDFS
2 Load Parquet files using PySpark (Python API)
3 Query the data using Spark SQL
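The three steps above can be sketched in PySpark as follows. This is a minimal illustration, not the authors' code: the HDFS directory layout, the view name `flows`, and the column names `sa` (source address) and `ibyt` (bytes) are assumptions chosen for the example, and the Spark functions import PySpark lazily so the small helpers can be used without a cluster.

```python
def parquet_path(namenode: str, base_dir: str, day: str) -> str:
    """Build an HDFS URI for one day of flow records (hypothetical layout)."""
    return f"hdfs://{namenode}/{base_dir.strip('/')}/{day}"


def top_sources_sql(view: str, limit: int = 10) -> str:
    """Spark SQL text: total bytes per source address, largest first.

    The column names `sa` and `ibyt` are assumed, not taken from the slides.
    """
    return (
        f"SELECT sa, SUM(ibyt) AS bytes FROM {view} "
        f"GROUP BY sa ORDER BY bytes DESC LIMIT {limit}"
    )


def store_csv_as_parquet(csv_path: str, target: str) -> None:
    """Step 1: write flow records (already exported as CSV) to Parquet on HDFS."""
    from pyspark.sql import SparkSession  # imported lazily, needs a Spark install

    spark = SparkSession.builder.appName("netflow-store").getOrCreate()
    flows = spark.read.csv(csv_path, header=True, inferSchema=True)
    flows.write.mode("append").parquet(target)


def load_and_query(target: str):
    """Steps 2 and 3: load the Parquet files and query them with Spark SQL."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("netflow-query").getOrCreate()
    flows = spark.read.parquet(target)       # step 2: load Parquet
    flows.createOrReplaceTempView("flows")   # expose the DataFrame to SQL
    return spark.sql(top_sources_sql("flows"))  # step 3: Spark SQL query
```

On a machine with Spark available, `load_and_query(parquet_path("namenode:8020", "/data/netflow", "2018-01-15")).show()` would print the result; the host, directory, and date here are placeholders.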
1 Convert NetFlow binary data to CSV
2 Write two Spark jobs in Python:
3 Write SQL query
4 Using the Querier, execute and cache the results
5 Proceed with next operations on the cached results
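Steps 4 and 5 rely on caching a query result so that later operations do not re-read the data. A minimal sketch of that pattern with the PySpark DataFrame API is shown below; it is not the Querier itself, and the function names and the `bytes` column are hypothetical. `spark` is expected to be a `pyspark.sql.SparkSession`.

```python
def execute_and_cache(spark, query: str):
    """Run `query` through Spark SQL and keep the result in memory for reuse."""
    result = spark.sql(query).cache()  # cache() is lazy: it only marks the DataFrame
    result.count()                     # an action forces evaluation and fills the cache
    return result


def large_flows_only(result, min_bytes: int):
    """Example follow-up operation that works on the cached result.

    Assumes the cached result has a numeric column named `bytes`.
    """
    return result.filter(result["bytes"] >= min_bytes)
```

Because the first action materializes the cached DataFrame, follow-up steps such as `large_flows_only(result, 10**6)` operate on the in-memory result rather than re-running the original query.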
Figure (two panels; legend: NfDump, Hadoop+Spark), execution time in minutes.
Time frames: 5min, 30min, 1hr, 3.5hrs, 7hrs
First panel: 0:08, 0:33, 1:05, 3:33, 6:42
Second panel: 3:00, 2:37, 2:58, 3:22, 3:46
Figure (two panels; legend: NfDump, Hadoop+Spark), execution time in minutes.
Time frames: 5min, 30min, 1hr, 3.5hrs, 7hrs
First panel: 0:08, 0:28, 1:06, 3:39, 6:52
Second panel: 3:15, 3:07, 3:09, 2:50, 2:53
Figure (two panels; legend: NfDump, Hadoop+Spark), execution time in minutes.
Time frames: 5min, 30min, 1hr, 3.5hrs, 7hrs
First panel: 0:09, 0:49, 2:09 (only three values shown)
Second panel: 3:22, 2:29, 3:09, 3:15, 3:15
Figure (two panels; legend: NfDump, Hadoop+Spark), execution time in minutes.
Time frames: 5min, 30min, 1hr, 3.5hrs, 7hrs
First panel: 0:19, 1:25, 4:04, 11:12, 23:22
Second panel: 2:39, 2:38, 2:42, 3:37, 4:03
Figure (two panels; legend: NfDump, Hadoop+Spark), execution time in minutes.
Time frames: 5min, 30min, 1hr, 3.5hrs, 7hrs
First panel: 1:02, 5:22, 11:53, 41:44, 89:23
Second panel: 3:14, 3:28, 3:21, 5:06, 5:52
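To compare the two groups of values in the last chart above, the timings can be converted to seconds and divided pairwise. The sketch below assumes the values are written as minutes:seconds, and it labels the groups only by their order of appearance, since the extracted text does not state explicitly which group belongs to which legend entry.

```python
TIME_FRAMES = ["5min", "30min", "1hr", "3.5hrs", "7hrs"]
FIRST_GROUP = ["1:02", "5:22", "11:53", "41:44", "89:23"]   # first five values in the chart
SECOND_GROUP = ["3:14", "3:28", "3:21", "5:06", "5:52"]     # second five values in the chart


def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string (assumed format) to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def ratios(first, second):
    """Pairwise ratio first/second of the execution times."""
    return [to_seconds(a) / to_seconds(b) for a, b in zip(first, second)]


if __name__ == "__main__":
    for frame, ratio in zip(TIME_FRAMES, ratios(FIRST_GROUP, SECOND_GROUP)):
        print(f"{frame:>7}: first/second = {ratio:.2f}")
```

A ratio below 1 means the first group was faster for that time frame, and a ratio above 1 means the second group was faster.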