 
Beta Distribution
Lemma: Let $z_1, z_2, \ldots, z_n$ be independent RVs, where $z_i \sim \mathrm{Beta}(w_i, 1)$. Then $\max\{z_i\} \sim \mathrm{Beta}(\sum w_i, 1)$.
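A short derivation may help see why the lemma holds. It is a sketch that uses only the fact that the CDF of $\mathrm{Beta}(w, 1)$ on $[0,1]$ is $x^{w}$, together with the independence assumed in the lemma:

```latex
\[
\Pr\Bigl[\max_{1 \le i \le n} z_i \le x\Bigr]
  = \prod_{i=1}^{n} \Pr[z_i \le x]
  = \prod_{i=1}^{n} x^{w_i}
  = x^{\sum_{i=1}^{n} w_i},
  \qquad 0 \le x \le 1 ,
\]
% which is exactly the CDF of Beta(\sum_i w_i, 1).
```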
Corollary
• For every hash function, $h_k^+ = \max_i h_k(x_i) \sim \max U(0,1) \sim \max \mathrm{Beta}(1,1) \sim \mathrm{Beta}(n, 1)$
• Thus, estimating the value of $n$ by Algorithm 1 is equivalent to estimating the value of $\alpha$ in the $\mathrm{Beta}(\alpha, 1)$ distribution of $h_k^+$
The Unified Scheme
For estimating the weighted sum:
• Instead of associating each element with a uniform hashed value
  • $h_k(x_i) \sim U(0,1)$
• We associate it with a RV taken from a Beta distribution
  • $h_k(x_i) \sim \mathrm{Beta}(w_i, 1)$
  • $w_i$ is the element's weight
Generic Max Sketch Algorithm - Weighted
Algorithm 2
• Use $m$ different hash functions
• For every $h_k$ and every input element $x_i$:
  1. compute $h_k(x_i)$
  2. transform it to $\hat{h}_k(x_i) \sim \mathrm{Beta}(w_i, 1)$
• Let $h_k^+ = \max\{\hat{h}_k(x_i)\}$ be the maximum observed value for $h_k$
• Invoke $\mathit{ProcEstimate}(h_1^+, h_2^+, \ldots, h_m^+)$ to estimate the value of $w$
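The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the SHA-256 based `uniform_hash`, the class name `WeightedMaxSketch`, and the choice of estimator are all assumptions. As a stand-in for the black-box estimation procedure it uses the maximum-likelihood estimate of $\alpha$ from $m$ samples of $\mathrm{Beta}(\alpha, 1)$, namely $m / \sum_k (-\ln h_k^+)$.

```python
import hashlib
import math


def uniform_hash(seed: int, key: str) -> float:
    """Hypothetical helper: map (seed, key) to a float strictly inside (0, 1)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") >> 12   # keep 52 bits
    return (value + 0.5) / 2**52


class WeightedMaxSketch:
    """Keeps one maximum per hash function (m hash functions in total)."""

    def __init__(self, m: int):
        self.m = m
        self.h_max = [0.0] * m            # h_k^+ for k = 0 .. m-1

    def add(self, key: str, weight: float) -> None:
        if weight <= 0:
            raise ValueError("weights must be positive")
        for k in range(self.m):
            u = uniform_hash(k, key)      # h_k(x_i) ~ U(0,1)
            b = u ** (1.0 / weight)       # transformed value ~ Beta(weight, 1)
            if b > self.h_max[k]:
                self.h_max[k] = b

    def estimate(self) -> float:
        # Assumed estimator: MLE of alpha for m samples of Beta(alpha, 1).
        # Requires at least one element to have been added.
        return self.m / sum(-math.log(h) for h in self.h_max)
```

Adding the same key with the same weight twice leaves the sketch unchanged, so the estimate tracks the sum of weights over distinct elements. With all weights equal to 1 the same code estimates the unweighted cardinality.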
The Unified Scheme
• Practically, if $h_k(x_i) \sim U(0,1)$
• Then, $h_k(x_i)^{1/w_i} \sim \mathrm{Beta}(w_i, 1)$
Distributions Summary
• Unweighted: $h_k^+ \sim \mathrm{Beta}(n, 1)$
• Weighted: $h_k^+ \sim \mathrm{Beta}(w, 1)$, where $w = \sum w_i$
The Unified Scheme
• The same algorithm that estimates $n$ in the unweighted case can estimate $w$ in the weighted case
• $\mathit{ProcEstimate}()$ is exactly the same procedure used to estimate the unweighted cardinality in Algorithm 1
The Unified Scheme
Lemma: Estimating $w$ by Algorithm 2 is equivalent to estimating $n$ by Algorithm 1. Thus, Algorithm 2 estimates $w$ with the same variance and bias as that of the underlying procedure used by Algorithm 1.
Stochastic Averaging
• Presented by Flajolet in 1985
• Use 2 hash functions instead of $m$
• Overcomes the computational cost at the price of a negligible loss of statistical efficiency in the estimator's variance
Stochastic Averaging
• Use 2 hash functions:
  1. $H_1(x_i) \sim \{1, 2, \ldots, m\}$
  2. $H_2(x_i) \sim U(0,1)$
• Remember the maximum observed value of each bucket
• The generalization to the weighted estimator is similar
Generic Max Sketch Algorithm (Stochastic Averaging)
Algorithm 3
1. Use 2 different hash functions:
   1. $H_1(x_i) \sim \{1, 2, \ldots, m\}$
   2. $H_2(x_i) \sim U(0,1)$
2. For every input element $x_i$, compute $H_1(x_i)$ and $H_2(x_i)$
3. Let $h_k^+ = \max\{H_2(x_i) \mid H_1(x_i) = k\}$ be the maximum observed value in the k'th bucket
4. Invoke $\mathit{ProcEstimateSA}(h_1^+, h_2^+, \ldots, h_m^+)$ to estimate $n$
Corollary (Stochastic Averaging)
• $b_k = |\{x_i : H_1(x_i) = k\}|$ = size of the k'th bucket
  • $b_k = \frac{n}{m} \pm O\bigl(\sqrt{n/m}\bigr)$
• For every bucket, $h_k^+ = \max\{H_2(x_i) \mid H_1(x_i) = k\} \sim \mathrm{Beta}(b_k, 1) \sim \mathrm{Beta}(\frac{n}{m}, 1)$
• Thus, estimating the value of $\frac{n}{m}$ by Algorithm 3 is equivalent to estimating the value of $\alpha$ in the $\mathrm{Beta}(\alpha, 1)$ distribution of $h_k^+$
The Unified Scheme (Stochastic Averaging)
For estimating the weighted sum:
• Instead of associating each element with a uniform hashed value
  • $H_2(x_i) \sim U(0,1)$
• We associate it with a RV taken from a Beta distribution
  • $H_2(x_i) \sim \mathrm{Beta}(w_i, 1)$
  • $w_i$ is the element's weight
• $b_k = \sum_{H_1(x_i) = k} w_i$ is the sum of the weights of the elements in the k'th bucket
• $b_k = \frac{w}{m} \pm O\Bigl(\sqrt{\frac{1}{m} \sum w_i^2}\Bigr)$
Generic Max Sketch Algorithm - Weighted (Stochastic Averaging)
Algorithm 4
1. Use 2 different hash functions:
   1. $H_1(x_i) \sim \{1, 2, \ldots, m\}$
   2. $H_2(x_i) \sim U(0,1)$
2. For every input element $x_i$:
   1. compute $H_1(x_i)$ and $H_2(x_i)$
   2. transform $H_2(x_i)$ to $\hat{H}_2(x_i) \sim \mathrm{Beta}(w_i, 1)$
3. Let $h_k^+ = \max\{\hat{H}_2(x_i) \mid H_1(x_i) = k\}$ be the maximum observed value in the k'th bucket
4. Invoke $\mathit{ProcEstimateSA}(h_1^+, h_2^+, \ldots, h_m^+)$ to estimate $w$
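Algorithm 4 can be sketched in Python along the following lines. This is an illustration under assumptions: the helper `two_hashes` (both hashes derived from one SHA-256 digest) and the function name `weighted_sa_estimate` are hypothetical, and the estimation step uses the form $m(m-1) / \sum_k (1 - h_k^+)$ that these slides give for the Chassaing estimator. The estimate is only meaningful when the total weight is much larger than $m$, so that no bucket stays empty.

```python
import hashlib


def two_hashes(key: str, m: int):
    """Hypothetical helper: return (H_1, H_2) with H_1 in 0..m-1 and H_2 in (0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % m          # H_1(x_i)
    value = int.from_bytes(digest[8:16], "big") >> 12       # keep 52 bits
    return bucket, (value + 0.5) / 2**52                    # H_2(x_i)


def weighted_sa_estimate(items, m: int) -> float:
    """items: iterable of (key, weight) pairs with positive weights."""
    h_max = [0.0] * m                                       # h_k^+ per bucket
    for key, weight in items:
        bucket, u = two_hashes(key, m)
        b = u ** (1.0 / weight)                             # ~ Beta(weight, 1)
        if b > h_max[bucket]:
            h_max[bucket] = b
    # Assumed estimation step, taken from the Chassaing form on these slides.
    return m * (m - 1) / sum(1.0 - h for h in h_max)
```

Each element costs one hash evaluation here instead of $m$, which is the computational saving that stochastic averaging is meant to bring.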
The Unified Scheme
• Practically, if $H_2(x_i) \sim U(0,1)$
• Then, $H_2(x_i)^{1/w_i} \sim \mathrm{Beta}(w_i, 1)$
Distributions Summary
• Unweighted: $h_k^+ \sim \mathrm{Beta}(\frac{n}{m}, 1)$
• Weighted: $h_k^+ \sim \mathrm{Beta}(\frac{w}{m}, 1)$, where $w = \sum w_i$
The Unified Scheme
• The same algorithm that estimates $n$ in the unweighted case can estimate $w$ in the weighted case
• $\mathit{ProcEstimateSA}()$ is exactly the same procedure used to estimate the unweighted cardinality in Algorithm 3
The Unified Scheme
Lemma: Estimating $w$ by Algorithm 4 is equivalent to estimating $n$ by Algorithm 3. Thus, Algorithm 4 estimates $w$ with the same variance and bias as that of the underlying procedure used by Algorithm 3.
Stochastic Averaging – Effect on Variance (Unweighted)
• Brings computational efficiency at the cost of a delayed asymptotic regime (Lumbroso, 2010)
  – When $n$ is sufficiently large, the variance of each bucket size $b_k$ is negligible
  – How large should $n$ be to obtain negligible variance of $b_k$ in the unified scheme?
• When the normalized standard deviation of each $b_k$ is $< 10^{-3}$, there is negligible loss of statistical efficiency
  – For example, when $n = 10^6$ and $m = 10^3$ $\Longrightarrow$ $\mathrm{Var}[b_k] / E^2[b_k] = 10^{-3}$
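The arithmetic behind the $10^{-3}$ figure can be reconstructed as follows. This is a sketch that assumes the first hash function places each of the $n$ elements in one of the $m$ buckets uniformly and independently, so that each bucket size is binomial:

```latex
\[
b_k \sim \mathrm{Binomial}\bigl(n, \tfrac{1}{m}\bigr), \qquad
E[b_k] = \frac{n}{m}, \qquad
\mathrm{Var}[b_k] = \frac{n}{m}\Bigl(1 - \frac{1}{m}\Bigr) \approx \frac{n}{m},
\]
\[
\frac{\mathrm{Var}[b_k]}{E^2[b_k]} \approx \frac{m}{n}
  = \frac{10^{3}}{10^{6}} = 10^{-3}.
\]
```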
Stochastic Averaging – Effect on Variance (Weighted)
• Assuming that $\sum w_i^2 / w^2 = 10^{-6}$ $\Longrightarrow$
  – The normalized standard deviation = $\mathrm{Var}[b_k] / E^2[b_k] = 10^{-3}$
• However, other choices of the weights may "delay" this bound for bigger values of $n$
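The weighted figure can be checked the same way. This sketch assumes each element lands in bucket $k$ independently with probability $1/m$, and that $m = 10^3$ as in the unweighted example:

```latex
\[
E[b_k] = \frac{w}{m}, \qquad
\mathrm{Var}[b_k] = \frac{1}{m}\Bigl(1 - \frac{1}{m}\Bigr) \sum_i w_i^2
  \approx \frac{1}{m} \sum_i w_i^2,
\]
\[
\frac{\mathrm{Var}[b_k]}{E^2[b_k]} \approx m \cdot \frac{\sum_i w_i^2}{w^2}
  = 10^{3} \cdot 10^{-6} = 10^{-3}.
\]
```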
Stochastic Averaging – Effect on Variance (Weighted)
Random Distribution of Weights
• Assume that the weights $w_i$ are drawn from a random distribution
• Using the variance definition, the unified scheme can deal with an unbounded number of weights as long as:
  1. Weights are positive
  2. $\mathrm{Var}[w_i] / E^2[w_i]$ is a small constant
Transformation Between Distributions
• Each element is hashed: $h(x_i) \sim U(0,1)$
• Then,
  – Some estimators transform $h(x_i)$ into another distribution
    • For example, HyperLogLog (Geometric)
  – The unified scheme transforms $h(x_i)$ into a Beta distribution
    • $\hat{h}(x_i) \sim \mathrm{Beta}(w_i, 1)$
• Inverse-Transform Method: $F^{-1}(u) \sim D$, where
  – $u \sim U(0,1)$
  • $F$ is the CDF of distribution $D$
  • $F$ is a monotonically non-decreasing function
  • $F^{-1}$ is the inverse function
Transformation Between Distributions
• In general, $h(x_i)$ is transformed into $\hat{h}(x_i) = F^{-1}(h(x_i))$
  – Inverse-Transform Method
• The estimator may keep the original uniform hashed value
  – Without transformation
  – In this case, $F(x) = x$
The Unified Scheme
• The desired distribution is $\mathrm{Beta}(w_i, 1)$
  – CDF: $G_{\max}(x) = x^{w_i}$
  – CDF inverse: $G_{\max}^{-1}(u) = u^{1/w_i}$
• $G_{\max}^{-1}(h(x_i)) = h(x_i)^{1/w_i} \sim \mathrm{Beta}(w_i, 1)$
  – Inverse-Transform Method
To sum up: $h_k(x_i) \sim U(0,1) \Longrightarrow h_k(x_i)^{1/w_i} \sim \mathrm{Beta}(w_i, 1)$
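The transformation can be checked numerically with a short Python sketch. The function name `beta_from_uniform`, the seed, and the weight value are illustrative assumptions. The check compares the sample mean with the $\mathrm{Beta}(w, 1)$ mean $w/(w+1)$ and the empirical CDF at $0.5$ with $0.5^{w}$.

```python
import random


def beta_from_uniform(u: float, w: float) -> float:
    """Inverse-transform step: u ~ U(0,1) becomes u**(1/w) ~ Beta(w, 1)."""
    return u ** (1.0 / w)


rng = random.Random(0)                 # fixed seed, for repeatability only
w = 4.0                                # illustrative weight
samples = [beta_from_uniform(rng.random(), w) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)                       # theory: w / (w + 1)
frac_below_half = sum(s <= 0.5 for s in samples) / len(samples)  # theory: 0.5 ** w
```

For $w = 4$ the two theoretical values are $0.8$ and $0.0625$, and the sample statistics should land close to them.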
Weighted Generalization for Continuous U(0,1) with Stochastic Averaging
• Chassaing estimator
• Minimal variance unbiased estimator (MVUE)
• The estimator uses uniform variables
  – No transformation is needed, $F^{-1}(u) = u$
• Estimate = $\frac{m(m-1)}{\sum (1 - h_k^+)}$
• Standard error = $1/\sqrt{m}$
• Storage size: $32 \cdot m$ bits
To generalize this estimator:
• Estimate = $\frac{m(m-1)}{\sum (1 - h_k^+)}$
• But now, $h_k^+ = \max\{\hat{h}_k(x_i)\} = \max\{h_k(x_i)^{1/w_i}\}$
Weighted Generalization for Continuous U(0,1) with m hash functions
• Maximum likelihood estimator
• The estimator uses exponential random variables with parameter 1
  – $F^{-1}(u) = -\ln u \sim \mathrm{Exp}(1)$
• Estimate = $\frac{m}{\sum h_k^+}$
  – where $h_k^+ = \max\{-\ln(h_k(x_i))\}$
• Standard error = $1/\sqrt{m}$
• Storage size: $32 \cdot m$ bits
Weighted Generalization for Continuous U(0,1) with m hash functions
To generalize this estimator:
• Estimate = $\frac{m}{\sum h_k^+}$
• But now, $h_k^+ = \max\{-\ln(\hat{h}_k(x_i))\} = \max\{-\ln(h_k(x_i)^{1/w_i})\}$
This generalization is identical to the algorithm presented by Cohen, 1995
Weighted HyperLogLog with Stochastic Averaging
• Best known algorithm in terms of the tradeoff between precision and storage size
• The estimator uses geometric random variables with success probability ½
  – $F^{-1}(u) = \lfloor -\log_2 u \rfloor \sim \mathrm{Geom}(1/2)$
• Estimate = $\frac{\alpha_m m^2}{\sum 2^{-h_k^+}}$
  – where $h_k^+ = \max\{\lfloor -\log_2 H_2(x_i) \rfloor \mid H_1(x_i) = k\}$
• Standard error = $1.04/\sqrt{m}$
• Storage size: $5 \cdot m$ bits
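The uniform-to-geometric transformation used here can be illustrated with a few lines of Python. This sketch covers only the transformation, not the HyperLogLog estimator itself, and the function name `geometric_from_uniform` and the seed are assumptions. For $u$ uniform on $(0, 1]$, $\lfloor -\log_2 u \rfloor$ equals $j$ with probability $2^{-(j+1)}$ for $j = 0, 1, 2, \ldots$

```python
import math
import random


def geometric_from_uniform(u: float) -> int:
    """Return floor(-log2(u)) for u in (0, 1]."""
    return math.floor(-math.log2(u))


rng = random.Random(1)                  # fixed seed, for repeatability only
# 1.0 - rng.random() lies in (0, 1], which avoids taking log2(0).
draws = [geometric_from_uniform(1.0 - rng.random()) for _ in range(100_000)]

# Empirical frequencies of the first few values; theory: 1/2, 1/4, 1/8.
freq = {j: draws.count(j) / len(draws) for j in range(3)}
```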
Weighted HyperLogLog with Stochastic Averaging
To generalize this estimator:
• Estimate = $\frac{\alpha_m m^2}{\sum 2^{-h_k^+}}$
• But now, $h_k^+ = \max\{\lfloor -\log_2 H_2(x_i)^{1/w_i} \rfloor \mid H_1(x_i) = k\}$
• The extended algorithm offers the best performance, in terms of statistical accuracy and memory storage, among all the other known algorithms for the weighted problem
Conclusion
• We showed how to generalize every min/max sketch to a weighted version
• The scheme can be used for obtaining known estimators and new estimators in a generic way
• The proposed unified scheme uses the unweighted estimator as a black box, and manipulates the input using properties of the Beta distribution
• We proved that estimating the weighted sum by our unified scheme is statistically equivalent to estimating the unweighted cardinality
• In particular, we showed that the new scheme can be used to extend the HyperLogLog algorithm to solve the weighted problem
• The extended algorithm offers the best performance, in terms of statistical accuracy and memory storage, among all the other known algorithms for the weighted problem
Efficient Detection of Application Layer DDoS Attacks by a Stateless Device
DoS and DDoS
Denial of Service Attack (DoS)
• Malicious attempt to make a server or a network resource unavailable to users
• The most common type is flooding the target resource with external requests
  – The overload prevents the resource from responding to legitimate traffic, or slows its responses
Distributed Denial of Service Attack (DDoS)
• DoS attack where the attack traffic is launched from multiple distributed sources
• A DDoS attack is much harder to detect
  – Multiple attackers to defend against
Application DDoS Attacks
• Seemingly legitimate and innocent requests whose goal is to force the server to allocate a lot of resources in response to every single request
• Can be activated from a small number of attacking computers
• Examples:
  – HTTP request attacks:
    • Legitimate, heavy HTTP requests are sent to a web server, in an attempt to consume a lot of its resources
    • Each request is very short, but the server needs to work very hard to serve it
  – HTTPS/SSL request attacks
    • Work against certain SSL handshake functions, taking advantage of the heavy computation used by SSL
  – DNS request attacks
    • The attacker overwhelms the DNS server with a series of legitimate or illegitimate DNS requests
Application DDoS Attacks
Application DDoS attacks are more difficult to deal with than classical DDoS:
• The traffic pattern is indistinguishable from legitimate traffic
• The number of attacking machines can be significantly smaller
  – Typically, it is enough for the attacker to send only hundreds of resource-intensive requests, instead of flooding the server with millions of TCP SYNs, as in a volumetric DDoS attack
DDoS Protection Architecture
• Mostly multi-tier:
DDoS Protection Architecture
• As strong as its weakest link
  – Often this weakest link is tier-2 or tier-3
  – It will be the first to collapse in a targeted Application layer DDoS attack
• It is generally assumed that Application layer attacks cannot be detected by the first-tier devices, but only by tier-2 and tier-3 devices, which are stateful. This is because a tier-1 device:
  – Many devices
  – Does not have flow awareness and cannot perform per-flow tasks
  – Is dedicated to fast performance, so its processing tasks must be simple and cheap
  – Lacks deep knowledge of the end applications, and is unable to keep track of the association between packets, flows, and applications
Previous Work
• Stateless devices usually estimate the load imposed on a remote server by estimating the number of distinct flows
  – Cardinality estimation problem
• Can detect anomalies when the number of distinct flows becomes suspiciously high
  – Possibly a DDoS attack
  – Alternative: monitor the entropy of selected attributes in the received packets and compare it to a pre-computed profile
• Previously proposed schemes have considered all flows as imposing the same load
  – This is clearly not true in a realistic case, where high-workload requests require significantly more server effort than simple ones
  – We solve this problem by preclassifying the incoming flows and associating them with different weights according to their load
Our Contribution
• We show how a tier-1 stateless device can acquire significant Application layer information and detect Application layer attacks
• Early detection will afford better overall protection
  – Triggers the opening of more tier-2 and tier-3 devices
  – Triggers the invocation of special tier-1 packet-based filtering rules, which will reduce the load
Basic Scheme
• Main idea:
  – classify incoming flows according to the load each of them imposes on the server
  – flows that impose different loads should be mapped in advance into different TCP/UDP ports
• Consequently, a stateless router that receives a packet can look at the Protocol field and the destination port number in the packet's header in order to know the load imposed on the server by the flow to which the packet belongs
  – The total load imposed on the end server during a specific time interval is $w = \sum_{j=1}^{C} w_j \cdot n_j$
    • $C$ is the number of weight classes
    • $n_j$ is the number of flows belonging to class $j$
  – execute an algorithm that estimates the number of flows for each class
Basic Scheme
Formally, the total load imposed on the end server during a specific time interval is $w = \sum_{j=1}^{C} w_j \cdot n_j$
• $C$ is the number of weight classes
• $n_j$ is the number of flows belonging to class $j$
The problem of measuring the total load imposed on the web server during a specified time is now translated into the problem of estimating the number of flows for each class of weights.
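The load formula of the basic scheme can be illustrated with a small Python sketch. The weight values, the `PORT_WEIGHTS` table, and the `(flow_id, dst_port)` packet representation are hypothetical. Exact sets stand in for the per-class cardinality estimator that a stateless device would actually run.

```python
# Hypothetical mapping from destination port (weight class) to weight w_j.
PORT_WEIGHTS = {8090: 10.0, 8091: 5.0, 80: 1.0}


def total_load(packets) -> float:
    """packets: iterable of (flow_id, dst_port) pairs seen in one time interval."""
    flows_per_class = {port: set() for port in PORT_WEIGHTS}
    for flow_id, dst_port in packets:
        if dst_port in flows_per_class:          # ignore ports with no weight class
            flows_per_class[dst_port].add(flow_id)
    # w = sum over classes j of w_j * n_j, with n_j the number of distinct flows.
    return sum(PORT_WEIGHTS[port] * len(flows)
               for port, flows in flows_per_class.items())
```

Replacing each exact set with a cardinality sketch gives one estimator per class, which is why the basic scheme has to split its storage across the $C$ classes.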
HyperLogLog
Example: HTTP
Assign the same TCP port to all HTTP requests that impose the same load on the server:
• Requests that require a lot of processing can be assigned to port 8090 (weight $w_1$)
• Requests that require slightly less are assigned to port 8091 (with weight $w_2 < w_1$)
• And so on …
Implementation
• Straightforward for every Application layer protocol that admits a one-to-one mapping to a TCP or a UDP port
  – Each TCP or UDP flow is associated with one application layer instance
• However, this is not the case for HTTP, because of its "persistent connection" property
  – It allows the client to send multiple HTTP requests over the same TCP connection (flow)
  – One cannot tell in advance which or how many requests will be sent over the same connection
• The solution we propose is to map all light requests to one port, and to map each heavier request to its own port
  – The weight associated with the light requests will take into account their resource consumption and the possibility that multiple light requests may share the same connection
Enhanced Scheme
• Main idea:
  – Instead of solving the cardinality estimation problem once for each class, the enhanced scheme solves the weighted cardinality estimation problem
  – The total load is estimated directly, without estimating the number of flows in each class
• The enhanced scheme with $m/C$ storage units performs better (has much better variance) than any configuration of the basic scheme, even if the latter uses a factor of $C$ more storage units
  – Moreover, the enhanced scheme is agnostic to the distribution of the weights and does not need a priori information about the distribution of the weight classes
Weighted HyperLogLog
Basic Scheme vs. Enhanced Scheme
• Minimal variance of basic scheme = $\frac{w^2}{m - 2C} > \frac{w^2}{m - 2}$ = variance of enhanced scheme
• The enhanced scheme has smaller variance than the minimal variance of the basic scheme
• When the number of different classes $C > \frac{m}{2}$, the variance of the basic scheme is infinite
  – Moreover, even if there are only a few classes, and the statistical inefficiency can be tolerated, the basic scheme needs a priori information on the distribution of the weights, while the enhanced scheme does not
• The enhanced scheme with $m/C$ storage units performs better (has much better variance) than any configuration of the basic scheme, even if the latter uses a factor of $C$ more storage units
  – as long as the number of weight classes satisfies $C > \frac{m}{2}$, and this requirement is satisfied because $m$ is usually very small
Estimating the Load Variance
• Main idea:
  – The weighted algorithm is useful for performing management tasks
    • Adding a virtual machine to a web server
    • Adjusting the load balancing criteria, etc.
  – It is not useful for detecting an extreme and sudden increase in the load imposed on the server due to an Application layer attack
• Definitions:
  – $n(t)$ = number of active flows sampled at time $t$ over the last $T$ units of time
  – $w(t)$ = weighted sum of these flows
Estimating the Load Variance
• $w_t$ is a random variable that estimates the weighted sum of the flows sampled during the time interval $[t - T, t]$
• Unbiased estimator, we get that:
Load Variance
• Variance can be affected not only by excessive load imposed by a few connections originated by an attacker, but also by an excessive number of new legitimate connections
• To distinguish between the two cases, we normalize the variance by dividing it by the number of flows $n$
Normalized Load Variance
Simulation Results
Detecting the load imposed on a server
• We study the requests received by the main web server of the Technion campus
• Assign to each request a weight that represents the load it imposes on the server
• Compare the results of the weighted scheme to the results of two benchmarks:
• Actual:
  – Determines the real load imposed on the web server during every considered time interval by computing the server's average response time
  – Actual is expected to outperform our scheme
  – Of course, such a scheme cannot be employed by a stateless intermediate device
• Number of Flows:
  – Uses HyperLogLog to estimate the number of distinct flows during each time period
• How to determine in advance the load imposed on the server by every request?
  – Because we do not have access to the server, but only to its log files, we assign weights according to the average size of the response file sent by the server to each request
Simulation Results
Detecting the load imposed on a server
We can see a strong correlation between the load estimated by our scheme and Actual:
• For example, Actual shows a temporary heavy load on the server after 17 minutes, a load that is clearly detected by our scheme (in blue)
• Another peak, at t = 22, is also detected by our scheme (in green)
Simulation Results
Detecting the load imposed on a server
We can see a strong correlation between the load estimated by our scheme and Actual:
• Actual shows temporary heavy loads on the server at t = 28 (yellow) and t = 32 (orange), both clearly detected by our scheme as well
Simulation Results
Detecting the load imposed on a server
• For mathematical corroboration, we measured the Pearson correlation coefficient between Actual and our scheme
• Let $V_0$ be the vector of the values of Actual, and $V_1$ be the vector of the values of our scheme. Then: $\rho = \frac{\mathrm{cov}(V_0, V_1)}{\sigma_{V_0} \sigma_{V_1}}$
• This ratio varies between −1 and 1:
  – the closer it is to either −1 or 1, the stronger the correlation between the variables;
  – the closer it is to 0, the weaker the correlation
• Actual vs. our scheme:
  – In the first trace, the correlation coefficient is 0.85, which indicates a very strong correlation between Actual and our scheme
  – In the second trace, the correlation coefficient is 0.92, indicating an even stronger correlation
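For reference, the Pearson correlation coefficient between two equally long series can be computed as in the Python sketch below. The function and argument names are illustrative, and the inputs in the usage note are made-up numbers rather than values from the traces.

```python
import math


def pearson(v0, v1) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(v0) != len(v1) or len(v0) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean0 = sum(v0) / len(v0)
    mean1 = sum(v1) / len(v1)
    cov = sum((a - mean0) * (b - mean1) for a, b in zip(v0, v1))
    var0 = sum((a - mean0) ** 2 for a in v0)
    var1 = sum((b - mean1) ** 2 for b in v1)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var0 * var1)
```

For example, `pearson([1, 2, 3], [2, 4, 6])` is 1 up to floating-point rounding, and reversing the second list gives −1.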
Simulation Results
Detecting the load imposed on a server
• We then measured the Pearson correlation coefficient between Actual and Number-of-Flows
• In contrast to the strong correlation between our scheme and Actual, we can see that the correlation between Number-of-Flows and Actual is very weak
  – In the first trace, the correlation coefficient is only 0.38
  – In the second trace, the correlation coefficient is 0.23
• More specifically,
  – In the first trace, the peak after 22 minutes is not identified by the Number-of-Flows scheme
  – Moreover, the Number-of-Flows scheme identifies false heavy loads, for example after 1 minute
Simulation Results
Detecting Application Layer DDoS Attacks
• We use Wireshark to capture video sessions from YouTube, and manually add three Application DDoS attacks to the original data:
  a) attack-1 is represented by 30 downloads of a 1-minute video stream starting at 10:00;
  b) attack-2 is represented by 40 downloads of a 1-minute video stream starting at 20:00;
  c) attack-3 is represented by 50 downloads of a 1-minute video stream starting at 06:00.
• We estimate the load variance and the normalized load variance every $\Delta = 60$ seconds, for $T = 1$ minute
Simulation Results
Detecting Application Layer DDoS Attacks
One can easily see that the Normalized Load scheme does not detect any of the attacks.
• This scheme is able to detect only attacks created by a small number of connections that generate a lot of traffic
• Although all the attacks added to our log files were triggered by only 30-50 connections, they nonetheless had only a slight effect on the average amount of traffic per connection
The three other schemes successfully detect the three attacks.