Thursday, June 12, 2014

Short and Long Flows (a.k.a. Mice and Elephants): A Primer

This post is in Q&A format. 

What are Elephant and Mice flows?
Elephants and Mice are flows classified by their size and duration. Elephants are bigger in size and last for a longer time. Mice are smaller in size and last only a few seconds (in fact, often only milliseconds).

Are there any defined numbers for size and length?
Today, it is left to the operator to decide these numbers based on their network. But there are a few texts that define them.

In one text, a flow is considered short if its size is < 10 KB and it lasts only a few hundred milliseconds. Anything above that is considered a large flow.
In another text, a flow is considered large if it occupies >= 10% of link bandwidth for >= 1 sec. The rest are Mice flows.
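
To make the two definitions concrete, here is a minimal sketch of both checks. The Flow record, the function names, the 0.5 sec cutoff standing in for "a few hundred milliseconds", and reading 10 KB as 10,000 bytes are all my assumptions for illustration. The bandwidth check also simplifies things by using the flow's average rate over its whole lifetime.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical flow record: total bytes sent and lifetime in seconds."""
    bytes_sent: int
    duration_s: float


def is_mouse_by_size(flow, max_bytes=10_000, max_duration_s=0.5):
    # First definition: short if size < 10 KB and the flow lasts only a
    # few hundred milliseconds (0.5 sec is an assumed cutoff).
    return flow.bytes_sent < max_bytes and flow.duration_s <= max_duration_s


def is_elephant_by_bandwidth(flow, link_bps, min_share=0.10, min_duration_s=1.0):
    # Second definition: large if the flow occupies >= 10% of the link
    # bandwidth for >= 1 sec. Simplification: average rate over the lifetime.
    if flow.duration_s < min_duration_s:
        return False
    avg_bps = flow.bytes_sent * 8 / flow.duration_s
    return avg_bps >= min_share * link_bps


if __name__ == "__main__":
    web_request = Flow(bytes_sent=5_000, duration_s=0.2)
    backup = Flow(bytes_sent=2_000_000_000, duration_s=10.0)
    print(is_mouse_by_size(web_request))
    print(is_elephant_by_bandwidth(backup, link_bps=10e9))
```

An operator would tune the thresholds to their own network, which is exactly why there is no single agreed number.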

What are the characteristics of Elephants and Mice?
Typically, Elephants are throughput sensitive whereas Mice are delay sensitive. Given that Elephants last longer, it makes sense to detect them and do something about them for better throughput. Mice, on the other hand, cease to exist even before they can be detected.

Also, Elephants occupy a lot of resources in the network in terms of buffers and queues, which starves Mice of buffer space. This hurts Mice because they are delay sensitive.

What is the ratio between Elephants and Mice in a DC?
According to measurements done in DCs:
             - 80% of flows are Mice and the remaining are Elephants
             - Most of the bytes come from the top 10% of large flows.
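
If you want to check the second number on your own network, a minimal sketch like the one below would do. It reads "top 10%" as the largest 10% of flows by byte count, which is my interpretation. The function name is hypothetical and the sample sizes are made up purely to show the calculation; they are not measurements.

```python
def top_flows_byte_share(flow_sizes, top_fraction=0.10):
    """Return the fraction of all bytes carried by the largest flows.

    flow_sizes is a list of per-flow byte counts (hypothetical input).
    """
    if not flow_sizes:
        return 0.0
    ordered = sorted(flow_sizes, reverse=True)
    # Always look at at least one flow, even for very small inputs.
    top_count = max(1, int(len(ordered) * top_fraction))
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_count]) / total


if __name__ == "__main__":
    # Made-up example: nine small flows and one large one.
    sizes = [1_000] * 9 + [91_000]
    print(top_flows_byte_share(sizes))
```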

What are some examples of Elephants and Mice?
Elephants: File transfers, VMotion, Video Streams, DDoS packets, etc.
Mice: Map-reduce applications, Request-Response protocols like HTTP, etc.

How are Elephants detected and mitigated?
There are multiple solutions available in the market today. A few are discussed below.
1) A vSwitch (OVS) detects Elephants at the edge. The edge is a great place to detect them because of its proximity to applications, which also makes detection more accurate. Once an Elephant is detected at the edge, various mitigations can be employed:
        - Use OVSDB to inform the underlay about the new Elephant flow, so the underlay can react on the fly.
        - Use different VXLAN IDs or IP addresses to traffic engineer Elephant flows.

2) InMon and Brocade got together to detect DDoS attacks using tools like sFlow and OpenFlow. The Brocade switch exports sFlow samples to InMon's sFlow-RT module. sFlow-RT detects Elephants based on the samples received and sends the signature of the attack to a mitigation application (SDN app). The SDN app installs OpenFlow rules on the switch to stop such attacks.

3) DCTCP is another tool to handle Elephants and Mice better. It leverages the fact that Elephants live long enough to react to ECN. The destination DCTCP uses ECN marks to detect congestion in the network and informs the source DCTCP about the congestion. The source DCTCP reacts by reducing its window by a factor that depends on the fraction of ECN-marked packets.
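
To make the source-side reaction in #3 concrete, here is a minimal sketch of the window update as described in the DCTCP specification (RFC 8257): the sender keeps a running estimate, alpha, of the fraction of ECN-marked packets and cuts its window by alpha / 2. The function names and the min_cwnd floor are my own assumptions for illustration; g = 1/16 is the commonly recommended gain.

```python
def update_alpha(alpha, marked, total, g=1 / 16):
    """Update the running estimate of the ECN-marked fraction.

    marked and total are counts observed over roughly one window of data.
    """
    fraction = marked / total if total else 0.0
    return (1 - g) * alpha + g * fraction


def reduce_cwnd(cwnd, alpha, min_cwnd=1.0):
    # The cut is proportional to congestion: alpha = 1 halves the window
    # (like classic TCP), a small alpha trims it only slightly.
    # min_cwnd is an assumed floor, not a value from the post.
    return max(min_cwnd, cwnd * (1 - alpha / 2))


if __name__ == "__main__":
    alpha = 0.0
    # Assume 3 of 10 packets were ECN-marked in each of five windows.
    for _ in range(5):
        alpha = update_alpha(alpha, marked=3, total=10)
    print(alpha, reduce_cwnd(100.0, alpha))
```

Because alpha is averaged over several windows, only a flow that lives long enough (an Elephant) ever sees its window shaped this way, which is the point made in #3.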

What are the differences among the approaches described?
I don't see much of a difference between #1 and #2 except that the tools used are different. One probable plus of the sFlow-based approach is with physical servers connected to ToRs. Most ToRs today support sFlow in hardware but don't support OVS. If Elephants originate from those physical servers (say, storage replication), the sFlow-based approach would be a plus.
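
For a feel of how sample-based detection as in #2 could work, here is a rough sketch. It is not sFlow-RT's API. The function name, the (flow_key, frame_bytes) sample format and the 10% threshold are assumptions for illustration. The only thing it relies on is that with 1-in-N packet sampling, each sample stands for roughly N packets.

```python
from collections import defaultdict


def detect_elephants(samples, sampling_rate, window_s, link_bps, min_share=0.10):
    """Estimate per-flow rates from packet samples and flag large flows.

    samples: iterable of (flow_key, frame_bytes) seen during the window.
    sampling_rate: N for 1-in-N packet sampling.
    Returns a dict of flow_key -> estimated bits per second.
    """
    sampled_bytes = defaultdict(int)
    for flow_key, frame_bytes in samples:
        sampled_bytes[flow_key] += frame_bytes

    elephants = {}
    for flow_key, total in sampled_bytes.items():
        # Scale sampled bytes up by N to estimate the real byte count.
        est_bps = total * sampling_rate * 8 / window_s
        if est_bps >= min_share * link_bps:
            elephants[flow_key] = est_bps
    return elephants


if __name__ == "__main__":
    # Made-up samples: flow "A" is sampled often, flow "B" rarely.
    samples = [("A", 1500)] * 100 + [("B", 100)] * 2
    print(detect_elephants(samples, sampling_rate=1000, window_s=1.0, link_bps=10e9))
```

Mice rarely show up in the samples at all, so this kind of estimate is only meaningful for the larger flows.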

#3 is completely different from #1 and #2. It uses a form of WRED and requires support from the edge in the form of DCTCP. DCTCP is probably more reactive, and its congestion detection occurs deep in the network. I would choose #1 or #2 instead of #3.

Can Elephants be detected a priori?
Elephants cannot be detected a priori in all cases, but in a few cases an Elephant can be detected based on a control message. One such example is VMotion. A priori detection of Elephants would help to traffic engineer flows better.
