
bug report: VegasTcpAgent



Problem:

Under certain circumstances, TCP Vegas may set its cwnd_ (congestion
window) to 0 and never transmit any packets again, even though there are
packets waiting to be sent and the network is idle.

Problem source and potential solutions:

tcp-vegas.cc, line 137: "cwnd_ = v_newcwnd_".  This is meant to restore
cwnd_ to its value before it was inflated by duplicate ACKs.  VegasTcpAgent
should obviously check that cwnd_ has actually been inflated before doing
this.  The problem is that the check on line 136, "if(dupacks_ > NUMDUPACKS
&& cwnd_ > v_newcwnd_)", is *not enough*.

Here's a case where things go wrong:
1. Duplicate ACKs are received, and "v_worried_" is set to 2 (line 305).
2. Before any new ACK arrives, a timeout occurs, and therefore
   "v_newcwnd_" is set to 0 (line 379).
3. A new ACK arrives, and it's determined that an outstanding packet has
   "expired" (line 281; note that v_worried_ > 0), so dupacks_ is set to
   NUMDUPACKS (= 3).
4. A duplicate ACK arrives, so ++dupacks_ (line 290).  "v_newcwnd_ =
   double(win)" is not executed since we are in the "CWND_ACTION_TIMEOUT"
   state, and hence v_newcwnd_ stays 0.
5. A new ACK arrives and cwnd_ is set to v_newcwnd_ (line 137), which is
   0.  The TCP sender drops dead.
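To make the sequence above easier to follow, here is a toy C++ model that replays steps 1-5.  It is NOT the ns-2 source: only the names cwnd_, v_newcwnd_, dupacks_, v_worried_ and NUMDUPACKS come from this report.  The struct VegasModel, its methods, the timeout_state_ flag (standing in for CWND_ACTION_TIMEOUT), the starting window of 8, the one-packet inflation per duplicate ACK and the timeout behavior are all hypothetical simplifications.

```cpp
// Toy model of the failure in steps 1-5 (hypothetical; not ns-2 code).
const int NUMDUPACKS = 3;

struct VegasModel {
    double cwnd_ = 8.0;          // congestion window in packets (assumed start)
    double v_newcwnd_ = 0.0;     // saved pre-inflation window; 0 = nothing saved
    int dupacks_ = 0;            // duplicate-ACK counter
    int v_worried_ = 0;          // > 0: new ACKs check for "expired" packets
    bool timeout_state_ = false; // stands in for CWND_ACTION_TIMEOUT

    void dup_ack() {
        ++dupacks_;                                  // cf. line 290
        if (dupacks_ == NUMDUPACKS) v_worried_ = 2;  // cf. line 305
        // Assumed gate: no saving or inflation while in the timeout state.
        if (dupacks_ >= NUMDUPACKS && !timeout_state_) {
            if (v_newcwnd_ == 0.0) v_newcwnd_ = cwnd_;  // save only if nothing saved yet
            cwnd_ += 1.0;                               // inflate by one packet
        }
    }

    void timeout() {
        v_newcwnd_ = 0.0;       // cf. line 379
        timeout_state_ = true;
        dupacks_ = 0;           // assumed
        cwnd_ = 1.0;            // assumed
    }

    void new_ack(bool expired) {
        // Restore check as quoted from lines 136-137.
        if (dupacks_ > NUMDUPACKS && cwnd_ > v_newcwnd_) cwnd_ = v_newcwnd_;
        dupacks_ = 0;
        v_newcwnd_ = 0.0;                            // assumed: a new ACK ends the episode
        if (v_worried_ > 0) {
            --v_worried_;
            if (expired) dupacks_ = NUMDUPACKS;      // cf. line 281
        }
    }
};

// Steps 1-5 from the report; returns the final cwnd_.
double replay_steps_1_to_5() {
    VegasModel m;
    for (int i = 0; i < NUMDUPACKS; ++i) m.dup_ack(); // 1. dup ACKs, v_worried_ = 2
    m.timeout();                                      // 2. v_newcwnd_ = 0
    m.new_ack(true);                                  // 3. "expired": dupacks_ = 3
    m.dup_ack();                                      // 4. dupacks_ = 4, nothing saved
    m.new_ack(false);                                 // 5. cwnd_ = v_newcwnd_ = 0
    return m.cwnd_;
}

// Control case without a timeout: four dup ACKs, then a new ACK.
double replay_without_timeout() {
    VegasModel m;
    for (int i = 0; i < NUMDUPACKS + 1; ++i) m.dup_ack(); // cwnd_ 8 -> 10
    m.new_ack(false);                                     // deflated back to 8
    return m.cwnd_;
}
```

In this model replay_steps_1_to_5() ends with cwnd_ == 0, while replay_without_timeout() deflates cwnd_ back to 8 as intended, which isolates the timeout as the trigger.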

I reproduced this case using an error model on a link.

Potential solutions:

1. When a timeout occurs, set v_worried_ to 0.  There seems to be no reason
   to stay "worried" in the common case, since all outstanding packets will
   usually be retransmitted after a timeout anyway.

However, the problem may still occur if more than 3 duplicate ACKs arrive
right after a timeout: v_newcwnd_ then stays 0 while dupacks_ exceeds 3.
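A minimal C++ sketch of solution 1 and its remaining hole, assuming a hypothetical VegasState struct that holds only the four fields involved (not the real VegasTcpAgent layout).  The on_timeout() and restore_would_zero() helpers are illustrative names; the dupacks_ = 0 and cwnd_ = 1 settings inside on_timeout() are assumptions, and only the v_newcwnd_ = 0 line reflects what the report says line 379 does.

```cpp
// Hypothetical subset of the sender state; not the ns-2 code.
const int NUMDUPACKS = 3;

struct VegasState {
    double cwnd_ = 8.0;
    double v_newcwnd_ = 8.0;
    int dupacks_ = 0;
    int v_worried_ = 2;  // e.g. left over from step 1
};

// Solution 1 (sketch): the timeout handler also clears v_worried_, so the
// next new ACK cannot mark a packet "expired" and force dupacks_ to 3.
void on_timeout(VegasState& s) {
    s.v_newcwnd_ = 0.0;  // as reported for line 379
    s.dupacks_ = 0;      // assumed
    s.cwnd_ = 1.0;       // assumed
    s.v_worried_ = 0;    // proposed change
}

// True if the restore check on lines 136-137 would pass even though no
// pre-inflation window was saved, i.e. the next new ACK would set cwnd_ to 0.
bool restore_would_zero(const VegasState& s) {
    return s.dupacks_ > NUMDUPACKS && s.cwnd_ > s.v_newcwnd_ && s.v_newcwnd_ == 0.0;
}
```

Right after on_timeout() the check is harmless, but once four duplicate ACKs have pushed dupacks_ to 4, restore_would_zero() becomes true again, which is exactly the caveat above.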

2. Inflate cwnd_ even when the test on lines 299-300 fails.  This closes
   one loophole in guaranteeing that if dupacks_ > 3, cwnd_ must have been
   inflated; I am not sure whether there are other loopholes.  In fact,
   RenoTcpAgent does this in similar cases (tcp-reno.cc, line 88),
   regardless of whether dupack_action() decides to "slow down".  I've tried
   to fix it this way but haven't come up with "clean" code that I'm happy
   with.
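One possible shape for solution 2, sketched on a hypothetical toy struct rather than on tcp-vegas.cc: every duplicate ACK at or beyond NUMDUPACKS inflates cwnd_, whether or not the "slow down" test would pass, and the pre-inflation window is saved on the first inflation of each episode.  The inflated_ flag, the method names, the one-packet inflation and the timeout behavior are all assumptions made for illustration; expired_on_new_ack() is a shortcut for step 3 of the failure case.

```cpp
// Hypothetical toy model of solution 2; not ns-2 code.
const int NUMDUPACKS = 3;

struct VegasSketch {
    double cwnd_ = 8.0;
    double v_newcwnd_ = 0.0;
    int dupacks_ = 0;
    bool inflated_ = false;  // hypothetical: inflated during this episode?

    void timeout() { v_newcwnd_ = 0.0; dupacks_ = 0; cwnd_ = 1.0; inflated_ = false; }
    void expired_on_new_ack() { dupacks_ = NUMDUPACKS; }  // step 3 shortcut

    void dup_ack() {
        ++dupacks_;
        if (dupacks_ >= NUMDUPACKS) {
            // Inflate regardless of the lines 299-300 test, so that
            // dupacks_ > NUMDUPACKS implies an inflated cwnd_.
            if (!inflated_) { v_newcwnd_ = cwnd_; inflated_ = true; }
            cwnd_ += 1.0;
        }
    }

    void new_ack() {
        // Unchanged restore check (lines 136-137).
        if (dupacks_ > NUMDUPACKS && cwnd_ > v_newcwnd_) cwnd_ = v_newcwnd_;
        dupacks_ = 0;
        inflated_ = false;
    }
};

// Steps 1-5 of the failure case under this sketch; returns the final cwnd_.
double replay_with_always_inflate() {
    VegasSketch s;
    for (int i = 0; i < NUMDUPACKS; ++i) s.dup_ack();  // 1. saves 8, cwnd_ = 9
    s.timeout();                                       // 2. cwnd_ = 1
    s.new_ack();                                       // 3. new ACK ...
    s.expired_on_new_ack();                            //    ... "expired": dupacks_ = 3
    s.dup_ack();                                       // 4. saves 1, cwnd_ = 2
    s.new_ack();                                       // 5. restored to 1, not 0
    return s.cwnd_;
}

// The caveat of solution 1: four dup ACKs right after a timeout.
double replay_four_dupacks_after_timeout() {
    VegasSketch s;
    s.timeout();
    for (int i = 0; i < NUMDUPACKS + 1; ++i) s.dup_ack();  // saves 1, cwnd_ = 3
    s.new_ack();                                           // restored to 1
    return s.cwnd_;
}
```

Both replays end with cwnd_ == 1 in this model, so the sender keeps transmitting; whether this interacts correctly with the rest of the Vegas logic is not shown here.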

3. Another curious problem is the test "if(dupacks_ > NUMDUPACKS &&
   cwnd_ > v_newcwnd_)" itself (line 136).  I think it should be
   "if(dupacks_ >= NUMDUPACKS && cwnd_ > v_newcwnd_)".  In "normal" cases
   (i.e., excluding the problems described above), dupacks_ == 3 means cwnd_
   has been inflated, and I don't see why it shouldn't be restored to its
   pre-inflation value in this case too.
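The difference between the two forms of the test can be seen on concrete numbers.  The C++ function below is a hypothetical helper, not ns-2 code; the example values (a window of 8 inflated to 11 after exactly three duplicate ACKs) are illustrative assumptions.

```cpp
const int NUMDUPACKS = 3;

// Window after a new ACK, under either form of the restore test.
// use_ge == false: "dupacks_ >  NUMDUPACKS && cwnd_ > v_newcwnd_" (current)
// use_ge == true:  "dupacks_ >= NUMDUPACKS && cwnd_ > v_newcwnd_" (proposed)
double cwnd_after_new_ack(double cwnd, double v_newcwnd, int dupacks, bool use_ge) {
    bool enough_dupacks = use_ge ? dupacks >= NUMDUPACKS : dupacks > NUMDUPACKS;
    return (enough_dupacks && cwnd > v_newcwnd) ? v_newcwnd : cwnd;
}
```

With exactly three duplicate ACKs, cwnd_after_new_ack(11.0, 8.0, 3, false) leaves the window inflated at 11, while the proposed form returns 8; with four or more duplicate ACKs both forms deflate.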

A quick look at Brakmo's TCP Vegas implementation (on x-kernel;
ftp://ftp.cs.arizona.edu/xkernel/new-protocols/Vegas.tar.Z) shows that it
has the same problems of a lingering "v_worried_" (1. above) and of not
"deflating" cwnd_ in some cases (3. above), so I'm not sure whether they
are bugs or features of TCP Vegas.  The x-kernel implementation may not
suffer from the "drop dead" syndrome, because it doesn't set v_newcwnd_ to
0 when a timeout occurs.  However, it may still set a "wrong" cwnd_ value
because of the "v_worried_" problem.

In any case, I don't think we can accept the "drop dead" phenomenon as a
feature of VegasTcpAgent.  If anyone knows of a TCP Vegas implementation
newer than the x-kernel version mentioned above that addresses these three
problems, please let me (and other interested people) know.  Thanks a lot!

========================================
Kuang-Yeh Wang		[email protected]
University of Maryland at College Park
Department of Computer Science
========================================