0-Window Behaviour in Linux
===========================

Deepak Nagaraj
23/Jun/2008

This document contains notes about the behaviour of Linux TCP when it
or its peer advertises a receive window of 0 or < 1 MSS. (If that
sounded like Greek, you probably shouldn't be reading this. Familiarity
with TCP is assumed.)

Note: I did not write this code, read at your own risk. I used Linux
2.4.36 for this study.

There are four possibilities.

1) Linux is sender
   1a) Receiver advertised 0 window
   1b) Receiver advertised <1MSS window
2) Linux is receiver
   2a) Linux advertised 0 window
   2b) Linux advertised <1MSS window

Cases 1a, 2a and 2b are simple.

1a: Linux sender got 0-window
=============================

Linux sends a 0-byte probe with sequence number = snd_una - 1. This
guarantees that the probe is seen as a duplicate, which elicits an ACK
from the other host. See tcp_write_wakeup().

Linux will keep retrying this "persist probe" with exponential backoff.
See tcp_timer.c.

2a: Linux receiver got data when advertising 0-window
=====================================================

Linux ignores incoming data while it is advertising a 0 window. It does
send an ACK, however, and continues to advertise a 0 window until the
free space grows to more than 1 MSS (receiver-side SWS avoidance). See
tcp_data_queue().

2b: Linux receiver got data when advertising <1MSS window
=========================================================

This happens when the receive buffer is almost full but the application
has not read the data. Linux will accept as much data as its previous
advertisement allowed. It then ACKs the data and advertises
(prevwin - curdata) bytes of window. See tcp_select_window().

Now we come to the complicated part.

1b: Linux sender got <1MSS window
=================================

This is the case for sender-side SWS avoidance. Linux tries to
accumulate data until it has at least 1 MSS, unless some special
condition holds (such as: last packet, PUSH flag set, Nagle's algorithm
disabled).

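The send-or-accumulate decision described above can be sketched roughly
as follows. This is only an illustration of the rule as stated in these
notes, not the kernel's code: the struct, its fields, and the function
name sws_send_now() are all made up for the example, and the real checks
in the kernel also take the peer's window and other state into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of the data waiting to be sent. */
struct pending {
    unsigned bytes;     /* bytes accumulated so far */
    bool last_packet;   /* nothing more is queued behind this data */
    bool push;          /* PUSH flag is set */
};

/*
 * Sender-side SWS avoidance as described in the notes: a full-sized
 * segment goes out immediately; a smaller one goes out only when a
 * special condition holds. Returns true to send now, false to keep
 * accumulating.
 */
static bool sws_send_now(const struct pending *p, unsigned mss,
                         bool nagle_disabled)
{
    assert(mss > 0);
    if (p->bytes == 0)
        return false;               /* nothing to send */
    if (p->bytes >= mss)
        return true;                /* at least 1 MSS accumulated */
    return p->last_packet || p->push || nagle_disabled;
}
```

With an MSS of 1460, 200 bytes with no flags set stay queued, while the
same 200 bytes are sent at once if PUSH is set or Nagle is disabled.
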
When Linux is asked to send a packet but cannot, because it has not yet
accumulated 1 MSS worth of data (and no other special condition holds),
it queues the data. It tries to empty this queue whenever it receives a
data/ACK packet and sees a bigger window, or when more than 1 MSS of
data accumulates.

However, if no packet is in flight and no delayed-ack timers are set,
Linux sets up a probe to fire after 1 RTO. When this probe fires, it
finally sends off as much data as the window allows at that time, and
hopes to get a bigger window with the ACK.

This is the same probe timer as the 0-window probe timer. The same
function handles both the 0-window and the non-zero window cases.

Usually this probe doesn't get a chance to fire, because packets will be
in flight and the host may receive data/ACKs with a bigger window.

I don't know what the connection is between the delayed-ack timer and
not setting up this probe, because Linux does not piggyback any data on
a delayed ACK. See tcp_delack_timer().

See also: tcp_check_probe_timer(), __tcp_push_pending_frames(),
tcp_nagle_check(), tcp_data_snd_check(), tcp_ack_snd_check().
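
The probe-timer rules above (arm the probe only when nothing is in
flight and no timer is pending; retry with exponential backoff) can be
sketched as below. This is a simplified model written for these notes,
not kernel code. The struct, the function names, and the rto_max cap are
assumptions made for illustration; the notes only say that the first
probe fires after 1 RTO and that retries back off exponentially.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of sender state; not the kernel's struct. */
struct snd_state {
    unsigned packets_in_flight; /* sent but not yet ACKed */
    bool timer_pending;         /* the notes call this a delayed-ack timer */
};

/* Arm the probe only if nothing else will wake the sender up. */
static bool should_arm_probe(const struct snd_state *s)
{
    return s->packets_in_flight == 0 && !s->timer_pending;
}

/*
 * Interval before the next probe: rto doubled once per earlier attempt,
 * capped at rto_max (the cap is an assumption of this sketch). The
 * comparison against rto_max / 2 avoids overflow when doubling.
 */
static unsigned probe_interval(unsigned rto, unsigned backoff,
                               unsigned rto_max)
{
    unsigned t = rto;

    while (backoff-- > 0) {
        if (t > rto_max / 2)
            return rto_max;     /* doubling would exceed the cap */
        t *= 2;
    }
    return t < rto_max ? t : rto_max;
}
```

For example, with rto = 200 and rto_max = 120000 (both in arbitrary time
units), the first probe waits 200, the fourth waits 1600, and after many
failed attempts the interval stays at 120000.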