Tuning EC2 Network Stack

I recently had an issue with web requests taking 1.2-1.5 seconds from a service hosted in AWS. I had a small SSD-backed EC2 instance with a small SSD-backed RDS instance running a wordpress site and this type of performance was not acceptable. After a bit of troubleshooting I discovered that the network was suffering from major congestion.

To determine whether or not performance is suffering due to network congestion, I’d reccommend first ruling out swapping, CPU usage, and disk IO, as those can all contribute to network congestion and related symptoms. Once those items have been ruled out, issue the following command to review current network state:

# netstat -s | egrep 'backlog|queue|retrans|fail'
    75 input ICMP message failed.
    0 ICMP messages failed
    145 failed connection attempts
    1986800 segments retransmited
    1773 packets pruned from receive queue because of socket buffer overrun
    77017125 packets directly queued to recvmsg prequeue.
    15266317022 packets directly received from backlog
    114432650212 packets directly received from prequeue
    155427 packets dropped from prequeue
    104055915 packets header predicted and directly queued to user
    9660 fast retransmits
    885 forward retransmits
    547 retransmits in slow start
    126 sack retransmits failed
    130424 packets collapsed in receive queue due to low socket buffer

The netstat -s command will return summary statistics [since last reboot] for each protocol. The above command searches for specific items related to the backlog, queue, retransmissions, and failures, which will give us a good summary of how healthy the stack is under the current load.

With the above output, we can see that there are a number of retransmitted segments, packets pruned from the receive queue due to socket buffer overrun, packets collapsed in receive queue due to low socket buffer, and others.

To fix this issue, I took a look at the current default settings for 3 parameters, the network device backlog and tcp read and write buffer sizes:

# sysctl -a | egrep 'max_backlog|tcp_rmem|tcp_wmem'
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_rmem = 4096        87380   6291456
net.ipv4.tcp_wmem = 4096        20480   4194304

The default backlog size is pretty small at 1000 units. The read and write memory are also pretty small for a production server that handles large amounts of web traffic. The first value is the minimum memory allocated to each network socket, the second is the default amount of memory alloated to each network socket, and the third is the maximum amount. The reason for these three numbers is to allow the OS to manage the buffers with regard to various load conditions.

To adjust these values, I typically make an entry or modification to /etc/sysctl.conf and then issue the ‘sysctl -p’ command to read them into memory.

I made a few adjustments to these values to get to the values seen below. After each modification, I would test service performance to ensure the system was operating smoothly and without bottlenecks on CPU, memory, disk, or network. Keep in mind that larger values result in larger memory consumption, especially with more network connections, which requires a careful analysis of memory and swap usage after tuning and performance testing.

net.core.netdev_max_backlog = 10000
net.ipv4.tcp_rmem = 20480 174760 25165824
net.ipv4.tcp_wmem = 20480 174760 25165824

These values produced a much quicker response from the web service and reduced page load times by over half a second for each page, as well as cleaning up network statistic failures, queue, and backlog errors.

I think that the default EC2 values are far too conservative and this might be due to the marketing around the larger instances being more IO performant. I’d highly recommend tuning the TCP stack to get more performance from smaller instances.

Comments

One response to “Tuning EC2 Network Stack”

Leave a Reply