Tuning EC2 Network Stack
By: Josh
I recently had an issue with web requests taking 1.2-1.5 seconds from a service hosted in AWS. The setup was a small SSD-backed EC2 instance and a small SSD-backed RDS instance running a WordPress site, and that kind of response time was not acceptable. After a bit of troubleshooting I discovered that the network was suffering from major congestion.
To determine whether or not performance is suffering due to network congestion, I’d recommend first ruling out swapping, CPU usage, and disk IO, as those can all contribute to network congestion and related symptoms. Once those items have been ruled out, issue the following command to review the current network state:
# netstat -s | egrep 'backlog|queue|retrans|fail'
    75 input ICMP message failed.
    0 ICMP messages failed
    145 failed connection attempts
    1986800 segments retransmited
    1773 packets pruned from receive queue because of socket buffer overrun
    77017125 packets directly queued to recvmsg prequeue.
    15266317022 packets directly received from backlog
    114432650212 packets directly received from prequeue
    155427 packets dropped from prequeue
    104055915 packets header predicted and directly queued to user
    9660 fast retransmits
    885 forward retransmits
    547 retransmits in slow start
    126 sack retransmits failed
    130424 packets collapsed in receive queue due to low socket buffer
The netstat -s command returns summary statistics for each protocol, accumulated since the last reboot. The egrep filter keeps only the counters related to the backlog, queues, retransmissions, and failures, which together give a good summary of how healthy the stack is under the current load.
With the above output, we can see a large number of retransmitted segments (1,986,800), packets pruned from the receive queue because of socket buffer overrun (1,773), and packets collapsed in the receive queue due to low socket buffer (130,424), among others.
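Because these counters only ever grow after boot, a raw count is hard to judge on its own. A rough way to put the retransmissions in proportion is to compare them with the total number of segments sent. The sketch below is not part of my original troubleshooting; it assumes a Linux system that exposes the TCP counters OutSegs and RetransSegs in /proc/net/snmp (which, as far as I know, is where netstat -s gets them), and the function name retrans_pct is made up for this example.

```shell
# retrans_pct: read /proc/net/snmp-style text on stdin and print the share of
# sent TCP segments that were retransmitted, as a percentage.
retrans_pct() {
  awk '
    # First "Tcp:" line is the header: remember the position of each column.
    $1 == "Tcp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
    # Second "Tcp:" line holds the values in the same column order.
    $1 == "Tcp:" {
      out = $(col["OutSegs"]); re = $(col["RetransSegs"])
      if (out > 0) { printf "%.2f\n", 100 * re / out } else { print "0.00" }
    }
  '
}

# Usage on a live system:
#   retrans_pct < /proc/net/snmp
```

Running it before and after a tuning change gives a single number to compare, though it is still an average since boot rather than a measure of the current load.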
To fix this issue, I took a look at the current default settings for three parameters, namely the network device backlog and the TCP read and write buffer sizes:
# sysctl -a | egrep 'max_backlog|tcp_rmem|tcp_wmem'
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 20480 4194304
The default backlog size is pretty small at 1000 packets. The read and write buffers are also pretty small for a production server that handles large amounts of web traffic. Each of the two buffer settings takes three values, in bytes: the first is the minimum memory allocated to each TCP socket, the second is the default amount allocated to each socket, and the third is the maximum. The three numbers let the OS size the buffers according to load conditions.
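To make the scale of those byte values easier to see, here is a small sketch that prints a min/default/max triplet in KiB. The helper name describe_triplet is hypothetical, and the integer division rounds down.

```shell
# describe_triplet MIN DEFAULT MAX: print the three byte values in KiB.
describe_triplet() {
  printf 'min=%dKiB default=%dKiB max=%dKiB\n' \
    $(( $1 / 1024 )) $(( $2 / 1024 )) $(( $3 / 1024 ))
}

# Default tcp_rmem from the output above:
describe_triplet 4096 87380 6291456   # min=4KiB default=85KiB max=6144KiB
# Default tcp_wmem:
describe_triplet 4096 20480 4194304   # min=4KiB default=20KiB max=4096KiB
```

So with the defaults, a single socket can grow its receive buffer to 6 MiB and its send buffer to 4 MiB at most.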
To adjust these values, I typically add or modify the entries in /etc/sysctl.conf and then issue the ‘sysctl -p’ command to load them into the running kernel.
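Before changing anything, it can help to save the current values so they are easy to restore. The following is only a sketch of one way to do that, not the exact procedure I followed: the function name save_tunables and the backup path are made up, and it relies on sysctl printing "key = value" lines, which is the same syntax sysctl.conf uses.

```shell
# The three keys discussed in this post.
KEYS="net.core.netdev_max_backlog net.ipv4.tcp_rmem net.ipv4.tcp_wmem"

# save_tunables FILE: write the current values of $KEYS to FILE as
# "key = value" lines so the file can be reloaded with sysctl -p later.
save_tunables() {
  sysctl $KEYS > "$1"
}

# Typical session (as root; the path is an arbitrary choice):
#   save_tunables /root/sysctl-before.conf
#   sysctl -w net.core.netdev_max_backlog=10000   # runtime only, lost on reboot
#   ...test the service...
#   sysctl -p /root/sysctl-before.conf            # roll back if needed
```

Trying a value with sysctl -w first and only writing it to /etc/sysctl.conf once it has proven itself keeps a bad setting from surviving a reboot.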
I adjusted these values in several steps to arrive at the settings below. After each modification, I would test service performance to ensure the system was operating smoothly and without bottlenecks on CPU, memory, disk, or network. Keep in mind that larger values result in larger memory consumption, especially as the number of network connections grows, so memory and swap usage need careful analysis after tuning and performance testing.
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_rmem = 20480 174760 25165824
net.ipv4.tcp_wmem = 20480 174760 25165824
These values produced a much quicker response from the web service and reduced page load times by over half a second for each page. They also cleaned up the failure, queue, and backlog errors in the network statistics.
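To put the earlier warning about memory consumption into numbers, here is a deliberately pessimistic back-of-the-envelope sketch. It assumes every socket grows both its receive and send buffer to the configured maximum, which should rarely happen in practice: the kernel grows buffers on demand and, to my understanding, also caps total TCP memory through net.ipv4.tcp_mem. The function name worst_case_mib and the figure of 200 sockets are illustrative only.

```shell
# worst_case_mib SOCKETS RMEM_MAX WMEM_MAX: upper bound, in MiB, if every
# socket grew both its receive and send buffer to the configured maximum.
worst_case_mib() {
  echo $(( $1 * ($2 + $3) / 1048576 ))
}

# 200 concurrent sockets with the original default maxima:
worst_case_mib 200 6291456 4194304     # prints 2000
# The same 200 sockets with the tuned maxima:
worst_case_mib 200 25165824 25165824   # prints 9600
```

The gap between the two figures is why memory and swap usage are worth watching on a small instance after raising the maxima.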
I think that the default EC2 values are far too conservative, which might be due to the marketing of the larger instances as offering better IO performance. I’d highly recommend tuning the TCP stack to get more performance from smaller instances.