esnet/iperf

Server stops accepting clients: Too many open files

Open

#683 opened on Jan 7, 2018

Labels: Help Wanted, bug

Description

Context

  • Version of iperf3: iperf 3-CURRENT (cJSON 1.5.2). The same issue occurs on the git branches STABLE 3.2 and MASTER. I built from commit 46cb4b4b904e45c2652006ca339c7cf99c995268 with minimally enhanced debugging output.

  • Hardware: RHEL7 Linux VM on KVM

  • Operating system (and distribution, if any): RHEL7 Linux VM on KVM

Bug Report

We're using iperf3 to run scheduled TCP bandwidth checks from 6 nodes to a central server (every 10 minutes, 10 seconds per probe).

  • Expected Behavior: iperf3 server continues to accept clients indefinitely.

  • Actual Behavior: iperf3 server stops accepting clients after a few hours to a few days. Each new client then triggers the following message in the server-side log:

### Error IESTREAMCONNECT on accept:iperf3: error - unable to connect stream: Too many open files

(I added the leading words to make the source of the message easier to identify. The error is raised at src/iperf_tcp.c#L111, in function iperf_tcp_accept(), on return from the accept() syscall.)

The message seems valid (server started with iperf3 -s -D -p 1234):

# lsof | grep iperf3 | wc -l       
1044
# netstat -n | grep 1234
[...]
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29839     TIME_WAIT  
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29770     TIME_WAIT  
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29857     TIME_WAIT  
tcp6       1      0 10.0.0.107:1234      10.0.1.213:62330     CLOSE_WAIT 
tcp6       1      0 10.0.0.107:1234      10.0.1.213:62331     CLOSE_WAIT 
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29849     TIME_WAIT  
tcp6       1      0 10.0.0.107:1234      10.0.1.213:62332     CLOSE_WAIT 
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29775     TIME_WAIT  
tcp6       0      0 10.0.0.107:1234      10.0.1.182:29786     TIME_WAIT  
[...]
  • Steps to Reproduce:
    • start iperf3 server daemon: iperf3 -s -D -p 1234
    • use 6 clients to connect to it. Every client connects once every 10 minutes and performs 10 seconds of measurement. If a client is denied, it waits a little more than 10 seconds and tries again (for up to 3 minutes, then it gives up)
    • the clients disconnect from the network just after the test, so, at least in theory, the FIN could get lost.
    • the issue typically appears after a few hours to a few days.
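
The CLOSE_WAIT entries above mean the peer has closed its end but the local process has not yet closed its own socket, so each one still holds a file descriptor. Whether that is what happens inside iperf3 is not confirmed in this report. The following is only a minimal local sketch of that general mechanism in Python, not iperf3 code: the function name leak_until_accept_fails, the loopback listener, and the lowered limit of 64 descriptors are all assumptions made for illustration. It assumes a Unix-like system (it uses the resource module) and temporarily lowers the soft RLIMIT_NOFILE of the running process.

```python
import errno
import resource
import socket


def leak_until_accept_fails(soft_limit=64, max_clients=1000):
    """Accept loopback connections, never close the server side, and
    report how the process eventually runs out of file descriptors.

    Returns (leaked, error_name): the number of server-side sockets left
    open, and the symbolic errno of the failing call (None if no call
    failed within max_clients attempts).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    address = listener.getsockname()
    held = []  # accepted sockets that are deliberately never closed
    error_name = None
    try:
        # Hypothetical small limit so the failure shows up quickly.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        for _ in range(max_clients):
            client = None
            try:
                client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                client.connect(address)
                conn, _peer = listener.accept()
                held.append(conn)  # server side stays open
            except OSError as exc:
                error_name = errno.errorcode.get(exc.errno, str(exc.errno))
                break
            finally:
                if client is not None:
                    # The client goes away; the matching server-side
                    # socket in `held` is now in CLOSE_WAIT.
                    client.close()
    finally:
        # Restore the original limit and release everything.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
        for conn in held:
            conn.close()
        listener.close()
    return len(held), error_name


if __name__ == "__main__":
    leaked, error_name = leak_until_accept_fails()
    print(f"leaked server-side sockets: {leaked}, failure: {error_name}")
```

On Linux the failure is expected to be EMFILE, which is the errno behind "Too many open files". While the loop is running, the held sockets should show up as CLOSE_WAIT in netstat, similar to the 10.0.1.213 entries in the listing above.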
