
Too many TIME_WAIT connections when git clone (LFS) #9650

Closed
3 of 8 tasks
gabyx opened this issue Jan 8, 2020 · 13 comments
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented

Comments

@gabyx

gabyx commented Jan 8, 2020

  • Gitea version (or commit ref): 1.10.2
  • Git version: 2.22
  • Operating system: Linux (Docker); pushing from a Windows client
  • Proxy:
  • Database (use [x]):
    • PostgreSQL
    • MySQL (mariadb:10.3.11)
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant

Description

This issue is a follow-up to #8273 and still happens even with the changes from dbd0a2e, probably due to the proxy we use and its settings:

Cloning a repository with lots of LFS objects (12,200 objects, 12 GB) via git clone results in far too many connections (the maximum was approx. 4,500) in the TIME_WAIT state. These are not DB connections. This is critical because there is a hard limit on the number of connections in the TIME_WAIT state, and once it is reached, git clone / git lfs fetch commands crash or stall.

The number of TIME_WAIT connections increases while LFS fetches files, and the problem can be reproduced with:

rm -rf .git/lfs/objects/*
git lfs fetch --all

Peak Output:

netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n

gives ca. 4,000 connections in TIME_WAIT.

As seen below, the connections to the DB (port 3306) are not the problem.
There are lots of connections to ports 37312 - 41172. What are these for, and why are they not reused?

All Output:

netstat -ant | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c

is given below

      1 ::ffff:172.18.0.62:37312 TIME_WAIT
      1 ::ffff:172.18.0.62:37318 TIME_WAIT
      1 ::ffff:172.18.0.62:37322 TIME_WAIT
      1 ::ffff:172.18.0.62:37336 TIME_WAIT
      1 ::ffff:172.18.0.62:37338 TIME_WAIT
      1 ::ffff:172.18.0.62:37340 TIME_WAIT
      ... approx. 3000 lines more like this
      1 ::ffff:172.18.0.62:41166 TIME_WAIT
      1 ::ffff:172.18.0.62:41170 TIME_WAIT
      1 ::ffff:172.18.0.62:41172 TIME_WAIT
      1 Address Foreign
      1 and established)
      2 0.0.0.0:* LISTEN
      3 172.18.0.17:3306 ESTABLISHED
      3 :::* LISTEN
     90 172.18.0.17:3306 TIME_WAIT
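
A quick way to narrow down where these sockets come from (a sketch; it assumes Gitea serves HTTP on port 3000 inside its container and that netstat/awk are available there):

# count TIME_WAIT sockets held by the Gitea listener (HTTP_PORT = 3000 assumed)
netstat -ant | awk '$6 == "TIME_WAIT" && $4 ~ /:3000$/' | wc -l

# group them by peer host to see which client they belong to
netstat -ant | awk '$6 == "TIME_WAIT" && $4 ~ /:3000$/ {sub(/:[0-9]+$/, "", $5); print $5}' | sort | uniq -c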
@gabyx
Author

gabyx commented Jan 8, 2020

It seems that all these connections come from our reverse proxy (nginx). We don't yet know why.
If we access Gitea directly, without the proxy, these connections do not show up (or far fewer do):

  1 Foreign
  1 established)
  5 LISTEN
 62 ESTABLISHED
115 TIME_WAIT

@gabyx
Author

gabyx commented Jan 8, 2020

We use nginx-proxy, which automatically generates the following config. Is this config wrong, or is the issue more on the Gitea server side?

# git.example.org
upstream git.example.org {
       server 172.18.0.19:3000; # IP of the docker container
}
server {
        server_name git.example.org;
        listen 80 ;
        access_log /var/log/nginx/access.log vhost;
        # Only allow traffic from internal clients
        include /etc/nginx/network_internal.conf;
        include /etc/nginx/vhost.d/git.example.org;
        location / {
                proxy_pass http://git.example.org;
        }
}
server {
        server_name git.example.org;
        listen 443 ssl http2 ;
        access_log /var/log/nginx/access.log vhost;
        return 500;
        ssl_certificate /etc/nginx/certs/default.crt;
        ssl_certificate_key /etc/nginx/certs/default.key;
}

@guillep2k
Member

Maybe you can use a FastCGI setup to reduce the number of connections. I'm not sure how that's configured, however.
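
For what it's worth, a rough, untested sketch of what such a setup might look like, assuming Gitea's PROTOCOL = fcgi option and its default port 3000 (extra fastcgi_param tweaks may well be needed, and it is unclear whether this actually reduces connection churn compared to proxy_pass with keepalive):

# app.ini (Gitea side)
[server]
PROTOCOL  = fcgi
HTTP_PORT = 3000

# nginx side, replacing the proxy_pass location
location / {
        include fastcgi_params;
        fastcgi_pass 172.18.0.19:3000;
}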

@guillep2k
Member

netstat -ant | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c

1 ::ffff:172.18.0.62:37312 TIME_WAIT
1 ::ffff:172.18.0.62:37318 TIME_WAIT
1 ::ffff:172.18.0.62:37322 TIME_WAIT
...

Maybe awk '{print $5 " " $6}' is not the right choice of fields for IPv6 (I don't use IPv6, so I can't say). It looks like you're seeing the client side of the socket for these connections. Try a plain netstat -ant | less to see whether the full output contains better information.
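
For reference, netstat -ant prints the columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address and State, so $4 is the local address, $5 the foreign address and $6 the state. An illustrative (made-up) line:

tcp        0      0 172.18.0.9:3000         172.18.0.60:37312       TIME_WAIT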

@gabyx
Author

gabyx commented Jan 9, 2020

With

netstat -ant | grep "TIME_WAIT" | grep "ffff" | awk '{print $6 " " $5 " <- " $4}' | sort -k 5 | uniq -c -f 4

I get 4546 connections from '::ffff:172.18.0.9:3000' to '::ffff:172.18.0.60:xxxxx' in 'TIME_WAIT', where 'xxxxx' is a long range of ports:

 4546 TIME_WAIT ::ffff:172.18.0.60:xxxxxx <- ::ffff:172.18.0.9:3000

::ffff:172.18.0.9 is the IP of the Gitea container, and 172.18.0.60 is the proxy.

@gabyx
Author

gabyx commented Jan 9, 2020

So it seems that Gitea somehow does not reuse the connections? This behavior only shows up when we go through the nginx proxy; when we access the Gitea container directly, it is not seen. Is there a mismatch in connection settings between the two sides, or something similar?

We fiddled with the keepalive = <number> setting in the upstream block of the nginx proxy configuration, following the useful link https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads, but this did not help either, probably because the issue is on the Gitea (Go implementation) side??

@zeripath
Contributor

zeripath commented Jan 9, 2020

Have you looked at https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads

This references:

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive

There, it appears that nginx does not use HTTP/1.1 for upstream connections by default. That might mean that simply adding keepalive, removing the Connection header and setting the HTTP version as below would work (please note this is copied from the nginx docs and would need adjusting to match Gitea):

upstream http_backend {
    server 127.0.0.1:8080;

    keepalive 16;
}

server {
    ...

    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}

@gabyx
Author

gabyx commented Jan 9, 2020

We have tested this, exactly as stated.
The HTTP version and the Connection header drop were set globally; unfortunately, it did not make any difference.
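
One thing that might still be worth double-checking (just a suggestion) is whether all three directives really ended up in the effective configuration that nginx-proxy generated, e.g. from inside the proxy container:

# dump the effective nginx configuration and look for the relevant directives
nginx -T | grep -nE 'upstream|keepalive|proxy_http_version|proxy_set_header Connection'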

@stale

stale bot commented Mar 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label Mar 9, 2020
@gabyx
Author

gabyx commented Mar 9, 2020

Just today we upgraded to 1.11.2, dropped the nginx proxy (which was causing the trouble), and with
the following settings:

[database]
DB_TYPE           = mysql
HOST              = git-db:3306
NAME              = gitea
USER              = gitea
PATH              = /data/gitea/gitea.db
MAX_IDLE_CONNS    = 10
CONN_MAX_LIFETIME = 45s
MAX_OPEN_CONNS    = 10

we don't see the TIME_WAITs anymore.
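
For anyone verifying the same thing, a quick check (assuming the MySQL/MariaDB container still listens on port 3306):

# count leftover TIME_WAIT sockets towards the database
netstat -ant | awk '$6 == "TIME_WAIT" && $5 ~ /:3306$/' | wc -l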

@stale stale bot removed the issue/stale label Mar 9, 2020
@m-a-v

m-a-v commented Mar 9, 2020

Does the current configuration cheat sheet need to be adapted?

; Max idle database connections on connection pool, default is 2
MAX_IDLE_CONNS = 2
; Database connection max life time, default is 0 or 3s mysql (See #6804 & #7071 for reasoning)
CONN_MAX_LIFETIME = 3s
; Database maximum number of open connections, default is 0 meaning no maximum
MAX_OPEN_CONNS = 0

Perhaps one should also point out that there may be problems with (nginx) proxies in connection with LFS?

@stale

stale bot commented May 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label May 9, 2020
@zeripath zeripath added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label May 9, 2020
@stale stale bot removed the issue/stale label May 9, 2020
@wxiaoguang
Contributor

Usually, TIME_WAIT is the expected result if you make a lot of short-lived connections (unless there is a bug).

TCP TIME_WAIT is normal TCP protocol operation: after delivering the last FIN-ACK, the side that closed the connection first waits for twice the maximum segment lifetime (MSL) to be sure the remote TCP received the acknowledgement of its connection termination request. By default, MSL is 2 minutes.
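
For reference, the hard limits gabyx ran into are most likely one of these Linux settings (an assumption; which one bites first depends on whether the proxy or the backend exhausts it):

# maximum number of TIME_WAIT sockets the kernel keeps before dropping new ones
sysctl net.ipv4.tcp_max_tw_buckets

# ephemeral port range the proxy can use for new upstream connections
sysctl net.ipv4.ip_local_port_range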

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2023