
Too many TIME_WAIT connections when git clone (LFS) #9650

Closed
3 of 8 tasks
gabyx opened this issue Jan 8, 2020 · 13 comments
Labels
issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented

Comments

@gabyx

gabyx commented Jan 8, 2020

  • Gitea version (or commit ref): 1.10.2
  • Git version: 2.22
  • Operating system: Linux (Docker); pushing from a Windows client
  • Proxy:
  • Database (use [x]):
    • PostgreSQL
    • MySQL (mariadb:10.3.11)
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No
    • Not relevant

Description

This issue is a follow-up to #8273 and still happens even with the changes from dbd0a2e, probably due to the proxy we use and its settings:

Cloning a repository with lots of LFS objects (12,200 objects, 12 GB) via git clone results in far too many connections (the maximum was approx. 4,500) in the TIME_WAIT state. These are not DB connections. This is critical because there is a hard limit on the number of connections in the TIME_WAIT state, and once it is reached, git clone / git lfs fetch commands crash or stall.

The number of TIME_WAIT connections increases while LFS fetches files, and the problem can be reproduced with:

rm -rf .git/lfs/objects/*
git lfs fetch --all

Peak Output:

netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n

gives ca. 4,000 connections in TIME_WAIT.

As seen below, the connections to the DB (port 3306) are not the problem.
There are lots of connections to ports 37312 - 41172. What are these for, and why are they not reused?

All Output:

netstat -ant | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c

is given below

      1 ::ffff:172.18.0.62:37312 TIME_WAIT
      1 ::ffff:172.18.0.62:37318 TIME_WAIT
      1 ::ffff:172.18.0.62:37322 TIME_WAIT
      1 ::ffff:172.18.0.62:37336 TIME_WAIT
      1 ::ffff:172.18.0.62:37338 TIME_WAIT
      1 ::ffff:172.18.0.62:37340 TIME_WAIT
      ... approx. 3000 lines more like this
      1 ::ffff:172.18.0.62:41166 TIME_WAIT
      1 ::ffff:172.18.0.62:41170 TIME_WAIT
      1 ::ffff:172.18.0.62:41172 TIME_WAIT
      1 Address Foreign
      1 and established)
      2 0.0.0.0:* LISTEN
      3 172.18.0.17:3306 ESTABLISHED
      3 :::* LISTEN
     90 172.18.0.17:3306 TIME_WAIT
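
A quick way to narrow down where these sockets come from (a sketch; it assumes Gitea serves HTTP on port 3000 inside its container and that netstat/awk are available there):

# count TIME_WAIT sockets held by the Gitea listener (HTTP_PORT = 3000 assumed)
netstat -ant | awk '$6 == "TIME_WAIT" && $4 ~ /:3000$/' | wc -l

# group them by peer host to see which client they belong to
netstat -ant | awk '$6 == "TIME_WAIT" && $4 ~ /:3000$/ {sub(/:[0-9]+$/, "", $5); print $5}' | sort | uniq -c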
@gabyx
Author

gabyx commented Jan 8, 2020

It seems that all these connections come from our reverse proxy (nginx). We don't yet know why.
If we access Gitea directly, without the proxy, these connections do not show up (or far fewer do):

  1 Foreign
  1 established)
  5 LISTEN
 62 ESTABLISHED
115 TIME_WAIT

@gabyx
Author

gabyx commented Jan 8, 2020

We use nginx-proxy, which automatically generates the following config. Is this config wrong, or is the issue more on the Gitea server side?

# git.example.org
upstream git.example.org {
       server 172.18.0.19:3000; # IP of the docker container
}
server {
        server_name git.example.org;
        listen 80 ;
        access_log /var/log/nginx/access.log vhost;
        # Only allow traffic from internal clients
        include /etc/nginx/network_internal.conf;
        include /etc/nginx/vhost.d/git.example.org;
        location / {
                proxy_pass http://git.example.org;
        }
}
server {
        server_name git.example.org;
        listen 443 ssl http2 ;
        access_log /var/log/nginx/access.log vhost;
        return 500;
        ssl_certificate /etc/nginx/certs/default.crt;
        ssl_certificate_key /etc/nginx/certs/default.key;
}

@guillep2k
Member

Maybe you can use a FastCGI setup to reduce the number of connections. I'm not sure how that's configured, however.
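
For what it's worth, a rough, untested sketch of what such a setup might look like, assuming Gitea's PROTOCOL = fcgi option and its default port 3000 (extra fastcgi_param tweaks may well be needed, and it is unclear whether this actually reduces connection churn compared to proxy_pass with keepalive):

# app.ini (Gitea side)
[server]
PROTOCOL  = fcgi
HTTP_PORT = 3000

# nginx side, replacing the proxy_pass location
location / {
        include fastcgi_params;
        fastcgi_pass 172.18.0.19:3000;
}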

@guillep2k
Member

netstat -ant | awk '{print $5 " " $6}' | cut -d: -f2 | sort | uniq -c

1 ::ffff:172.18.0.62:37312 TIME_WAIT
1 ::ffff:172.18.0.62:37318 TIME_WAIT
1 ::ffff:172.18.0.62:37322 TIME_WAIT
...

Maybe awk '{print $5 " " $6}' is not the right choice of fields for IPv6 (I don't use IPv6, so I can't say). It looks like you're seeing the client side of the socket for these connections. Try a plain netstat -ant | less to see whether the full output contains better information.
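
For reference, netstat -ant prints the columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address and State, so $4 is the local address, $5 the foreign address and $6 the state. An illustrative (made-up) line:

tcp        0      0 172.18.0.9:3000         172.18.0.60:37312       TIME_WAIT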

@gabyx
Author

gabyx commented Jan 9, 2020

With

netstat -ant | grep "TIME_WAIT" | grep "ffff" | awk '{print $6 " " $5 " <- " $4}' | sort -k 5 | uniq -c -f 4

I get 4546 connections from '::ffff:172.18.0.9:3000' to '::ffff:172.18.0.60:xxxxx' in 'TIME_WAIT', where 'xxxxx' is a long range of ports:

 4546 TIME_WAIT ::ffff:172.18.0.60:xxxxxx <- ::ffff:172.18.0.9:3000

::ffff:172.18.0.9 is the IP of the Gitea container, and 172.18.0.60 is the proxy.

@gabyx
Author

gabyx commented Jan 9, 2020

So it seems that Gitea somehow does not reuse the connections? This behavior only shows up when we go through the nginx proxy; when we access the Gitea container directly, it is not seen. Is there a mismatch in connection settings between the two sides, or something similar?

We fiddled with the keepalive = <number> setting in the upstream block of the nginx proxy configuration, following the useful link https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads, but this did not help either, probably because the issue is on the Gitea (Go implementation) side??

@zeripath
Contributor

zeripath commented Jan 9, 2020

Have you looked at https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads

This references:

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive

There, it appears that nginx does not use HTTP/1.1 for upstream connections by default. That might mean that simply adding keepalive, removing the Connection header and setting the HTTP version as below would work (please note this is copied from the nginx docs and would need adjusting to match Gitea):

upstream http_backend {
    server 127.0.0.1:8080;

    keepalive 16;
}

server {
    ...

    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}

@gabyx
Author

gabyx commented Jan 9, 2020

We have tested this, exactly as stated.
The HTTP version and the Connection header drop were set globally; unfortunately, it did not make any difference.
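
One thing that might still be worth double-checking (just a suggestion) is whether all three directives really ended up in the effective configuration that nginx-proxy generated, e.g. from inside the proxy container:

# dump the effective nginx configuration and look for the relevant directives
nginx -T | grep -nE 'upstream|keepalive|proxy_http_version|proxy_set_header Connection'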

@stale

stale bot commented Mar 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label Mar 9, 2020
@gabyx
Author

gabyx commented Mar 9, 2020

Just today we upgraded to 1.11.2, dropped the nginx proxy (which was causing the trouble), and with
the following settings:

[database]
DB_TYPE           = mysql
HOST              = git-db:3306
NAME              = gitea
USER              = gitea
PATH              = /data/gitea/gitea.db
MAX_IDLE_CONNS    = 10
CONN_MAX_LIFETIME = 45s
MAX_OPEN_CONNS    = 10

we don't see the TIME_WAITs anymore.
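
For anyone verifying the same thing, a quick check (assuming the MySQL/MariaDB container still listens on port 3306):

# count leftover TIME_WAIT sockets towards the database
netstat -ant | awk '$6 == "TIME_WAIT" && $5 ~ /:3306$/' | wc -l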

@stale stale bot removed the issue/stale label Mar 9, 2020
@m-a-v

m-a-v commented Mar 9, 2020

Does the current configuration cheat sheet need to be adapted?

; Max idle database connections on connection pool, default is 2
MAX_IDLE_CONNS = 2
; Database connection max life time, default is 0 or 3s mysql (See #6804 & #7071 for reasoning)
CONN_MAX_LIFETIME = 3s
; Database maximum number of open connections, default is 0 meaning no maximum
MAX_OPEN_CONNS = 0

Perhaps one should also point out that there may be problems with (nginx) proxies in connection with LFS?

@stale

stale bot commented May 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label May 9, 2020
@zeripath zeripath added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label May 9, 2020
@stale stale bot removed the issue/stale label May 9, 2020
@wxiaoguang
Contributor

Usually, TIME_WAIT is the expected result if you make a lot of short-lived connections (unless there is a bug).

TCP TIME_WAIT is normal TCP protocol operation: after delivering the last FIN-ACK, the side that closed the connection first waits for twice the maximum segment lifetime (MSL) to be sure the remote TCP received the acknowledgement of its connection termination request. By default, MSL is 2 minutes.
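
For reference, the hard limits gabyx ran into are most likely one of these Linux settings (an assumption; which one bites first depends on whether the proxy or the backend exhausts it):

# maximum number of TIME_WAIT sockets the kernel keeps before dropping new ones
sysctl net.ipv4.tcp_max_tw_buckets

# ephemeral port range the proxy can use for new upstream connections
sysctl net.ipv4.ip_local_port_range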

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2023