SSH host key errors with v0.23.0 #378

Closed
jmriebold opened this issue Jun 7, 2022 · 16 comments · Fixed by #384
Labels
bug Something isn't working

Comments

@jmriebold

Updated to Flux v0.31.0 today (from v0.30.2) and as soon as there was something to commit, the image-automation-controller immediately started throwing errors about the SSH host key failing verification. For example:

{"level":"error","ts":"2022-06-07T02:38:41.735Z","logger":"controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-system","namespace":"flux-system","error":"unable to fetch-connect to remote 'ssh://git@git.company.com/repo': ssh: handshake failed: hostkey could not be verified"}

For context, our Flux repo is configured to use SSH, and we supply a known_hosts file along with the SSH key in our flux-system secret. In addition, we've been running the image-automation-controller (along with the others) with EXPERIMENTAL_GIT_TRANSPORT=true, so I'm a little surprised that things broke with this update instead of back when we enabled this functionality.

@aryan9600
Member

aryan9600 commented Jun 7, 2022

hi @jmriebold, could you share more information about this, like the contents of the known_hosts file and the encryption algorithm used for the private key? Also, could you check whether switching the git implementation to go-git helps? If you own the SSH git server, turning on verbose logging on the server and posting the logs here could also help us understand why the handshake fails. Please feel free to redact any private information. Thanks!

@ViBiOh

ViBiOh commented Jun 7, 2022

I'm encountering the same issue with GitHub, and I'm using go-git (the default value for v1beta2).

I refreshed my known_hosts value to try to fix it.

  known_hosts: |-
    github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
    github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
    github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl

@pjbgf pjbgf added the bug Something isn't working label Jun 7, 2022
@pjbgf pjbgf moved this to Up-Next in Maintainers' Focus Jun 7, 2022
@aryan9600
Member

aryan9600 commented Jun 7, 2022

@jmriebold is it just IAC which is failing like this, or do you observe the same issue in source-controller as well?

@aryan9600
Member

@ViBiOh IAC can't use go-git, it uses libgit2 only. Are you complaining about source-controller instead? If so, could you share the error logs of source-controller? Thanks!

@ViBiOh

ViBiOh commented Jun 7, 2022

> @ViBiOh IAC can't use go-git, it uses libgit2 only. Are you complaining about source-controller instead? If so, could you share the error logs of source-controller? Thanks!

The reconciliation problem only occurs on the ImageUpdateAutomation; here is an example of the logs:

{"level":"error","ts":"2022-06-07T14:56:45.421Z","logger":"controller.imageupdateautomation","msg":"Reconciler error","reconciler group":"image.toolkit.fluxcd.io","reconciler kind":"ImageUpdateAutomation","name":"flux-goweb","namespace":"default","error":"unable to fetch-connect to remote 'ssh://git@github.com/ViBiOh/goweb': ssh: handshake failed: hostkey could not be verified"}

@jmriebold
Author

@aryan9600, yes, like @ViBiOh, it's only the IAC that is failing. All other pods are working as expected. Our known_hosts file hasn't changed in about 2 years (nor have the server's keys, to be clear), so that seems unlikely to be the issue, especially since the source-controller and other pods are working fine.

@nniehoff

nniehoff commented Jun 7, 2022

I am having the same issue with IAC; it worked yesterday before my upgrade to 0.31.0. Interestingly, I only see this on the IAC, but if I switch the GitRepository over to libgit2, I start seeing it on the source-controller as well.

@nniehoff

nniehoff commented Jun 7, 2022

I enabled debug logging on the source-controller, and the debug logs simply echoed the error logs. As a workaround, I reverted the IAC to 0.22.1 and the automation reconciled fine.

@stefanprodan
Member

I can't reproduce this with ECDSA, here is my secret:

  known_hosts: |-
    github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
    github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
    github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
  identity: |-
    -----BEGIN PRIVATE KEY-----
    ......
    -----END PRIVATE KEY-----
  identity.pub: |-
    ecdsa-sha2-nistp384 AAAAE2VjZHNhLXNoYTItbmlzdHAzODQAAAAIbmlzdHAzODQAAABhBBJteTPXiE0SoKdp5APOpDazQSRFB2LEGkQrrUAS5d5PkT4czZKxBj5y/21IP3ZHf0cqtsCHWrwhQ9etrcRf/U2PFD0bgnSLpBFFqzYitSFF1/gxH/F33cIHQNTgC70/WQ==

@stefanprodan
Member

stefanprodan commented Jun 7, 2022

To reproduce this, we'll need details on how you've created the SSH key; I guess it wasn't generated with the Flux CLI.

@nniehoff

nniehoff commented Jun 7, 2022

I just created my key a couple of days ago for this, so I used:

ssh-keygen -t ecdsa -a 100 -f priv_key -C flux@example.com -b 521

I generated the known_hosts file with ssh-keyscan github.com, which produced:

# github.com:22 SSH-2.0-babeld-18764741
github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
# github.com:22 SSH-2.0-babeld-18764741
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
# github.com:22 SSH-2.0-babeld-18764741
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
# github.com:22 SSH-2.0-babeld-18764741
# github.com:22 SSH-2.0-babeld-18764741

identity:

-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----

identity.pub:

ecdsa-sha2-nistp521 AAAA... flux@example.com

@ViBiOh

ViBiOh commented Jun 7, 2022

I have the same kind of setup: an SSH key (ed25519) generated manually and pushed into the git repository secret.

@ViBiOh

ViBiOh commented Jun 7, 2022

So, I have two ImageUpdateAutomations. The github-ssh secrets are generated from a script (so they are exactly the same).

The first one is in the monitoring namespace and uses the GitRepository in flux-system (the controllers' namespace, by the way). This one works with the latest version.

---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: infra
  namespace: monitoring
spec:
[ ... ]
  sourceRef:
    kind: GitRepository
    name: infra
    namespace: flux-system
  update:
    path: flux/k3s/monitoring/
    strategy: Setters

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: infra
  namespace: flux-system
spec:
  gitImplementation: go-git
  interval: 120m
  ref:
    branch: main
  secretRef:
    name: github-ssh

The other automation has everything in the default namespace, and this one doesn't work.

---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: flux-goweb
  namespace: default
spec:
[ ... ]
  sourceRef:
    kind: GitRepository
    name: flux-goweb
    namespace: default
  update:
    path: ./infra
    strategy: Setters

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-goweb
  namespace: default
spec:
  gitImplementation: go-git
  interval: 120m
  ref:
    branch: main
  secretRef:
    name: github-ssh

I've removed extra details from the kubectl output (annotations, metadata, etc.). If you need them, just ask. Hope it helps.

aryan9600 added a commit to aryan9600/pkg that referenced this issue Jun 7, 2022
Previously, KnownKey.Matches() accepted a SHA256 hasher as an argument,
which could lead to unintended bugs when calling it in a loop. This
eliminates that by initializing a new hasher itself instead of relying
on the caller for the same.
Enables us to fix a regression in the source-controller: fluxcd/image-automation-controller#378

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
aryan9600 added a commit to aryan9600/source-controller that referenced this issue Jun 8, 2022
Earlier, host key verification could potentially fail if there were
multiple entries in the known_hosts file and if the intended encryption
algorithm wasn't the first entry. This happened because we used the same
hasher object to compute the sum of all the public keys present in the
known_hosts file, which led to invalid hashes, resulting in a mismatch
when compared with the hash of the advertised public key. This is fixed
by not creating the hasher ourselves and instead delegating that to the
function actually doing the matching, ensuring that a new hasher is used
for each comparison.

Regression introduced in v0.25.0 and reported in
fluxcd/image-automation-controller#378

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
@pjbgf pjbgf closed this as completed in #384 Jun 8, 2022
Repository owner moved this from Up-Next to Done in Maintainers' Focus Jun 8, 2022
@pjbgf
Member

pjbgf commented Jun 8, 2022

We have just released a new image with the fix: ghcr.io/fluxcd/image-automation-controller:v0.23.2.
Please let us know in case it does not fix your issue.

@ViBiOh

ViBiOh commented Jun 8, 2022

Hello 👋

I've tested it and all automations are resolving now 👍 Thank you for the quick fix ;)

@jmriebold
Author

Thanks all!

Projects
Status: Done

6 participants