operator cte-k8s-operator (1.5.10) #6020

Open: wants to merge 1 commit into main

ssandur-thales
Contributor

@ssandur-thales ssandur-thales commented Apr 19, 2025

Thanks for submitting your Operator. Please check the list below before you create your Pull Request.

New Submissions

Updates to existing Operators

  • Did you create a ci.yaml file according to the update instructions?
  • Is your new CSV pointing to the previous version with the replaces property if you chose replaces-mode via the updateGraph property in ci.yaml?
  • Is your new CSV referenced in the appropriate channel defined in the package.yaml or annotations.yaml ?
  • Have you tested an update to your Operator when deployed via OLM?
  • Is your submission signed?

Your submission should not

  • Modify more than one operator
  • Modify an Operator you don't own
  • Rename an operator - please remove and add with a different name instead
  • Modify any files outside the above mentioned folders
  • Contain more than one commit. Please squash your commits.

Operator Description must contain (in order)

  1. Description about the managed Application and where to find more information
  2. Features and capabilities of your Operator and how to use it
  3. Any manual steps about potential pre-requisites for using your Operator

Operator Metadata should contain

  • Human readable name and 1-liner description about your Operator
  • Valid category name¹
  • One of the pre-defined capability levels²
  • Links to the maintainer, source code and documentation
  • Example templates for all Custom Resource Definitions intended to be used
  • A quadratic logo

Remember that you can preview your CSV here.

--

1 If you feel your Operator does not fit any of the pre-defined categories, file an issue against this repo and explain your need

2 For more information see here

Contributor

Dear @ssandur-thales,
Some errors and/or warnings were found while doing the check of your operator (cte-k8s-operator/1.5.10) against the entire suite of validators for Operator Framework with Operator-SDK version v1.36.0 and the command $ operator-sdk bundle validate <bundle-path> --select-optional suite=operatorframework.

Errors (:bug:) must be fixed while warnings (:warning:) are informative, and fixing them might improve the quality of your solution.

Type Message
⚠️ Value : (cte-k8s-operator.v1.5.10) csv.Spec.minKubeVersion is not informed. It is recommended you provide this information. Otherwise, it would mean that your operator project can be distributed and installed in any cluster version available, which is not necessarily the case for all projects.
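
For reference, a quick local check for this warning could look like the sketch below. It is only a text match on the CSV file, not a YAML parse and not a replacement for `operator-sdk bundle validate`; the helper name `has_min_kube_version` is hypothetical.

```shell
# Hypothetical helper: succeeds if a CSV manifest sets a non-empty
# minKubeVersion key. This is a plain text match, so it does not verify
# that the key sits under spec or that the value is a valid version.
has_min_kube_version() {
  local csv_file="$1"
  grep -Eq '^[[:space:]]+minKubeVersion:[[:space:]]*[^[:space:]]+' "$csv_file"
}
```

It can be called with the path to the ClusterServiceVersion file of the bundle, for example `has_min_kube_version <csv-file> || echo "minKubeVersion is missing"`.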

@ssandur-thales
Contributor Author

Hello,

I am not able to figure out why the operator deploy and operator upgrade tests are getting ImagePullBackOff errors. Could you please help me with this?

@ssandur-thales
Contributor Author

ssandur-thales commented Apr 20, 2025

The same operator has been certified in the RH certified-operators project via PR redhat-openshift-ecosystem/certified-operators#5597

Signed-off-by: Suresh Sandur <suresh-s.sandur@thalesgroup.com>
@ssandur-thales
Contributor Author

I found this in the logs of https://github.com/k8s-operatorhub/community-operators/actions/runs/14573282942/job/40874361228 for my PR.

image

The error logs say:
stderr: 'error: error from server (NotFound): pods "kind-registry-5000-test-operator-cte-k8s-operator-v1-5-10" not found in namespace "default"'

Whereas the pod is running in the test-upgrade namespace. Is that the cause for failure?

@ssandur-thales
Contributor Author

@haripate @Allda

Can you please take a look at the failure and let me know if there is any issue with the operator bundle manifest or the CSV? I cannot see any reason why the test should have failed.

@ssandur-thales
Contributor Author

Hello @haripate, @tomasbakk @Allda @mporrato,

Please review the failure for this PR and let me know if any action needs to be taken from my side. From what is available, I do not think it is an Operator issue. There seems to be some issue with the test environment.

@ssandur-thales
Contributor Author

@haripate @tomasbakk @mporrato @Allda,

Hello Folks,

Can someone please help me with this PR? I need to know what is causing the deploy/upgrade test to fail. Is any change required from my side, or is it an environment issue?

@ssandur-thales
Contributor Author

What should I do to get a response???

@Allda
Collaborator

Allda commented May 2, 2025

Hey @ssandur-thales, the log you mentioned is not causing the failure, as this Ansible task is getting ignored. The issue seems to be in the upgrade operator tests. From the logs, it looks like the operator can't be upgraded from N-1 version to N. There is a suggestion at the end of the tests to debug the issue locally. Have you tried that and was it successful in your environment?

I believe the operator passed certification in the other repository because that test suite has no upgrade check.

@ssandur-thales
Contributor Author

Hi @Allda,

When I run the certification tests in my K8s cluster environment, they fail with "OLM is already installed". If I uninstall OLM from my cluster and run the tests again, they still fail.

I have verified that the operator upgrade succeeds if I simply do an upgrade outside of the certification tests.

@ssandur-thales
Contributor Author

ssandur-thales commented May 5, 2025

@Allda,

I managed to get past the OLM issue. Now my local test fails with a name resolution error.

`TASK [operator_push_image : Push image 'kind-registry:5000/test-operator/cte-k8s-operator:v1.5.10'] ***
task path: /playbooks/upstream/roles/operator_push_image/tasks/main.yml:66
fatal: [localhost]: FAILED! => changed=true
attempts: 5
cmd: podman push --format=docker kind-registry:5000/test-operator/cte-k8s-operator:v1.5.10
delta: '0:00:00.092922'
end: '2025-05-05 12:44:20.596750'
msg: non-zero return code
rc: 125
start: '2025-05-05 12:44:20.503828'
stderr: |-
time="2025-05-05T12:44:20Z" level=warning msg="error reading certificate "/etc/containers/certs.d/kind-registry:5000/ca.crt": open /etc/containers/certs.d/kind-registry:5000/ca.crt: no such file or directory"
Getting image source signatures
Copying blob sha256:2a6ac67c456f958f35060bd0adb4dd9fcbf12cfbc5438a50a126ef7953afd980
Copying blob sha256:f96799ae868eea575967a641af451d7a04c58a02840a889b5facde65f048bca7
Copying blob sha256:7ec0a3107f490752ab9b3bbb6b79bf7e4aa09be1e1a87e02a8aa6c295cdf0029
Error: trying to reuse blob sha256:f96799ae868eea575967a641af451d7a04c58a02840a889b5facde65f048bca7 at destination: pinging container registry kind-registry:5000: Get "https://kind-registry:5000/v2/": dial tcp: lookup kind-registry: Temporary failure in name resolution
stderr_lines:
stdout: ''
stdout_lines:
...ignoring
META: role_complete for localhost

TASK [build_operator_version_bundle : Push bundle image with the timestamp] ****
task path: /playbooks/upstream/roles/build_operator_version_bundle/tasks/main.yml:343
included: /playbooks/upstream/roles/build_operator_version_bundle/tasks/preserve_sha.yml for localhost

TASK [build_operator_version_bundle : Set 'result_rc' to false] ****************
task path: /playbooks/upstream/roles/build_operator_version_bundle/tasks/main.yml:349
ok: [localhost] => changed=false
ansible_facts:
result_rc: false

TASK [build_operator_version_bundle : Failing when bundle was not created or not pushed] ***
task path: /playbooks/upstream/roles/build_operator_version_bundle/tasks/main.yml:371
fatal: [localhost]: FAILED! => changed=false
msg: 'Bundle ''kind-registry:5000/test-operator/cte-k8s-operator:v1.5.10'' was created and pushed : [FAIL]'
`

The image was created, but the push to kind-registry did not happen.
PR-6020-Error

@ssandur-thales
Contributor Author

ssandur-thales commented May 5, 2025

Attaching the log output of the local test.
log.zip

Is there any Kind or network setting that I need to configure on my K8s cluster? I have followed the steps described at https://k8s-operatorhub.github.io/community-operators/operator-test-suite/, specifically those for running tests from a local directory.

@ssandur-thales
Contributor Author

ssandur-thales commented May 6, 2025

@Allda,

I was able to complete the tests locally and found the reason for the failure too. The cte-k8s-operator uses the kube-rbac-proxy image. Starting with version v1.5.10, the image is pulled from registry.redhat.io instead of gcr.io/kubebuilder as in earlier versions. This is because of the deprecation notice for the latter.

registry.redhat.io, being a private registry, requires authentication to pull images. The cte-k8s-operator expects a secret named rh-kube-proxy-secret, holding credentials for registry.redhat.io, to exist in the namespace in which the operator is installed, i.e. test-operators.

In my local environment, I ran the command to create the secret in a loop, retrying until the test-operators namespace had been created.
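
A rough sketch of such a loop is shown below. It is not the exact command that was run: the function name, the attempt limit, the `POLL_INTERVAL` variable, and the `REGISTRY_USER` / `REGISTRY_PASSWORD` environment variables are assumptions for illustration. The secret name, registry, and namespace are the ones mentioned above.

```shell
# Hypothetical sketch: poll until a namespace exists, then create an image
# pull secret in it. REGISTRY_USER and REGISTRY_PASSWORD are assumed to be
# set by the caller; POLL_INTERVAL (seconds) defaults to 2.
create_pull_secret_when_ready() {
  local ns="$1"
  local secret="$2"
  local attempts="${3:-60}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get namespace "$ns" >/dev/null 2>&1; then
      # The namespace is present, so the secret can be created now.
      kubectl create secret docker-registry "$secret" \
        --namespace "$ns" \
        --docker-server=registry.redhat.io \
        --docker-username="$REGISTRY_USER" \
        --docker-password="$REGISTRY_PASSWORD"
      return $?
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  echo "namespace $ns did not appear after $attempts attempts" >&2
  return 1
}
```

With the credentials exported, it would be started before the test run as `create_pull_secret_when_ready test-operators rh-kube-proxy-secret`.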

How can this be achieved in the automated tests?

@Allda
Collaborator

Allda commented May 9, 2025

@ssandur-thales, it's good to hear that you found the root cause. If your operator uses images from registries that require authentication (registry.redhat.io), I don't think there is a way to make it work in the k8s operators environment, as users would not be able to install it without first subscribing to the registry.
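
As a rough illustration, references to such a registry can be spotted in a bundle before submitting with a text search like the one below. The helper name is hypothetical, and the match is a simple pattern rather than a real image reference parser.

```shell
# Hypothetical helper: print the unique image references under a manifest
# directory that point at a given registry (default: registry.redhat.io).
# The registry name is used as a regular expression, so its dots match any
# character; that is loose but sufficient for a quick scan.
find_auth_registry_images() {
  local manifest_dir="$1"
  local registry="${2:-registry.redhat.io}"
  grep -rhoE "${registry}/[A-Za-z0-9._/@:-]+" "$manifest_dir" | sort -u
}
```

Running `find_auth_registry_images <bundle-path>` prints nothing when no manifest refers to the registry.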

You can check out this guidance and pick any alternative solution:

@ssandur-thales
Contributor Author

Hello @Allda,

We did explore the option of copying the kube-rbac-proxy image from gcr.io/kubebuilder to our docker.io repo, which is public. However, the operator then would not pass the RH certification tests. They required that only pre-certified images be used, and hence we switched to the image from registry.redhat.io.

We have moved on to upgrading the operator-sdk, which eliminates the need for the kube-rbac-proxy image in the next version of the operator. However, even then the upgrade test would fail, because the upgrade path would be from 1.5.10 to 1.6.X.

Are you suggesting there is no way of getting this and the next version of the operator certified with the community-operators project in this scenario? Is there an exception available?

@Allda
Collaborator

Allda commented May 14, 2025

As far as I know, the Red Hat certification process doesn't enforce the usage of only certified content. There are plenty of examples:
https://github.com/search?q=repo%3Aredhat-openshift-ecosystem%2Fcertified-operators%20gcr.io%2Fkubebuilder&type=code

@ssandur-thales
Contributor Author

@Allda,

This is the response I got from Red Hat Support when I filed a case with them after certification with my copied version of kube-rbac-proxy failed.

`We recommend that you use either Red Hat provided containers or already-certified 3rd party containers in your operands. This is neither, nor would it pass certification as it's presently constituted.

In addition, you should probably be using the Red Hat-supplied kube-rbac-policy container rather than one you have from some other source. We have up-to-date versions of this proxy that you can use. The Catalog entry is here: https://catalog.redhat.com/software/containers/openshift4/ose-kube-rbac-proxy-rhel9/652809a5244cb343fb4a4b66
`
