[🐛 Bug]: Code verifying video integrity does not work #2743

Closed
MJB222398 opened this issue Mar 31, 2025 · 12 comments

@MJB222398

MJB222398 commented Mar 31, 2025

What happened?

I have a Docker Selenium Grid with a Hub, several nodes, and a separate video container for each node. These video containers have SE_VIDEO_FILE_NAME=auto, so each recorded video is named SessionId.mp4. My understanding is that the function wait_for_file_integrity in video.sh should be called automatically when the driver session ends. So at the end of each test I dispose of the web driver (.NET bindings) and then make HTTP calls to the grid status endpoint to verify that the session has indeed finished. I then grab the video file. What I am seeing, though, is that this video file is malformed and will not play because it had not been finalized yet: it was still being written/flushed to disk at the point I retrieved it.

Looking in the logs, there is nothing indicating that the wait_for_file_integrity function ran at all, though if the file is present and correct on the first check there would be no logs. So either the function is not being called, or it is not working properly and reports the video as fine when it isn't, or perhaps it runs later in the background, or something else. Does this function call block session termination until the video integrity is good? It is not clear at all how it's supposed to work.

This Slack conversation includes some discussion on this issue:
https://seleniumhq.slack.com/archives/C0ABCS03F/p1743072688635059

Command used to start Selenium Grid with Docker (or Kubernetes)

N/A

Relevant log output

N/A

Operating System

Ubuntu

Docker Selenium version (image tag)

4.30.0-20250323 and ffmpeg-7.1.1.1.1-20250323

Selenium Grid chart version (chart version)

No response

@MJB222398, thank you for creating this issue. We will troubleshoot it as soon as we can.


@VietND96
Member

VietND96 commented Apr 7, 2025

It will be fixed in #2742

VietND96 closed this as completed Apr 7, 2025

@MJB222398
Author

MJB222398 commented Apr 8, 2025

@VietND96 I just tried using 4.31.0-20250404 and ffmpeg-7.1-20250404 and it's not fixed; the behaviour is the same. Can we re-open this please? Could you outline what you have done that was expected to fix this? Could you answer the question above about how it is supposed to work?

@VietND96
Member

VietND96 commented Apr 8, 2025

Actually, that function was removed.
What percentage of the files do you observe to be interrupted?

@MJB222398
Author

It's basically all of them: virtually 100% are interrupted and are therefore invalid video files.

@MJB222398
Author

I have added my own check for video file integrity before I pull the file. Perhaps this is preferable to Selenium doing it itself. I was just concerned because it seemed like Selenium was intended to do this check itself but it wasn't working.

@VietND96
Member

In CI, we also have a flow that records video and verifies the integrity of the output files. The number of output files might not match the number of actual sessions, but the integrity check passes 100% for the files that exist.
In this case, I am wondering how the container was stopped. Is it being force-stopped immediately?

@MJB222398
Author

No container is being stopped. To be clearer, what I am now doing (after my comment above where I started to perform the ffmpeg check myself) is:

  • A particular test (in NUnit framework) begins
  • Create a new remote web driver instance:
      var webDriver = new RemoteWebDriver(
          remoteAddress: new Uri(_driverConfiguration.Url),
          capabilities: _driverOptionsFactory.Create(browser).ToCapabilities(),
          commandTimeout: TimeSpan.FromSeconds(_driverConfiguration.CommandTimeout));
  • Perform various actions in the browser for my test
  • Test ends
  • I end the web driver session by calling webDriver.Dispose();
  • I poll http://localhost:4444/status until it tells me that my session has definitely ended
  • I now poll the video container, executing ffmpeg -v error -i videoFilePath -f null - repeatedly until I get an exit code of 0 (good video) or until I hit the timeout I have imposed (5s).
  • If the video is good I will pull it and add it as a test attachment whilst I am still in the 'NUnit Context' of that test
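
The polling step in the list above can be sketched roughly as follows. This is a minimal sketch, not the author's actual code: `retry_until_ok` and `check_video` are hypothetical helper names, and the container name and video path in the usage comment are placeholders. The ffmpeg invocation is the one quoted in the list, run from the host through `docker exec`.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds
# or TIMEOUT seconds have elapsed.
# Usage: retry_until_ok TIMEOUT INTERVAL COMMAND [ARGS...]
retry_until_ok() {
  _timeout="$1"
  _interval="$2"
  shift 2
  _deadline=$(( $(date +%s) + _timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$_deadline" ]; then
      return 1   # timed out while the check was still failing
    fi
    sleep "$_interval"
  done
  return 0
}

# Integrity check from the thread: decode the whole file, discard the
# output, and rely on ffmpeg's exit code.
# $1 = container name (placeholder), $2 = video path inside the container.
check_video() {
  docker exec "$1" ffmpeg -v error -i "$2" -f null -
}

# Example (not run here; names are placeholders):
# retry_until_ok 5 1 check_video my-video-container "/videos/$SESSION_ID.mp4"
```

Keeping the retry loop separate from the check makes it easy to raise the timeout or swap in a different integrity check without touching the loop.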

So what I am seeing is that despite waiting for 5 seconds (slightly more, actually, with other delays and code execution in between) after the driver session has definitely ended, some videos (maybe 1%) are still not completed. Before I added the ffmpeg check step above, roughly 95% of videos were not completed. To my mind, it really shouldn't take more than 5 seconds for the recording to finish. Is there any way it can be made faster? When I look in the video container logs for such cases I see:

2025-04-17 09:39:51,205 [video.recorder] - Video recording in progress
2025-04-17 09:39:52,215 [video.recorder] - Video recording in progress
/opt/bin/video.sh: line 135: wait: pid 48004 is not a child of this shell
/opt/bin/video.sh: line 135: wait: pid 48053 is not a child of this shell
/opt/bin/video.sh: line 135: wait: pid 48100 is not a child of this shell
/opt/bin/video.sh: line 135: wait: pid 48147 is not a child of this shell
/opt/bin/video.sh: line 135: wait: pid 48194 is not a child of this shell

@MJB222398
Author

Investigating this, it looks like your stop_ffmpeg function grabs any and all ffmpeg processes, but these could belong to another shell. In this case they belong to mine. So I think this function needs to be changed to make sure it finds your process, not mine.
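
One way to avoid picking up ffmpeg processes started by someone else is for the script to remember the PID of the recorder it launched, and to signal and wait on only that PID. The following is a sketch under that assumption, not the actual video.sh code: `start_recorder`, `stop_recorder`, and `recorder_pid` are hypothetical names. It also sidesteps the `wait: pid ... is not a child of this shell` error from the log above, because `wait` only accepts PIDs that are children of the calling shell.

```shell
# Hypothetical sketch: track only the recorder that this shell started.
recorder_pid=""

# Launch the given command in the background and remember its PID.
start_recorder() {
  "$@" &
  recorder_pid=$!
}

# Ask our own recorder to stop, then wait for it to exit.
# SIGTERM is an assumption here; the real script may stop ffmpeg differently.
stop_recorder() {
  [ -n "$recorder_pid" ] || return 0
  kill -TERM "$recorder_pid" 2>/dev/null || true
  # Valid use of wait: the PID is a child of this shell.
  wait "$recorder_pid" 2>/dev/null || true
  recorder_pid=""
}
```

With this pattern, an ffmpeg process started separately (for example through `docker exec`) is never signalled or waited on, since its PID is never stored in `recorder_pid`.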

@VietND96
Member

OK, now I understand where those PIDs come from. So you insert your script and run it in the container.

@MJB222398
Author

I use the Docker CLI to run the command from the host machine, on the video container.

@MJB222398
Author

@VietND96 is there any update on this? Also, as this ticket is closed, it should probably be dealt with in #2775.
