Implementation of MapBuilder::build() could be more efficient #1072


Open
sebastianbergmann opened this issue Apr 26, 2025 · 5 comments


sebastianbergmann commented Apr 26, 2025

Here is a script for profiling the code in question.

<?php declare(strict_types=1);
use SebastianBergmann\CodeCoverage\Filter;
use SebastianBergmann\CodeCoverage\StaticAnalysis\CachingFileAnalyser;
use SebastianBergmann\CodeCoverage\StaticAnalysis\ParsingFileAnalyser;
use SebastianBergmann\CodeCoverage\Test\Target\MapBuilder;
use SebastianBergmann\FileIterator\Facade;

require __DIR__ . '/vendor/autoload.php';

$filter = new Filter;
$files = (new Facade)->getFilesAsArray(__DIR__ . '/src', '.php');
$filter->includeFiles($files);

$builder = new MapBuilder;

$map = $builder->build(
    $filter,
    new CachingFileAnalyser(
        '/tmp/coverage-cache',
        new ParsingFileAnalyser(true, false),
        true,
        false
    )
);
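If a full profiler is not at hand, a quick wall-clock number for the build step can be taken with hrtime(). The helper below is a generic sketch and not part of php-code-coverage:

```php
<?php declare(strict_types=1);

// Generic helper (not part of php-code-coverage): measure the wall-clock
// time of a callable in milliseconds using hrtime().
function timeMs(callable $fn): float
{
    $start = hrtime(true);

    $fn();

    return (hrtime(true) - $start) / 1e6;
}

// Example with a stand-in workload; in the script above, pass a closure
// that wraps the $builder->build(...) call instead.
$ms = timeMs(static function (): void {
    for ($i = 0; $i < 100_000; $i++) {
        md5((string) $i);
    }
});

printf("workload took %.2f ms\n", $ms);
```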
@sebastianbergmann

CC @edorian @staabm


staabm commented Apr 26, 2025

Team analysis yielded:

  •  file_get_contents in CachingFileAnalyser is called twice per file
  •  md5 hashing is slower than sha256 on recent PHP versions, because sha256 is optimized for modern CPUs
  •  maybe we can drop an is_file call right before realpath or file_get_contents
  •  array_unique seems expensive; maybe we can instead write to self-de-duplicating array offsets
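The last bullet can be sketched like this (an illustration of the technique, not the actual MapBuilder code): instead of appending values and running array_unique() afterwards, write each value to a key derived from itself, so duplicates overwrite each other as they arrive:

```php
<?php declare(strict_types=1);

// Illustration only, not the actual MapBuilder code: de-duplication via
// array keys instead of a trailing array_unique() pass.

$input = ['Foo', 'Bar', 'Foo', 'Baz', 'Bar'];

// Variant 1: append everything, de-duplicate at the end.
$appended = [];
foreach ($input as $value) {
    $appended[] = $value;
}
$unique = array_values(array_unique($appended));

// Variant 2: use the value itself as the key; duplicates overwrite
// each other, so no array_unique() pass is needed.
$keyed = [];
foreach ($input as $value) {
    $keyed[$value] = true;
}
$deduplicated = array_keys($keyed);

var_dump($unique === $deduplicated); // bool(true): both yield Foo, Bar, Baz
```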


staabm commented Apr 26, 2025

I spent a few hours looking at the current implementation and did a few experiments, but nothing yielded a meaningful improvement.

Either we need a more real-world reproducer, or the bottleneck does not kick in as long as the cache is populated/warm.


edorian commented Apr 27, 2025

Wrapping my head around this wasn't easy, as I am unfamiliar with the code and the variables are just named $filename and $directory.

But if I understood everything correctly, we should be able to compute the file hash only once in https://github.com/sebastianbergmann/php-code-coverage/blob/main/src/StaticAnalysis/CachingFileAnalyser.php#L212 and store it in a $filename => hash_file($algo, $filename) lookup, which should get rid of at least one disk read per file, if we can assume the source file doesn't change during a coverage run.

Preliminary testing suggests a small speed-up in the test case you posted, but I will do some more testing.
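The memoization idea above could look roughly like this (a sketch under the stated assumption that files don't change mid-run; the class and method names are made up, not the actual CachingFileAnalyser code):

```php
<?php declare(strict_types=1);

// Sketch only: memoize per-file content hashes so each file is read from
// disk at most once per process. Names are hypothetical.
final class FileHashCache
{
    /** @var array<string, string> filename => content hash */
    private array $hashes = [];

    public function hashFor(string $filename): string
    {
        // hash_file() touches the disk only on the first call per file;
        // subsequent calls hit the in-memory lookup table.
        return $this->hashes[$filename] ??= hash_file('sha256', $filename);
    }
}

$cache = new FileHashCache;

$hash1 = $cache->hashFor(__FILE__);
$hash2 = $cache->hashFor(__FILE__); // no second disk read

var_dump($hash1 === $hash2); // bool(true)
```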

@sebastianbergmann

I have uncommitted changes that, once complete, change the API of the static analysis from files to source, reducing the number of file_get_contents() calls, as well as the hashing, to one per file. I just need to find the time to finish this refactoring.
