-
Notifications
You must be signed in to change notification settings - Fork 185
Global symbol search #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global symbol search #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot :)
The indexing will help implementation of #9 aswell
There are test failures coming from incorrect parameters (type hint fails), cannot get coverage report until tests pass.
Also needs at least a simple test of course for the workspace symbol request.
Regarding memory usage: 180MB is not that much compared to Chome RAM usage these days 😄
If it really imposes a problem we could serialize()
the PHPDocument
objects, limit the in-memory index (configurable) and if the index is full, write to disk with a FIFO or LRU strategy.
Regarding blocking calls: I think the blocking issue comes from using nextTick
instead of setTimeout
see, my comments. The windows limitation seems to only be that it cannot read single chars in a non-blocking way, only full lines. But that shouldn't apply to us as a request always comes in in one piece and spans multiple lines. As long as we leverage the event loop correctly this should work.
If the indexing is really too blocking, we could spawn a separate process (it doesn't need to communicate, just exit when it finished).
Could you comment how large the project was that took 10 seconds (amount of files, SLOC)?
It would be nice if we could compare this to other implementations, namely crane, PHPStorm, PHPTools for VS.
@@ -69,11 +82,18 @@ public function __construct(ProtocolReader $reader, ProtocolWriter $writer) | |||
*/ | |||
public function initialize(string $rootPath, int $processId, ClientCapabilities $capabilities): InitializeResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware $rootPath
can be null, need to remove the type hint then and adapt the @param
. Or even better, move it to the end as a parameter and give it default value of null
. There should also be a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I didn't know the dispatcher figures out the correct parameter position, cool!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
$processFile = function() use (&$fileList, &$processFile, &$rootPath){ | ||
if ($file = array_pop($fileList)) { | ||
|
||
$uri = 'file://'.(substr($file, -1) == '/' || substr($file, -1) == '\\' ? '' : '/').str_replace('\\', '/', $file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make a helper function for this, will need it more often. I am for adding a utils.php
with plain functions that are autoloaded through composer.json files
property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. Will also add a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
use PhpParser\PrettyPrinter\Standard as PrettyPrinter; | ||
use PhpParser\NodeVisitor\NameResolver; | ||
|
||
class PhpDocument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for this class, wanted to do that at one point aswell
$stmts = $this->parser->parse($content); | ||
} | ||
catch(Error $e) { | ||
// Parser still throws errors. e.g for unterminated comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really true? If yes, isn't that a bug in PHPParser?
Probably also shouldn't catch all Error
s but only Exception
or PHPParser's Exception type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this weird as well. I had the extension crash multiple times while writing a block comment.
But you are right, we should probably catch this specific exception and add it as a parse error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We clearly set ['throwOnError' => true]
so Imo it should never throw. If it does, could you open an issue at PHPParser, posting the exception? We are using the 3.0 alpha so bugs can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a misunderstanding. The comment indicates that this setting only prevents the parser from bailing out with an exception on the first parse error and instructs it to continue parsing anyway. The exception here seems to originate from the Lexer.
|
||
class Project | ||
{ | ||
private $documents; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need some @var
annotations here and document this is an associative array of URI => document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
$class = get_class($node); | ||
if (!isset(self::NODE_SYMBOL_KIND_MAP[$class])) { | ||
return; | ||
} | ||
|
||
// if we enter a method or function, increase the function counter | ||
if ($class === Node\Stmt\Function_::class || $class === Node\Stmt\ClassMethod::class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not instanceof
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason, just reused the $class variable. Instanceof would probably make the purpose clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// if we leave a method or function, decrease the function counter | ||
$class = get_class($node); | ||
if ($class === Node\Stmt\Function_::class || $class === Node\Stmt\ClassMethod::class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, why not instanceof
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason, just reused the $class variable. Instanceof would probably make the purpose clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
$this->project->getDocument($uri)->updateAst(file_get_contents($file)); | ||
|
||
Loop\nextTick($processFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be setTimeout
. nextTick
will queue the callback before any I/O callbacks. setTimeout
on the other hand will push it to the end of the queue, first executing all other callbacks (like the STDIN handling for example), then getting back to processing. See this note from the Node docs:
This is not a simple alias to setTimeout(fn, 0), it's much more efficient. It runs before any additional I/O events (including timers) fire in subsequent ticks of the event loop.
Note: the next tick queue is completely drained on each pass of the event loop before additional I/O is processed. As a result, recursively setting nextTick callbacks will block any I/O from happening, just like a while(true); loop.
The current way would mean that I/O is blocked (including STDIN), so the language server would become unresponsive until the indexing is finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. SetTimeout would also allow dynamic throttling later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public function findSymbols(string $query) | ||
{ | ||
return array_filter($this->symbols, function($symbol) use(&$query) { | ||
return stripos($symbol->name, $query) !== false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fuzzy searching would be nice for a future update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to 😀 maybe you have a link?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not implement it myself but look for a library from composer that we can integrate (or improve so it can be integrated here).
* @param string $rootPath The rootPath of the workspace. Is null if no folder is open. | ||
* @return void | ||
*/ | ||
private function indexProject(string $rootPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you cannot pass null
if you type hint it as string
. It needs to be string $rootPath = null
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we check this at the calling site? Indexing a null path doesn't make sense, after all.
I would then adjust the param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. I just went by the @param
desc
|
||
$numFiles = count($fileList); | ||
if (($numFiles % 100) == 0) { | ||
$this->client->window->logMessage(3, $numFiles.' PHP files remaining.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to log the file name aswell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? Log the name of the next file to be parsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah something like
Parsing src\LanguageServer.php (1/100)
Parsing src\LanguageClient.php (2/100)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also added a memory diagnostic message
Thanks :)
It was this project :) php-language-server with all dependencies included. I don't know the LOC but it has around 1300 Files.
I also think 180MB is fine, however there is room for improvement.
Maybe I am just overthinking this. I will test on a linux machine tomorrow and report back. |
Hi, I want just add a comment about memory consumption. Right now you are talking about relatively small project (1k-2k files). If amount of memory will grow linear then for larger project (e.g. magento 20k+) this value will be extremely high. I think keeping AST for all files from workspace is unnecessary because most of the time such detailed information aren't needed. Small, light model (abstract layer like namespace/class/method) created from AST would be also more useful from code readability point of view. |
@mniewrzal Tests are failing currently, I don't expect it to work 😄
That's why I asked how big the project is this was tested with. I think we could add to the Also see my comment about how we could limit the PHPDocument objects in memory and then either remove them or serialize and write them to a cache on disk, what do you think about this? |
Current coverage is 91.72% (diff: 92.15%)@@ master #31 diff @@
==========================================
Files 14 19 +5
Lines 189 290 +101
Methods 30 45 +15
Messages 0 0
Branches 0 0
==========================================
+ Hits 170 266 +96
- Misses 19 24 +5
Partials 0 0
|
Hold your horses. 😄 I just pushed some fixes, so I can test tomorrow. @mniewrzal I think most of the AST is of little interest to us, anyway. The only situation where the full AST is needed is code formatting, but the user expects this to be relatively slow, so we should be able to reparse in this case. @felixfbecker I had to adjust the expected result for the testDocumentSymbol test. Both methods and properties should have their class as containerName, right? I reverted the fgetc logic for now. It doesn't work on windows anymore (only parses a file after some message arrived) but if I understand the behaviour it should work now under linux. I will test this tomorrow... |
AST node contains lot of redundant information. At the beginning collecting only some of nodes would be ok but when more functions will be added to LS and memory usage will grow then I believe nodes should be transform into something more optimal. Parsing single file is fast and in many places can be done on demand.
I agree. Mechanism for storing collected data should be added later. Not only to reduce amount of used memory but also to avoid parsing after every initialization. Just not sure at the moment what will be the best approach ;-) |
What kind of structure would you propose? Just as a reminder, we also need to save stuff from the docblocks in addition to the node. We need
The protocol has structures to represent all these so it makes sense to convert the AST / PHPDocumentator objects to these structures directly and only save those. It might actually make sense though to save the FQN => symbol map at the |
* @param int $processId The process Id of the parent process that started the server. | ||
* @param ClientCapabilities $capabilities The capabilities provided by the client (editor) | ||
* @param string $rootPath The rootPath of the workspace. Is null if no folder is open. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type here must be string|null
(this is significant for the dispatcher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
Yes, using protocol structures would be easiest solution. Of course if Symbolinformation/SignatureInfromation/... can store all necessary data. Also its possible that the same information will be duplicated across this structures. At this level I think it will be suitable. For later I'm thinking more about one structure, simpler and much smaller than AST, easy to produce protocol structures from it. For advanced code assist (type detection/type inference) language server need to collect many things like:
But as I wrote this is future. Before that this PR will be great extension to language server capabilities :-) |
* @return string | ||
*/ | ||
function pathToUri(string $filepath): string { | ||
return 'file://'.($filepath[0] == '/' || $filepath[0] == '\\' ? '' : '/').str_replace('\\', '/', $filepath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this implementation is not correct. A file URI must always start with file:///
(triple slash). In your case, you are checking if the first char is a backslash, but on windows it could be a drive letter. Also, special chars need to be URL-encoded. There is a very good implementation I've been using on NPM: https://github.com/sindresorhus/file-url/blob/master/index.js#L17-L24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how we can test the Windows-blocking issue? |
I did a quick test with communicating over a socket. This worked flawlessly. We should put an OS switch into the client side script and spawn the Server with an argument, telling it to connect through a socket. |
But if this really is a Windows issue, not a PHP one, then how do all the other language servers solve this? TCP has a lot of overhead compared to STDIO. |
I think it is kind of both, but I am really not sure. stdin on windows behaves differently than on *nix but php doesn't include a workaround to be able to treat it as a normal stream. I tested the code on linux and didn't have the problems as on Windows. Loopback sockets don't seem to be so bad performance-wise after all. I'm really out of Ideas here. Personally, I didn't notice any difference in performance. |
This SO answer states that you can read STDIN non-blocking in Windows on a line-by-line basis, which is fine for our usecase as we don't need to process individual chars, only full responses. Maybe we have to look at the implementation of |
$server = new LanguageServer(new ProtocolStreamReader(STDIN), new ProtocolStreamWriter(STDOUT)); | ||
if (count($argv) >= 3 && $argv[1] == '--tcp') { | ||
$address = $argv[2]; | ||
$socket = stream_socket_client('tcp://' . $address, $errno, $errstr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means the IDE has to open a socket on its side, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. extension.js opens a listen socket at 127.0.0.1:random_port and immediately stops listening when the language server connects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the second part of your sentence, why does it have to stop listening? I though it would then solely communicate through the socket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the listen socket is closed, the established connection keeps going. It just won't allow any additional connections after the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay so the server should only accept a single connection
The problem ist that |
@madhed are you working on more tests? Would like to base more implementations on your refactoring :) |
@felixfbecker I am :) just couldn't work on it the last few days. I hope to push a ready version later today. |
Tested this an looks good to me! There was one case though were I met this error here:
Any idea how we can prevent this? |
Ugh... didn't even think about this. Edit: will write a fix. |
Can you reproduce the StreamWriter bug? It should be fixed now. |
$totalBytesWritten = 0; | ||
|
||
while ($totalBytesWritten < $msgSize) { | ||
$bytesWritten = fwrite($this->output, substr($data, $totalBytesWritten)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should suppress the notice triggered with @
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
What about the exception? Should we throw another type? Another message?
throw new Error('Could not write message.'); | ||
} | ||
$totalBytesWritten += $bytesWritten; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a bit more what you are doing here? As I understand we block until everything is written out, is this intended? Should we use a setTimeout or nextTick here?
Also, Error
is reserved for PHP internal errors, should be Exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we write more data than the output buffer can hold we get partial writes, so we loop until all data is written.
This didn't happen before because in blocking mode this loop is handled internally.
We need to do this synchronously because otherwise we could end up with partial messages intermingled with each other.
Will change to Exception
@madhed Do you feel this is ready to merge? I also did some final benchmark. Indexing this project on a Surface Pro 3 (i5-4300U):
|
@felixfbecker I am not aware of any remaining bugs, so if you want, go ahead. I am also using this version productively on an internal PHP Project with 2613 files and had no crashes or anything. Even though some features need improvement I think this will serve as a great starting point for future development. |
@Ahmad-Alhourani that question would be better suited for StackOverflow as a general question about LSP / JSON RPC. |
I don't know what you mean by "protocole doesn't call the method". I am sorry that you are having issues but it is really hard for me to help you, and this is not a support forum, it is an issue tracker. To be precise, this specific thread you are on is a 2 year old pull request. Please ask this in an appropriate forum like StackOverflow. More people can help you there and future readers can more easily find the answers. |
Closes #31
Parsing the whole php-language-server project takes about 10 seconds on my machine.
The php-cli process takes up about 180MB of ram with all symbol information parsed.
Changes made:
To do:
Can use NameResolver for generating FQN containerNames