-
Notifications
You must be signed in to change notification settings - Fork 18k
go/ast: add ParseDirective #68021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
cc @griesemer per owners |
Perhaps it should be |
CC @lfolger, since we were just discussing the lack of an API like this one. At a high level I do think we should expose this in an API. The proposal looks reasonable to me. I suspect that the variadic form is overkill, preferring the original proposal, but I could be convinced otherwise. |
I'm not convinced the namespaces argument is necessary: the iterator must always visit every node, so there's no efficiency gain by having the iterator perform the filtering, and the non-monotonic behavior for len(namespaces)=0 seems undesirable. Make the iterator yield them all, and let the caller filter. Perhaps we should define a new type, |
👍 on a new type. Though maybe it should just be The variadic proposal may be overkill. I'm not entirely confident. The main argument is performance, as I imagine the iterator would parse the comments for each invocation. If you do need to check multiple namespaces you'd either need to parse the comments n times or iterate over all the directives and implement your own filtering logic and I based it off #67795 so I figured the non-monotonic behavior would be acceptable. OTOH a multi-namespace filter may need to do some allocations or preprocessing and those would likely be the same for the entire program so having it as a separate filter would I do think the most common case is going to be a tool looking for its directives so having some simple namespace filtering built in is going to simplify the majority of callers who would otherwise all have to implement their own filtering. For example, the code I was writing that led me to file this only cares about two directives in its custom namespace so it would be a lot simpler to just write for dir := range g.Directives() {
if dir.Namespace != "mystuff" {
continue
}
// ...
} Built in filtering can also handle the special cases of extern, export, and line, which have no namespace but you'd want to show up in a query for the "go" namespace. Of course, if there's a type it could export a method to handle this. |
But it's not a map, it's a sequence of pairs whose keys may be duplicates. |
I mean that if I wanted to collect results as |
That's true, but most clients will discard most directives (at least ones of the wrong namespace), so a |
For |
The updated proposal so far is: // Directives yields each [directive comment] in g as a [Directive].
//
// [directive comment]: https://go.dev/doc/comment#syntax
func (g *CommentGroup) Directives() iter.Seq[Directive] {
// ...
}
type Directive struct {
Namespace, Name, Arguments string
} For
1 is syntactically correct. There is no namespace for such directives as they are written. 2 is semantically correct. Those directives belong to Go and, as such, implicitly belong to the These directives need to be special cased in parsing so manually setting the namespace to For rendering to a string, these need to be special cased either way (to avoid rendering So for std these need to be special cased and documented either way. For user code, 2 removes a special case as However, it's unlikely that anyone would care about anything other than a small fixed set of namespaces, often a singleton, and it's likely that this set contains neither These are both right. 2 seems more right to me. But ultimately it doesn't seem likely to matter much in practice and should be documented whatever the decision. |
It looks like Directive → Arguments (notes):
If possible, it would be nice to further standardize that leading and trailing white space is ignored (and have gofmt normalize it to a single space). I can open another issue, if that's a possibility. edit, the respective hypothetical normalizations would be:
edit 2 corrected |
I filed #68061 to simplify the namespace question I posted earlier today by litigating it out of existence |
I'm not sure I understand
I would expect it to lead to
Why Is "A" and argument and not the name? |
That was a bug in the comment. Updated it to:
The A isn't lowercase so it's not part of the directive name so it's part of the arguments. Allowing and normalizing leading whitespace would make that clearer. |
I went ahead and filed #68092 for the argument whitespace, if it's possible to change that. |
Summary of how these proposals fit together. This proposal is for: func (g *CommentGroup) Directives() iter.Seq[Directive] {
// ...
}
type Directive struct {
Namespace, Name, Arguments string
} If #68061 is accepted, then it's invariant that If #68092 is accepted, then it's invariant that With both accepted, any |
Should it be |
This proposal has been added to the active column of the proposals project |
I don't think we need the filter as an argument. We also need to add file:line position information in some form (token.Pos or whatever is natural), which basically requires a struct at that point. https://go.dev/doc/comment#Syntax says "//toolname:directive". So something like:
(type Directive should not have field Directive, hence Name) |
@aclements my primary motivation was to make it simpler for third parties to use directives. Even if it can't be used at all internally for whatever reason it's still sufficiently useful. (If it can be used internally that's a nice bonus, of course) |
@jimmyfrasche I basically agree. I mostly want to see some concrete evidence that this API actually simplifies code that needs to read directives. I don't expect there to be many examples from std, which is totally fine, it's just a handy "unbiased" source of test cases. The one example @adonovan posted above seems like kind of a wash. |
One of the problems with comparing it to existing code is all existing code is going to be doing the parsing and filtering in one step: it parses only one particular directive instead of parsing all directives then selecting the one of interest. That's why the initial proposal had a built in filtering mechanism. With the filtering mechanism back in place, the code posted above would be somewhat simpler func extractMagicComments(f *ast.File) []string {
...
for _, cg := range f.Comments {
for dir := range cg.Directives("go:embed", "go:build") {
result := fmt.Sprintf("%s:%s %s", dir.Tool, dir.Name, dir.Args)
results = append(results, result) (and simpler still if Although both rewrites are incorrect as the original also looked for That said maybe the solution is to go lower level and just have a: func ParseDirective(text string) (Directive, bool) |
@gri points out that
Can we specify exactly what |
I'd been working on the assumption that they only go in After looking back, I'm not sure the new directive format is entirely specified. For the x:y family what's specified is only how to tell if a line is a directive but nothing past that point. In |
Ignoring everything else, if the solution is |
The current documentation on the directive format is here: https://go.dev/doc/comment#syntax. It's true that's not totally specified, but it's pretty close. As part of this proposal, we should tighten up that document and write a precise spec for |
Given the loose coupling of |
Have all remaining concerns about this proposal been addressed? The proposal is: package token
// A Directive is a comment of this form:
//
// //tool:name args
//
// For example, this directive:
//
// //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
Pos Pos // position of start of line comment
Tool string // may be "" if Name is "line", "extern", "export"
Name string
Args string // may contain whitespace
}
func ParseDirective(text string) (Directive, bool) And to refine https://go.dev/doc/comment#syntax into a precise syntax specification. |
Has anyone pulled non-Go directives out of the corpus/proxy to see how it's being used in the wild to help inform the syntax refinement? |
go/ast seems like a more consistent place for it, IMO. CommentMap is also in ast. |
I get not putting it in ast or parser but token also feels off. Another option would be just giving it its own package: package directive // "go/directive"
type Value struct { /* ... */ }
func Parse(string) (Value, bool) { /* ... */ } |
I wish I'd seen this sooner - I was just working on this last week and I hope it's not to late to add my two cents. When parsing For my use case, I actually prune the Go docs from the list of all Thanks so much for this proposal - it's going to shrink my code either way! |
Oops, good catch.
I don't feel strongly about this. I'll note that CommentMap has to live in go/ast because it's all about ast.CommentGroups.
I'm afraid I didn't quite follow this. But, generally speaking, the way you parse directives does depend whether you're looking for directives associated with Decls or just directives anywhere in the file. I'm not sure what would lead to having to parse some CommentGroups twice. One thing https://go.dev/doc/comment#syntax is not very clear on is the required context around a directive comment. E.g., The doc also doesn't capture how the "argument" to the directive is formatted. E.g., |
True, but if calling
It is important and quite intentional that directives may only appear at the start of a comment, as this prevents indented code blocks and arbitrarily wrapped paragraphs from being misinterpreted as a directive.
The |
@alandonovan I think what go generate is doing is basically fine. It's using an approximate heuristic to avoid parsing the file (at the expense of allowing false positives from a directive in a multiline string). I don't think that needs to be considered here. It's still following the directive format, it's just adding an additional constraint (and failure mode). |
Have all remaining concerns about this proposal been addressed? The proposal is: package ast
// A Directive is a comment of this form:
//
// //tool:name args
//
// For example, this directive:
//
// //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
Tool string // may be "" if Name is "line", "extern", "export"
Name string
Args string // may contain whitespace
}
func ParseDirective(text string) (Directive, bool) And to refine https://go.dev/doc/comment#syntax into a precise syntax specification. |
Based on the discussion above, this proposal seems like a likely accept. The proposal is: package ast
// A Directive is a comment of this form:
//
// //tool:name args
//
// For example, this directive:
//
// //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
Tool string // may be "" if Name is "line", "extern", "export"
Name string
Args string // may contain whitespace
}
func ParseDirective(text string) (Directive, bool) And to refine https://go.dev/doc/comment#syntax into a precise syntax specification. |
No change in consensus, so accepted. 🎉 The proposal is: package ast
// A Directive is a comment of this form:
//
// //tool:name args
//
// For example, this directive:
//
// //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
Tool string // may be "" if Name is "line", "extern", "export"
Name string
Args string // may contain whitespace
}
func ParseDirective(text string) (Directive, bool) And to refine https://go.dev/doc/comment#syntax into a precise syntax specification. |
Proposal Details
*ast.CommentGroup
has a helpfulText
method to extract the text of comments that are not directives, but there is no simple means to get the directives of the comment other than by parsing the raw comments.Now that directives are standardized and allow third party directives, while the format is simple, it would be nice to make them easier to access.
The most common case would be for a third party tool to want to see only the directives in its namespace so this use case should be made simplest.
I imagine something like:
The text was updated successfully, but these errors were encountered: