-
Notifications
You must be signed in to change notification settings - Fork 973
normalize HTTPD_COMBINEDLOG matching #280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* http-empty-fields: Change: avoid referrer/user-agent quotes in apache logs do not match apache access response code if empty "-" do not capture empty httpd ident/auth fields Test: refactor specs + spec pattern imperfection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reservations about the need for these breaking changes to the legacy patterns.
Since we have never shipped an ecs-v1
, we are free to do as we wish there, including making these captures optional, dequoting, etc., without it being breaking. When users explicitly opt into ecs_compatibility => v1
, they will be expecting changes to the shape of their data and will be poised to make the appropriate changes to their pipelines, but when users implicitly upgrade their grok plugin they will not be expecting changes to the shape of their data.
In this case, I think it is in our users best interest to diverge in the ecs-v1
, and to leave the legacy as-is.
patterns/legacy/httpd
Outdated
@@ -2,8 +2,8 @@ HTTPDUSER %{EMAILADDRESS}|%{USER} | |||
HTTPDERROR_DATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR} | |||
|
|||
# Log formats | |||
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) | |||
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent} | |||
HTTPD_COMMONLOG %{IPORHOST:clientip} (?:-|%{HTTPDUSER:ident}) (?:-|%{HTTPDUSER:auth}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure making the legacy patterns capture IFF value is not -
is an appropriate change, and feels "breaking" to me (since users may define pipelines that depend on the value being -
to represent empty).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, thought it would ease transition to have a two step migration process (esp. since there are other legacy patterns that are pretty much useless and a breaking change could make them more "useful").
but if we rather not break anything in the legacy parts (even if we do a major release) I am fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted, have only kept missing (-
) http response matching in legacy mode: 0887051
considering this a fix (previously failed to parse) + it's best to keep the legacy vs ecs patterns matching ~ the same. a spec is included in the changeset already.
... to reduce scope (confusion) of this PR
since this is now reduced to only a simple fix to parse access.log with missing ( |
NOTE: these changes are the base-line for ECS-ification (#267 will rebase off this) and target ecs-wip branch.
Summary of changes :
ident
, userauth
,response
(code) fields will no longer be capturedhttpd indicates these nulls as
-
, meaning that there won't be a'response' => "-"
for such linereferrer
and useragent
fields will be de-quoted:previosly:
'referrer' => '"http://semicomplete.com/presentations/logstash-monitorama-2013/"'
changes to:
'referrer' => 'http://semicomplete.com/presentations/logstash-monitorama-2013/'
These changes will be required for (later) ECS mode, the changes in null field matches are less concerning.
However matching
http.request.referrer
anduser_agent.original
needs to be "de-quoted" to align with ECS.Initially thought I'll just target this for master to be released "before" ECS is shipped.
Unfortunately, there isn't a version limiter in the grok filter (
s.add_runtime_dependency 'logstash-patterns-core'
) - so even if this was major it would eventually slip into the next LS (7.x) release.Reasoning here is that we rather want changes done in relation to ECS to be shipped in one release so users update to changes such as outlined here while still using the legacy set and than handle the new ECS capture names (and type-casts) as the next step. Seems more work and confusion to me to ship these before hand.