Skip to content

normalize HTTPD_COMBINEDLOG matching #280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

kares
Copy link
Contributor

@kares kares commented Aug 4, 2020

NOTE: these changes are the base-line for ECS-ification (#267 will rebase off this) and target ecs-wip branch.

Summary of changes :

  • missing ident, user auth, response (code) fields will no longer be captured
    httpd indicates these nulls as -, meaning that there won't be a 'response' => "-" for such line
  • referrer and user agent fields will be de-quoted:
    previosly: 'referrer' => '"http://semicomplete.com/presentations/logstash-monitorama-2013/"'
    changes to: 'referrer' => 'http://semicomplete.com/presentations/logstash-monitorama-2013/'

These changes will be required for (later) ECS mode, the changes in null field matches are less concerning.
However matching http.request.referrer and user_agent.original needs to be "de-quoted" to align with ECS.


Initially thought I'll just target this for master to be released "before" ECS is shipped.
Unfortunately, there isn't a version limiter in the grok filter (s.add_runtime_dependency 'logstash-patterns-core') - so even if this was major it would eventually slip into the next LS (7.x) release.

Reasoning here is that we rather want changes done in relation to ECS to be shipped in one release so users update to changes such as outlined here while still using the legacy set and than handle the new ECS capture names (and type-casts) as the next step. Seems more work and confusion to me to ship these before hand.

kares added 6 commits August 4, 2020 12:33
* http-empty-fields:
  Change: avoid referrer/user-agent quotes in apache logs
  do not match apache access response code if empty "-"
  do not capture empty httpd ident/auth fields
  Test: refactor specs + spec pattern imperfection
Copy link
Contributor

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reservations about the need for these breaking changes to the legacy patterns.

Since we have never shipped an ecs-v1, we are free to do as we wish there, including making these captures optional, dequoting, etc., without it being breaking. When users explicitly opt into ecs_compatibility => v1, they will be expecting changes to the shape of their data and will be poised to make the appropriate changes to their pipelines, but when users implicitly upgrade their grok plugin they will not be expecting changes to the shape of their data.

In this case, I think it is in our users best interest to diverge in the ecs-v1, and to leave the legacy as-is.

@@ -2,8 +2,8 @@ HTTPDUSER %{EMAILADDRESS}|%{USER}
HTTPDERROR_DATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}

# Log formats
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}
HTTPD_COMMONLOG %{IPORHOST:clientip} (?:-|%{HTTPDUSER:ident}) (?:-|%{HTTPDUSER:auth}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" (?:-|%{NUMBER:response}) (?:-|%{NUMBER:bytes})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure making the legacy patterns capture IFF value is not - is an appropriate change, and feels "breaking" to me (since users may define pipelines that depend on the value being - to represent empty).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, thought it would ease transition to have a two step migration process (esp. since there are other legacy patterns that are pretty much useless and a breaking change could make them more "useful").
but if we rather not break anything in the legacy parts (even if we do a major release) I am fine with that.

Copy link
Contributor Author

@kares kares Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted, have only kept missing (-) http response matching in legacy mode: 0887051
considering this a fix (previously failed to parse) + it's best to keep the legacy vs ecs patterns matching ~ the same. a spec is included in the changeset already.

@kares kares requested a review from yaauie August 10, 2020 07:28
... to reduce scope (confusion) of this PR
@kares
Copy link
Contributor Author

kares commented Aug 10, 2020

since this is now reduced to only a simple fix to parse access.log with missing (-) response code, I'll move this elsewhere

@kares kares closed this Aug 10, 2020
@kares kares removed the request for review from yaauie August 10, 2020 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants