Add Event ID #36
Thanks @wernerb! Some thoughts:
To sum it up:
Happy to hear feedback on the above!
For anyone using an Elasticsearch output in Filebeat, this works out of the box. The documentation isn't clear, as the Filebeat Elasticsearch output doesn't support updating documents anyway, but here the Elasticsearch team member confirms what I see in my setup, which is that documents are not overwritten because it uses the Now where things become interesting is for other outputs such as
Elasticsearch can only deduplicate events going to the same index, because a new daily index will not contain that event. This is fine because, by default, daily index setups use the original event timestamp to decide which index an event goes to, and you can set as large a horizon as you see fit. To summarise the above:
It really depends on what you think. The rollover behaviour with ILM can really be an issue, but it is an upstream issue that I expect to be fixed. We could remove the message handling entirely to make things simpler and keep writing NextToken (if available) to the state.
This is easy: it is because GetLogEvents doesn't include event IDs, otherwise I would not have changed it. See http://docs.amazonaws.cn/AmazonCloudWatchLogs/latest/APIReference/API_GetLogEvents.html and https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_FilterLogEvents.html. What is also nice about FilterLogEvents is that a filter pattern can be added (if desirable), and it is possible to query multiple streams in one call instead of making lots of GetLogEvents calls per stream. See I am assuming that when results from GetLogEvents come in, the NextToken is written out only after handling all the events in them. If we have the events in memory, then they should be handled, and the beat buffers will take care of processing everything correctly.
This changes the API call to `FilterLogEventsPages` instead of `GetLogEvents`. The problematic part is that NextToken is not in every response. To fix this, I reworked the code to use the pagination function from the AWS SDK, which handles each page automatically. We then "move the needle" towards the last event time and use that as the next starting time instead.
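To make the "move the needle" idea concrete, here is a minimal sketch that uses only the standard library. It does not call the AWS SDK: `logEvent` and `consumePages` are hypothetical names, and the pages are plain slices standing in for what the SDK's pagination callback would deliver. It assumes timestamps are in milliseconds and that the next poll restarts at the newest timestamp seen, so events sharing that timestamp may be read again.

```go
package main

import "fmt"

// logEvent is a hypothetical stand-in for one event returned by FilterLogEvents.
type logEvent struct {
	ID        string
	Timestamp int64 // milliseconds since the epoch (assumption)
	Message   string
}

// consumePages handles every event on every page and returns the start time
// for the next poll: the newest event timestamp seen, or the previous start
// time when no events arrived.
func consumePages(start int64, pages [][]logEvent, handle func(logEvent)) int64 {
	next := start
	for _, page := range pages {
		for _, ev := range page {
			handle(ev)
			if ev.Timestamp > next {
				next = ev.Timestamp
			}
		}
	}
	return next
}

func main() {
	pages := [][]logEvent{
		{{ID: "a", Timestamp: 1000, Message: "first"}, {ID: "b", Timestamp: 1500, Message: "second"}},
		{}, // a page may be empty
		{{ID: "c", Timestamp: 1200, Message: "late"}},
	}
	handled := 0
	next := consumePages(900, pages, func(logEvent) { handled++ })
	fmt.Println(handled, next)
}
```

Because the start time is only advanced after every event in memory has been handled, a crash in the middle of a poll leads to re-reading rather than skipping events.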
I want to understand what other purpose the state has except for deduplication. If that is the problem, then the new `Meta._id` field will probably fix this: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html

I did attempt to change NextToken to LastEventMessage instead.
Could you advise on what to do with regard to the multiline buffer? I am unsure whether this works correctly now that LastEventMessage is used as the start time when recovering the registry. But again, I would rather have the beat reprocess the logs it might have missed and send them again. I am curious what you think.
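Resending events is only harmless if the output drops duplicates. The sketch below models that idea with an in-memory map and is not Elasticsearch's implementation: `store`, `create`, `indexFor`, and the `logs-YYYY.MM.DD` index naming are all hypothetical. It assumes create-only semantics (an existing ID is never overwritten) and a daily index chosen from the event's own timestamp.

```go
package main

import (
	"fmt"
	"time"
)

// store maps index name -> event ID -> message (hypothetical model of an output).
type store map[string]map[string]string

// indexFor picks a daily index from the event timestamp in milliseconds.
func indexFor(tsMillis int64) string {
	return "logs-" + time.UnixMilli(tsMillis).UTC().Format("2006.01.02")
}

// create stores the event unless the same ID already exists in its index.
// It reports whether the event was newly stored.
func (s store) create(tsMillis int64, id, msg string) bool {
	idx := indexFor(tsMillis)
	if s[idx] == nil {
		s[idx] = map[string]string{}
	}
	if _, exists := s[idx][id]; exists {
		return false // duplicate: keep the original document
	}
	s[idx][id] = msg
	return true
}

func main() {
	s := store{}
	ts := int64(1577836800000) // 2020-01-01T00:00:00Z
	fmt.Println(s.create(ts, "event-1", "hello")) // first delivery
	fmt.Println(s.create(ts, "event-1", "hello")) // resend after recovery
	fmt.Println(len(s[indexFor(ts)]))
}
```

Since the index is derived from the event timestamp rather than the delivery time, a resend on a later day still lands in the same index and is recognised as a duplicate.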
Closes #35