Log loss when using HTTP input without any errors #10229

Open
ashish-kumar-glean opened this issue Apr 19, 2025 · 0 comments

Bug Report

Describe the bug

I have a scenario where I can consistently reproduce log loss without any errors in the Fluent Bit logs.

To Reproduce

  • Steps to reproduce the problem:

With the Fluent Bit config I have (attached below), I consistently see log loss without any error logs from Fluent Bit. My client script sends 10 requests of 10,000 logs each (each log ~2 KB) to the HTTP input and gets back a 201 success response for every request, which means the HTTP input should have the logs in its buffer. However, this volume is more than my rewrite_tag filter's emitter can handle (the memory given to it is 10 MB), so it pauses and later resumes. Since all the client requests got a 201 response, I would expect the buffered logs to be processed once rewrite_tag resumes, but some of the logs are never delivered to S3.
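
One way to quantify the loss is to compare the per-plugin record counters from Fluent Bit's built-in monitoring endpoint (records accepted by the http input vs. records processed by the s3 output). A minimal sketch, assuming the monitoring server is enabled by adding HTTP_Server On, HTTP_Listen 0.0.0.0 and HTTP_Port 2020 to the [SERVICE] section (it is not enabled in the config attached below):

#!/bin/bash
# Minimal sketch, not part of the repro itself. Assumes the [SERVICE] section
# additionally contains:
#   HTTP_Server On
#   HTTP_Listen 0.0.0.0
#   HTTP_Port   2020
# /api/v1/metrics reports per-plugin counters, e.g. "records" for input
# http.0 vs. "proc_records" for output s3.0.
curl -s http://127.0.0.1:2020/api/v1/metrics | python3 -m json.tool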

These are the Fluent Bit logs:

[2025/04/20 00:03:21] [ info] [fluent bit] version=4.0.0, commit=, pid=22407
[2025/04/20 00:03:21] [ info] [storage] ver=1.5.2, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=1
[2025/04/20 00:03:21] [ info] [storage] backlog input plugin: storage_backlog.1
[2025/04/20 00:03:21] [ info] [simd    ] disabled
[2025/04/20 00:03:21] [ info] [cmetrics] version=0.9.9
[2025/04/20 00:03:21] [ info] [ctraces ] version=0.6.2
[2025/04/20 00:03:21] [ info] [input:http:http.0] initializing
[2025/04/20 00:03:21] [ info] [input:http:http.0] storage_strategy='memory' (memory only)
[2025/04/20 00:03:21] [ info] [input:storage_backlog:storage_backlog.1] initializing
[2025/04/20 00:03:21] [ info] [input:storage_backlog:storage_backlog.1] storage_strategy='memory' (memory only)
[2025/04/20 00:03:21] [ info] [input:storage_backlog:storage_backlog.1] queue memory limit: 4.8M
[2025/04/20 00:03:21] [ info] [input:emitter:emitter_for_rewrite_tag.0] initializing
[2025/04/20 00:03:21] [ info] [input:emitter:emitter_for_rewrite_tag.0] storage_strategy='memory' (memory only)
[2025/04/20 00:03:21] [ info] [input:emitter:emitter_for_rewrite_tag.4] initializing
[2025/04/20 00:03:21] [ info] [input:emitter:emitter_for_rewrite_tag.4] storage_strategy='memory' (memory only)
[2025/04/20 00:03:21] [ info] [output:s3:s3.0] Using upload size 10000000 bytes
[2025/04/20 00:03:21] [ info] [output:s3:s3.0] worker #0 started
[2025/04/20 00:03:21] [ info] [sp] stream processor started
[2025/04/20 00:03:34] [ warn] [input] emitter_for_rewrite_tag.4 paused (mem buf overlimit)
[2025/04/20 00:03:34] [ info] [input] pausing emitter_for_rewrite_tag.4
[2025/04/20 00:04:06] [ info] [output:s3:s3.0] Successfully uploaded object /glean-sensitive-logs-bigquery/2025/03/15/24/33:28-Pvzbn9ZL.log
[2025/04/20 00:04:06] [ info] [input] resume emitter_for_rewrite_tag.4
[2025/04/20 00:04:06] [ info] [input] emitter_for_rewrite_tag.4 resume (mem buf overlimit)
[2025/04/20 00:04:45] [ info] [output:s3:s3.0] Successfully uploaded object /sensitive-logs/2025/03/15/24/33:30-JKaFr7L6.log
[2025/04/20 00:05:26] [ info] [output:s3:s3.0] Successfully uploaded object /sensitive-logs/2025/03/15/24/33:33-YfkwU34g.log

This is my Fluent Bit config (please don't mind the unoptimized values; I have been trying to create any scenario where the HTTP input would pause):

[SERVICE]
   Flush                     3
   Grace                     30
   Log_Level                 info
   Daemon                    off
   Parsers_File              parsers.conf
   storage.path              /Users/ashish/fluent-bit/flb-storage/
   storage.sync              normal
   storage.checksum          off
   storage.backlog.mem_limit 5M
   storage.max_chunks_up     1

[INPUT]
   name http
   listen 0.0.0.0
   port 9890
   buffer_max_size 10M
   buffer_chunk_size 2M
   mem_buf_limit 10MB
   storage.pause_on_chunks_overlimit on

[FILTER]
   Name rewrite_tag
   Match application.*
   Rule $log .*s3LogGroup.* console_log_s3 false
   Emitter_Mem_Buf_Limit 20M

# actual logs are wrapped in a 'log' key, get the actual log
[FILTER]
   Name parser
   Match console_log_s3
   Key_Name log
   Parser json
   Reserve_Data False

# filter out logs that have s3LogGroup key
[FILTER]
   Name grep
   Match console_log_s3_http
   Regex s3LogGroup .+

# parse timestamp and add year, month, day, hour to the record
[FILTER]
   Name lua
   Match console_log_s3_http
   Script parse_timestamp.lua
   Call parse_timestamp

[FILTER]
   Name rewrite_tag
   Match console_log_s3_http
   Rule $s3LogGroup ^(.*)$ console_log_s3_http.$1.$year.$month.$day.$hour false
   Emitter_Mem_Buf_Limit 10M

# all other input processing should ignore logs that have s3LogGroup in it
[FILTER]
   name                  grep
   match                 application.*
   exclude               log /.*s3LogGroup.*/

[OUTPUT]
   Name s3
   Match console_log_s3_http.*
   bucket some_bucket_name
   region us-west-1
   json_date_key timestamp
   json_date_format iso8601
   total_file_size 10M
   upload_timeout 10s
   use_put_object On
   s3_key_format /$TAG[1]/$TAG[2]/$TAG[3]/$TAG[4]/$TAG[5]/%M:%S-$UUID.log
   retry_limit 5
   store_dir /tmp/fluent-bit/s3
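
For completeness: parse_timestamp.lua is referenced above but not attached. A minimal sketch of what it needs to do (add the year/month/day/hour keys consumed by the second rewrite_tag rule) is below; the field names and the use of the event timestamp are simplifying assumptions, not my actual script, which parses a timestamp field inside the record:

-- Minimal sketch of parse_timestamp.lua (the real script is not attached).
-- It only adds the year/month/day/hour keys consumed by the second
-- rewrite_tag rule; for simplicity it uses the event timestamp rather than
-- a timestamp field parsed out of the record.
function parse_timestamp(tag, timestamp, record)
    local t = os.date("!*t", math.floor(timestamp))
    record["year"]  = string.format("%04d", t.year)
    record["month"] = string.format("%02d", t.month)
    record["day"]   = string.format("%02d", t.day)
    record["hour"]  = string.format("%02d", t.hour)
    -- return code 1 = record modified, keep the original timestamp
    return 1, timestamp, record
end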

This is the script that I use to generate the data and send the logs:

#!/bin/bash
X=10               # Number of requests (iterations) to send
TOTAL_LOGS=10000   # Total number of logs to send
PAYLOAD_FILE="/tmp/log_payload.json"  # File to store the JSON payload
MAX_RETRIES=10    # Maximum number of retries per iteration
RETRY_DELAY=5     # Delay in seconds between retries

LOG_LINE='a 2KB json log that has key "s3LogGroup" set'

# Function to create the payload file with N log entries
create_payload_file() {
    local num_logs=$1
    echo "Creating payload file with $num_logs log entries..."
    
    # Start with opening bracket
    echo "[" > $PAYLOAD_FILE
    
    # Append log entries
    i=1
    while [ $i -le $num_logs ]; do
        echo -n "$LOG_LINE" >> $PAYLOAD_FILE
        if [ $i -lt $num_logs ]; then
            echo "," >> $PAYLOAD_FILE
        fi
        i=$((i + 1))
    done
    
    # Close with closing bracket
    echo "]" >> $PAYLOAD_FILE
    
    echo "Payload file created, size: $(du -h $PAYLOAD_FILE | cut -f1)"
}

# Function to send request and retry until status code 201
send_request_with_retry() {
    local retry_count=0
    local success=false
    
    while [ $retry_count -lt $MAX_RETRIES ] && [ "$success" = false ]; do
        # Send the request using the file
        echo "Attempt $((retry_count + 1)): Sending request with $TOTAL_LOGS logs from file"
        START_TIME=$(date +%s.%N)
        HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -d @$PAYLOAD_FILE -X POST -H "content-type: application/json" http://localhost:9890/console_log_s3_http)
        END_TIME=$(date +%s.%N)
        
        # Calculate time taken in seconds
        TIME_TAKEN=$(echo "$END_TIME - $START_TIME" | bc)
        
        echo "HTTP Status Code: $HTTP_STATUS"
        echo "Request took: ${TIME_TAKEN} seconds"
        
        if [ "$HTTP_STATUS" -eq 201 ]; then
            echo "Success! Received status code 201."
            success=true
        else
            retry_count=$((retry_count + 1))
            if [ $retry_count -lt $MAX_RETRIES ]; then
                echo "Status code was not 201. Retrying in $RETRY_DELAY seconds..."
                sleep $RETRY_DELAY
            else
                echo "Maximum retry attempts reached. Moving to next iteration."
            fi
        fi
    done
    
    return $([ "$success" = true ] && echo 0 || echo 1)
}

# Create the payload file with all logs (do this only once)
create_payload_file $TOTAL_LOGS

count=0
successful_iterations=0
TOTAL_START_TIME=$(date +%s.%N)

while [ $count -lt $X ]; do
    count=$((count + 1))
    echo "======================================="
    echo "Iteration $count of $X"
    
    # Send request with retry logic
    if send_request_with_retry; then
        successful_iterations=$((successful_iterations + 1))
    fi
done

TOTAL_END_TIME=$(date +%s.%N)
TOTAL_TIME=$(echo "$TOTAL_END_TIME - $TOTAL_START_TIME" | bc)

echo "======================================="
echo "Summary:"
echo "Total iterations: $X"
echo "Successful iterations (status 201): $successful_iterations"
echo "Failed iterations: $((X - successful_iterations))"
echo "Total time taken: ${TOTAL_TIME} seconds"
echo "Average time per iteration: $(echo "$TOTAL_TIME / $X" | bc -l) seconds"

# Clean up at the end
rm -f $PAYLOAD_FILE

Expected behavior

Either the HTTP input should return a non-success status code when the downstream pipeline is paused or its buffers are full, or all the logs should be delivered, or there should at least be an error in the Fluent Bit logs.

Your Environment

Running it locally on my Mac.

  • Version used: 4.0
