Intermittent Segfaults With HAProxy 3.0.x / Possibly Related To Cache Usage

Detailed Description of the Problem

We are experiencing intermittent general protection faults on an HAProxy instance that uses the cache feature. Other, otherwise identical load balancers that do not use the cache show no such crashes. The captured core dump suggests that the crashes may be related to the cache.

Expected Behavior

No general protection faults or crashes.

Steps to Reproduce the Behavior

Unfortunately, we have not yet found a way to reproduce the crashes. They are very irregular: sometimes none occur for weeks, and then they suddenly become more frequent. They have been occurring for a long time, possibly across all 3.0.x versions.

Coredump and Backtrace

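The kernel logged the following general protection faults. The gdb backtrace from the most recent core dump (PID 3977856) appears under "Last Outputs and Backtraces" below: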

Mar 28 00:11:38 hostname kernel: traps: haproxy[265522] general protection fault ip:72ce7b sp:7f68dcee9fe8 error:0 in haproxy[400000+988000]
Mar 28 23:21:59 hostname kernel: traps: haproxy[497514] general protection fault ip:72ce84 sp:7f073d68ef78 error:0 in haproxy[400000+988000]
Apr  6 05:38:37 hostname kernel: traps: haproxy[2648279] general protection fault ip:63a988 sp:7f3c50315f80 error:0 in haproxy[400000+988000]
Apr 12 19:11:13 hostname kernel: traps: haproxy[2180294] general protection fault ip:72ceb9 sp:7f0291e8fec8 error:0 in haproxy[400000+988000]
Apr 14 01:00:15 hostname kernel: traps: haproxy[2743833] general protection fault ip:72ceb9 sp:7f10b72aeec8 error:0 in haproxy[400000+988000]
Apr 15 19:35:19 hostname kernel: traps: haproxy[3422234] general protection fault ip:728c37 sp:7ffcc787e1f8 error:0 in haproxy[400000+988000]
Apr 21 19:11:59 hostname kernel: traps: haproxy[3977856] general protection fault ip:72ce7b sp:7f55fdf9df78 error:0 in haproxy[400000+988000]
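
To see whether the faults cluster at a few instruction pointers, the trap lines above can be tallied by their `ip:` value. The following is a minimal Python sketch, not part of HAProxy: the names `TRAP_LOG`, `TRAP_RE`, `parse_traps` and `count_by_ip` are illustrative, and the regular expression assumes the exact kernel message layout shown in this report.

```python
import re
from collections import Counter

# Kernel trap lines copied verbatim from the report above.
TRAP_LOG = """\
Mar 28 00:11:38 hostname kernel: traps: haproxy[265522] general protection fault ip:72ce7b sp:7f68dcee9fe8 error:0 in haproxy[400000+988000]
Mar 28 23:21:59 hostname kernel: traps: haproxy[497514] general protection fault ip:72ce84 sp:7f073d68ef78 error:0 in haproxy[400000+988000]
Apr  6 05:38:37 hostname kernel: traps: haproxy[2648279] general protection fault ip:63a988 sp:7f3c50315f80 error:0 in haproxy[400000+988000]
Apr 12 19:11:13 hostname kernel: traps: haproxy[2180294] general protection fault ip:72ceb9 sp:7f0291e8fec8 error:0 in haproxy[400000+988000]
Apr 14 01:00:15 hostname kernel: traps: haproxy[2743833] general protection fault ip:72ceb9 sp:7f10b72aeec8 error:0 in haproxy[400000+988000]
Apr 15 19:35:19 hostname kernel: traps: haproxy[3422234] general protection fault ip:728c37 sp:7ffcc787e1f8 error:0 in haproxy[400000+988000]
Apr 21 19:11:59 hostname kernel: traps: haproxy[3977856] general protection fault ip:72ce7b sp:7f55fdf9df78 error:0 in haproxy[400000+988000]
"""

# Assumed layout: "traps: <prog>[<pid>] general protection fault ip:<hex> sp:<hex>
# error:<n> in <object>[<base>+<size>]".
TRAP_RE = re.compile(
    r"traps: (?P<prog>\S+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in \S+\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_traps(text):
    """Return a list of (pid, ip, base) tuples, one per matching trap line."""
    traps = []
    for line in text.splitlines():
        m = TRAP_RE.search(line)
        if m:
            traps.append((int(m["pid"]), int(m["ip"], 16), int(m["base"], 16)))
    return traps


def count_by_ip(text):
    """Count how many faults occurred at each instruction pointer."""
    return Counter(ip for _pid, ip, _base in parse_traps(text))


if __name__ == "__main__":
    for ip, n in count_by_ip(TRAP_LOG).most_common():
        print(f"ip=0x{ip:x} faults={n}")
```

In this log, five of the seven faults fall between 0x72ce7b and 0x72ceb9, and the gdb output in this report resolves 0x72ce7b to `get_pipe ()` at `src/pipe.c:69`. Assuming the binary is not position-independent (the mapping starts at 0x400000), the remaining addresses could likely be resolved the same way with a symbolizer such as `addr2line -f -e` pointed at the same haproxy binary.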

HAProxy Configuration

The HAProxy configuration is as follows:

cache cache_one
      total-max-size 150    # MB
      max-object-size 50000 # bytes
      max-age 900           # seconds
      process-vary on

...

backend backend
        ...
        http-request  cache-use   cache_one
        http-response cache-store cache_one
        http-after-response set-header x-cache %[res.cache_hit,iif(Hit,Miss)]
        ...
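
As a rough sanity check of the cache sizing above, the arithmetic can be sketched in Python. This is only an estimate under stated assumptions: it treats one megabyte of `total-max-size` as 1024 * 1024 bytes, ignores all per-entry overhead, and relies on our reading of the documentation that `max-object-size` must not exceed half of `total-max-size`. The constant and function names are illustrative only.

```python
# Values taken from the "cache cache_one" section above.
TOTAL_MAX_SIZE_MB = 150      # total-max-size, in megabytes
MAX_OBJECT_SIZE = 50000      # max-object-size, in bytes


def total_cache_bytes(total_mb):
    """Cache size in bytes, assuming 1 MB = 1024 * 1024 bytes."""
    return total_mb * 1024 * 1024


def min_objects_at_max_size(total_mb, object_bytes):
    """Lower bound on stored objects if every object had the maximum size.

    Ignores per-entry overhead, so the real number would be somewhat lower.
    """
    return total_cache_bytes(total_mb) // object_bytes


def object_size_within_limit(total_mb, object_bytes):
    """Check the assumed rule: max-object-size <= half of total-max-size."""
    return object_bytes <= total_cache_bytes(total_mb) // 2


if __name__ == "__main__":
    print("cache bytes:", total_cache_bytes(TOTAL_MAX_SIZE_MB))
    print("objects at max size:",
          min_objects_at_max_size(TOTAL_MAX_SIZE_MB, MAX_OBJECT_SIZE))
    print("object size within limit:",
          object_size_within_limit(TOTAL_MAX_SIZE_MB, MAX_OBJECT_SIZE))
```

With these settings the cache would hold on the order of a few thousand maximum-size objects, so the configured sizes themselves look unremarkable.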

HAProxy Version and Build Options

The HAProxy version and build options are as follows:

HAProxy version 3.0.9-7f0031e 2025/03/20 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2029.
Known bugs: http://www.haproxy.org/bugs/bugs-3.0.9.html
Running on: Linux 4.18.0-553.50.1.el8_10.x86_64 #1 SMP Thu Apr 10 16:09:41 EDT 2025 x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Werror=format-security
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1 USE_STATIC_PCRE2=1 USE_PCRE2_JIT=1 USE_QUIC_OPENSSL_COMPAT=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION +QUIC +QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE +STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 3.3.3 11 Feb 2025
Running on OpenSSL version : OpenSSL 3.3.3 11 Feb 2025
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.4.7
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.45 2025-02-05
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-26)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Last Outputs and Backtraces

The last outputs and backtraces are provided below:

[New LWP 3977856]
[New LWP 3977851]
[New LWP 3977854]
[New LWP 3977855]
[New LWP 3977853]
[New LWP 3977852]
[New LWP 3977850]
[New LWP 3977849]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/data/haproxy/sbin/haproxy -Ws -f /data/haproxy/haproxy.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000072ce7b in get_pipe () at src/pipe.c:69
69              ret = pool_alloc(pool_head_pipe);
[Current thread is 1 (Thread 0x7f55fdfa9700 (LWP 3977856))]
(gdb) t a a bt full

Thread 8 (Thread 0x7f5616494280 (LWP 3977849)):
#0  0x00007f56151ea397 in epoll_wait (epfd=6, events=0x2343330, maxevents=200, timeout=timeout@entry=9) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
**Q&A: Intermittent Segfaults with HAProxy 3.0.x / Possibly Related to Cache Usage**

**Q: What is the problem you are experiencing with HAProxy?**

A: We are experiencing intermittent general protection faults on an HAProxy instance that uses the cache feature. Other, otherwise identical load balancers that do not use the cache show no such crashes.

**Q: What is the expected behavior?**

A: No general protection faults or crashes.

**Q: Have you found a way to reproduce the issue?**

A: Unfortunately, we have not yet found a way to reproduce the crashes. They are very irregular: sometimes none occur for weeks, and then they suddenly become more frequent.

**Q: What is the HAProxy configuration?**

A: The HAProxy configuration is as follows:
```haproxy
cache cache_one
      total-max-size 150    # MB
      max-object-size 50000 # bytes
      max-age 900           # seconds
      process-vary on

...

backend backend
        ...
        http-request  cache-use   cache_one
        http-response cache-store cache_one
        http-after-response set-header x-cache %[res.cache_hit,iif(Hit,Miss)]
        ...
```

**Q: What are the HAProxy version and build options?**

A: The HAProxy version and build options are as follows:

HAProxy version 3.0.9-7f0031e 2025/03/20 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2029.
Known bugs: http://www.haproxy.org/bugs/bugs-3.0.9.html
Running on: Linux 4.18.0-553.50.1.el8_10.x86_64 #1 SMP Thu Apr 10 16:09:41 EDT 2025 x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Werror=format-security
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1 USE_STATIC_PCRE2=1 USE_PCRE2_JIT=1 USE_QUIC_OPENSSL_COMPAT=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION +QUIC +QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE +STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 3.3.3 11 Feb 2025
Running on OpenSSL version : OpenSSL 3.3.3 11 Feb 2025
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.4.7
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.45 2025-02-05
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-26)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

**Q: What are the last output and backtrace?**

A: The last output and backtrace are provided below:

[New LWP 3977856]
[New LWP 3977851]
[New LWP 3977854]
[New LWP 3977855]
[New LWP 3977853]
[New LWP 3977852]
[New LWP 3977850]
[New LWP 3977849]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/data/haproxy/sbin/haproxy -Ws -f /data/haproxy/haproxy.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000072ce7b in get_pipe () at src/pipe.c:69
69              ret = pool_alloc(pool_head_pipe);
[Current thread is 1 (Thread 0x7f55fdfa9700 (LWP 3977856))]
(gdb) t a a bt full

Thread 8 (Thread 0x7f5616494280 (LWP 3977849)):
#0  0x00007f56151ea397 in epoll_wait (epfd=6, events=0x2343330, maxevents=200, timeout=timeout@entry=9) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000004a5820 in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232
        timeout = 9
        status = <optimized out>
        fd = <optimized out>
        count = <optimized out>
        updt_idx = <optimized out>
        wait_time = 9
        old_fd = <optimized out>
#2  0xffffffffffffffc0 in ?? ()
No symbol table info available.
#3  0x000000001ed06585 in ?? ()
No symbol table info available.
#4  0xffffffffffff5680 in ?? ()
No symbol table info available.
#5  0xffffffffffff7e08 in ?? ()
No symbol table info available.
#6  0xffffffffffff5678 in ?? ()
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 7 (Thread 0x7f56150aa700 (LWP 3977850)):
#0  0x00007f56151ea397 in epoll_wait (epfd=77, events=0x7f561005e370, maxevents=200, timeout=timeout@entry=9) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000004a5820 in _do_poll (p=<optimized out>, exp=<optimized out>, wake=<optimized out>) at src/ev_epoll.c:232