Hello,
Our installation and configuration of HAProxy v1.5.3 on Debian GNU/Linux
Wheezy (v7.8, fully patched to date, and running on bare metal with no
virtualization) has been stable. I have an active/passive server
deployment using keepalived, and the pair has been running without issue
on this version since 7/31/14. HAProxy interfaces with a backend Windows
Server 2008 R2/IIS v7.5 web farm.
The physical servers are Dell PowerEdge R310s, each with one Intel Xeon
X3430 (4 cores @ 2.4 GHz) and 32 GB of RAM (@ 800 MHz). Each server has bond0
configured, which is comprised of eth0 and eth1, and each physical
interface connects to a switch stack (Cisco Catalyst 3750) using
802.3ad. The on-board network cards are Broadcom Corporation NetXtreme
II BCM5716 Gigabit Ethernet (rev 20). Cisco 3750 switch interface
configuration and statistics reporting (e.g., input/output errors, CRCs)
are clean. The backend servers are physically connected to the same
Cisco 3750 switch stack. Active/passive high availability for HAProxy
using keepalived works as expected.
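For reference, the failover piece is plain VRRP; a minimal sketch of the
kind of keepalived configuration in play (the VIP, router ID, and
priorities below are placeholders, not our production values):

    vrrp_instance VI_HAPROXY {
        state MASTER                # BACKUP on the passive node
        interface bond0
        virtual_router_id 51        # placeholder; must match on both nodes
        priority 101                # e.g. 100 on the passive node
        advert_int 1
        virtual_ipaddress {
            192.0.2.10/24           # placeholder VIP that clients hit
        }
    }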
HAProxy Statistics under normal weekly workloads reflect the following:
Queue/Cur - 0
Queue/Max - some #
Queue/Limit - blank
Session rate/Cur - 1 to 200 per server
Session rate/Max - 300 to 500
Session rate/Limit - blank
Sessions/Cur - 1 to 30 per server; can spike to 50
Sessions/Max - 50
Sessions/Limit - 50
Denied/Req and Resp - 0
Errors/Req - blank
Errors/Conn - 0
Errors/Resp - usually 1+, but incrementing slowly (41 total over roughly
six hours today)
Warnings/Retr and Redis - 0
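For completeness, the same counters can be pulled in CSV form over the
stats socket (path per the global section below), e.g.:

    echo "show stat" | socat stdio unix-connect:/(some path)/haproxy \
        > stats-$(date +%Y%m%d-%H%M%S).csv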
In January 2015, I tried to catch up on HAProxy maintenance releases by
upgrading only our active server from v1.5.3 to v1.5.10 (before v1.5.11
was announced) late on a Tuesday night. Immediately post upgrade, the
active server behaved as expected per our testing. Unfortunately, v1.5.10
surfaced a new problem around 9:00 a.m. the next morning, which forced me
to fail over to our passive server (still running v1.5.3) to restore
service to our customers. I then downgraded the active server to v1.5.3
to stabilize the system and restore the high-availability pair.
*The problem exhibited the following behaviors on the active server:*
* HAProxy Statistics (HPS) showed many, but not all, web farm servers
  with Queue/Cur in the low thousands, where the counts would remain,
  fluctuating up and down by < 100 on every stats page refresh. For
  these same servers, Sessions/Cur was stuck at 50, the configured Max
  and Limit, which explains the queuing and why some customers weren't
  able to use our service.
* HPS would intermittently flash yellow horizontal lines, along with a
  very high ~2000 ms L7 response time, typically on the servers with
  the high queue counts.
* Stopping and starting the HAProxy service would shuffle which servers
  showed the high queues in HPS, but only two or three servers would
  have them at any given time. Waiting five or ten minutes wouldn't let
  session processing drain the queues.
* HPS would rarely flash a red horizontal line, and that server's
  sessions would seem to zero out along with its Queue/Cur.
* CPU utilization (30%) and memory consumption (< 5 GB) on the active
  node during the event were within standard trends.
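For what it's worth, the stuck sessions should also be observable from
the stats socket during such an event; something like the following
would capture them for later analysis:

    echo "show sess" | socat stdio unix-connect:/(some path)/haproxy > sessions.txt
    echo "show errors" | socat stdio unix-connect:/(some path)/haproxy > errors.txt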
None of the backend web farm servers, per active Cacti graphing,
displayed any CPU, memory, or disk anomalies during this time. At that
point, I decided to shelve further upgrade attempts until I could
research the issue.
On the night of 2/13/15, I thought I would try again with v1.5.11, even
though I struggled to find anything relevant to my earlier experience in
the /HAProxy ChangeLog/ or any problems with my configuration. All
weekend and early this morning, v1.5.11 behaved well, up until more
customers came online and started using our services. Per our Cacti
graphs, from 8:50 a.m. EST to 9:00 a.m. EST, our combined ingress and
egress traffic jumped from 80 Mbps to 170 Mbps. It was during this time
that the problem described above surfaced again, causing a service
failure for a large number of our customers.
* @ 9:05 a.m. stopping and starting HAProxy v1.5.11 didn't resolve the
  problem. I waited six minutes, but processing didn't catch up.
* @ 9:12 a.m. I downgraded HAProxy from v1.5.11 to v1.5.3 and
everything normalized in less than a minute.
* @ 9:16 a.m. I upgraded HAProxy from v1.5.3 to v1.5.5 and the problem
surfaced again and didn't heal in five minutes' time.
* @ 9:22 a.m. I downgraded HAProxy from v1.5.5 to v1.5.4 and
everything normalized in less than a minute. It has been stable all
day so far.
Each time I built HAProxy, I would:
* wget http://haproxy.1wt.eu/download/1.5/src/haproxy-1.x.x.tar.gz
* tar -xf haproxy-1.x.x.tar.gz
* cd haproxy-1.x.x
* service haproxy stop
* make TARGET=linux2628 CPU=generic USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
* make install
* service haproxy start
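One note on the build: make install defaults to PREFIX=/usr/local, so
the binary lands in /usr/local/sbin/haproxy, while a Debian-packaged
init script may point at /usr/sbin/haproxy instead; after each build I
confirm the init script is launching the binary I just built:

    command -v haproxy          # first haproxy in $PATH
    haproxy -vv | head -n 2     # version string of that binary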
I've reviewed the ChangeLog found here:
http://www.haproxy.org/download/1.5/src/CHANGELOG, but I haven't been
able to pinpoint any specific change in v1.5.5 that might affect my
deployment given my configuration.
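Short of a full bisect, comparing the two tags in the official git tree
seems like the next step, something like:

    git clone http://git.haproxy.org/git/haproxy-1.5.git
    cd haproxy-1.5
    git log --oneline v1.5.4..v1.5.5    # commits introduced in v1.5.5
    git diff v1.5.4 v1.5.5 -- src/      # source diff between the tags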
*root@server:/#uname -a*
Linux p01 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64 GNU/Linux
*root@server:/#cat /etc/sysctl.conf*
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.log_martians = 1
net.core.somaxconn=10000
net.ipv4.ip_local_port_range = 5700 65000
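These are applied with sysctl -p, and the live values can be confirmed
like so:

    sysctl -p /etc/sysctl.conf
    sysctl net.core.somaxconn net.ipv4.ip_local_port_range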
*root@server:/#cat /etc/haproxy/haproxy.conf*
global
    log 127.0.0.1 local0
    maxconn 32000
    user (some user)
    group (some group)
    daemon
    maxsslconn 32000
    maxconnrate 32000
    chroot /(some path)/chroot/haproxy
    node (some name)
    stats socket /(some path)/haproxy
    tune.ssl.default-dh-param 1024
defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 32000
    timeout connect 35s
    timeout client 35s
    timeout server 35s
frontend web
    mode http
    timeout client 1200s
    option forwardfor except 127.0.0.1
    bind *:80
    bind 0.0.0.0:443 ssl crt (path to cer file) ca-file (path to crt)
    redirect scheme https if !{ ssl_fc }
    acl url_imaging path_beg /(custom path 1)
    acl url_report path_beg /(custom path 2)
    acl url_wlog path_beg /(custom path 3)
    use_backend sweb-farm if url_imaging or url_report or url_wlog
    capture request header Host len 32
    capture request header User-Agent len 200
    capture request header Content-length len 200
    capture request header X-Forwarded-For len 32
    default_backend web-farm
backend web-farm
    mode http
    # This ridiculous timeout is required due to bad application design
    # for reporting purposes.
    timeout server 1200s
    option httpchk HEAD /index.html
    option http-server-close
    balance hdr(host)
    hash-type consistent
    stick-table type ip size 10m expire 30m
    stick on src
    stats enable
    stats hide-version
    stats scope .
    stats uri (my stats URI)
    stats realm Haproxy\ Statistics
    stats auth (username:pass)
    stats refresh 2s
    stats show-legends
    stats show-node (city)
    server web01 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web02 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web03 x.x.x.x:80 maxconn 50 weight 15 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web04 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web05 x.x.x.x:80 maxconn 50 weight 15 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web06 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web07 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web08 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
    server web09 x.x.x.x:80 maxconn 50 weight 30 check inter 2000 rise 2 fall 2 ca-file (path to crt)
backend sweb-farm
    mode http
    # This ridiculous timeout is required due to bad application design
    # for reporting purposes.
    timeout server 1200s
    option httpchk HEAD /index.html
    option http-server-close
    stick match src table web-farm
    server sweb01 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb02 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb03 x.x.x.x:443 maxconn 50 weight 15 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb04 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb05 x.x.x.x:443 maxconn 50 weight 15 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb06 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb07 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb08 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
    server sweb09 x.x.x.x:443 maxconn 50 weight 30 check ssl inter 2000 rise 2 fall 2 ca-file (path to crt)
frontend print-proxy
    mode tcp
    # This timeout is required due to bad application design for
    # reporting purposes.
    timeout client 2m
    option tcplog
    bind *:808
    default_backend print-farm
backend print-farm
    mode tcp
    balance roundrobin
    # This timeout is required due to bad application design for
    # reporting purposes.
    timeout server 2m
    stick match src table web-farm
    server web01 x.x.x.x:808
    (truncated for brevity)
    server web09 x.x.x.x:808
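Should it be useful, the web-farm stick-table contents can be dumped
over the stats socket (the same socket declared in the global section),
e.g.:

    echo "show table web-farm" | socat stdio unix-connect:/(some path)/haproxy | head -n 20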
*root@server:/#haproxy -vv*
HA-Proxy version 1.5.4 2014/09/02
Copyright 2000-2014 Willy Tarreau <w@1wt.eu>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.30 2012-02-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
*root@server:/#echo "show info" | socat unix-connect:/tmp/haproxy stdio*
Name: HAProxy
Version: 1.5.4
Release_date: 2014/09/02
Nbproc: 1
Process_num: 1
Pid: 13579
Uptime: 0d 3h03m29s
Uptime_sec: 11009
Memmax_MB: 0
Ulimit-n: 64051
Maxsock: 64051
Maxconn: 32000
Hard_maxconn: 32000
CurrConns: 7251
CumConns: 210523
CumReq: 8374386
MaxSslConns: 32000
CurrSslConns: 7094
CumSslConns: 292816
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 20
ConnRateLimit: 32000
MaxConnRate: 577
SessRate: 20
SessRateLimit: 0
MaxSessRate: 577
SslRate: 19
SslRateLimit: 0
MaxSslRate: 576
SslFrontendKeyRate: 11
SslFrontendMaxKeyRate: 323
SslFrontendSessionReuse_pct: 42
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 8
SslCacheLookups: 168401
SslCacheMisses: 3426
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 7278
Run_queue: 1
Idle_pct: 74
node: (server)
description:
*root@server:/etc# dpkg -s openssl*
Package: openssl
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 1082
Maintainer: Debian OpenSSL Team <pkg-openssl-devel@lists.alioth.debian.org>
Architecture: amd64
Version: 1.0.1e-2+deb7u14
Depends: libc6 (>= 2.7), libssl1.0.0 (>= 1.0.1e-2+deb7u5), zlib1g (>= 1:1.1.4)
Suggests: ca-certificates