Hello,
We have two D6 boxes (HP DL360 G7) configured in ClusterXL (New HA mode). OS is SPLAT R75.30.
For the first sync network we use a direct crossover cable between the two nodes (on the s0p0 interface).
We have a bonding interface (LACP, layer3+4 hash) made up of the s0p1, s0p2, s1p0 and s1p1 physical interfaces. All four are 1 Gbps ports spread across two modules (s0 and s1):
lspci | grep Ether
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
08:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
08:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
09:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
09:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
Bonding seems to be working fine, and ethtool -S doesn't show any errors on any of the physical ports.
--------------------- cat /proc/net/bonding/bond0 ---------------------
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 4
Number of ports: 4
Actor Key: 17
Partner Key: 20
Partner Mac Address: d0:d0:fd:a5:e3:80
Slave Interface: s1p0
MII Status: up
Link Failure Count: 1
Permanent HW addr: a0:36:9f:15:11:71
Aggregator ID: 4
Slave Interface: s1p1
MII Status: up
Link Failure Count: 1
Permanent HW addr: a0:36:9f:15:11:70
Aggregator ID: 4
Slave Interface: s0p1
MII Status: up
Link Failure Count: 1
Permanent HW addr: e4:11:5b:d4:30:a4
Aggregator ID: 4
Slave Interface: s0p2
MII Status: up
Link Failure Count: 1
Permanent HW addr: e4:11:5b:d4:30:ae
Aggregator ID: 4
-----------------------------------------------
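A side note on the layer3+4 transmit hash policy shown above: each flow is pinned to one slave, so a single connection can never exceed 1 Gbps on this bond even though the aggregate is 4 Gbps. A minimal Python sketch of the hash, following the simplified formula given in the kernel bonding documentation (real drivers vary by kernel version; this is an illustration, not the exact driver code):

```python
def l34_hash(src_ip, dst_ip, src_port, dst_port, n_slaves):
    """Simplified layer3+4 xmit hash from the kernel bonding docs:
    ((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xffff))
    modulo slave count. IPs are passed as 32-bit integers."""
    ip_xor = (src_ip ^ dst_ip) & 0xFFFF   # low 16 bits of the IP XOR
    return ((src_port ^ dst_port) ^ ip_xor) % n_slaves

# Example: one flow between 10.0.0.1 and 10.0.0.2 always lands on the
# same slave of the 4-port bond.
slave = l34_hash(0x0A000001, 0x0A000002, 40000, 443, 4)
```

Two flows between the same pair of hosts but on different ports can still hash to the same slave; that is expected behavior, not a bonding fault.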
On this bond interface we have five 802.1Q VLANs.
The rulebase is pretty simple: 180 rules and 4 NAT rules. We don't use anything else (no IPS, no AV, etc.).
This device has high CPU usage, between 40% and 55%. Total traffic going through the CP firewall is 1.1 Gbps (total in+out on all interfaces), and peak concurrent connections are 170,000.
According to the specifications this device can handle up to 25 Gbps and 5M connections, so we are far below its limits (we did raise the maximum connections setting from 25,000 to 800,000).
These D6 boxes actually replaced a Cisco ASA 5550 pair that ran at 60-70% CPU.
Our expectation was no more than 25-30% CPU load on the D6 boxes, so we think something is wrong in our setup.
The CPU usage comes from the fw_worker_0, fw_worker_1 and fw_worker_2 processes. (The CPU is an E5620: 4 cores, HT disabled; cpuinfo attached.) CoreXL is at default settings: all NIC interrupts are handled by CPU0, while fw_worker_0 runs on CPU3, fw_worker_1 on CPU2 and fw_worker_2 on CPU1:
-------------- part of top ---------------
top - 08:42:19 up 5 days, 18:30, 1 user, load average: 1.83, 1.69, 1.58
Tasks: 97 total, 2 running, 95 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 61.8%id, 0.0%wa, 6.0%hi, 32.2%si, 0.0%st
Cpu1 : 0.0%us, 0.3%sy, 0.0%ni, 39.3%id, 0.0%wa, 0.0%hi, 60.3%si, 0.0%st
Cpu2 : 0.0%us, 0.7%sy, 0.0%ni, 46.8%id, 0.0%wa, 0.0%hi, 52.5%si, 0.0%st
Cpu3 : 0.0%us, 0.7%sy, 0.0%ni, 59.1%id, 0.0%wa, 0.0%hi, 40.2%si, 0.0%st
Mem: 6221296k total, 1917612k used, 4303684k free, 226084k buffers
Swap: 13631144k total, 0k used, 13631144k free, 218220k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2797 root 15 0 0 0 0 R 61 0.0 1075:00 fw_worker_2
2716 root 16 0 0 0 0 S 53 0.0 1130:56 fw_worker_1
2653 root 15 0 0 0 0 S 40 0.0 1033:12 fw_worker_0
15084 root 15 0 401m 47m 18m S 1 0.8 106:19.91 fw
25670 root 15 0 2068 1016 780 R 0 0.0 0:00.05 top
------------------ fw ctl affinity -l -r --------------
fw ctl affinity -l -r
CPU 0: s0p2 s0p1 s0p0 s1p0 s1p1
CPU 1: fw_2
CPU 2: fw_1
CPU 3: fw_0
All: mpdaemon dtlsd fwd in.geod in.asessiond in.aufpd vpnd cprid cpd
[Expert@ukdxsdfshll013]#
------------- fw ctl multik stat -----------------
fw ctl multik stat
ID | Active | CPU | Connections | Peak
-------------------------------------------
0 | Yes | 3 | 54258 | 57484
1 | Yes | 2 | 55262 | 58884
2 | Yes | 1 | 54806 | 57974
----------- cpstat -f multi_cpu os --------------
cpstat -f multi_cpu os
Processors load
---------------------------------------------------------------------------------
|CPU#|User Time(%)|System Time(%)|Idle Time(%)|Usage(%)|Run queue|Interrupts/sec|
---------------------------------------------------------------------------------
| 1| 0| 39| 60| 40| ?| 0|
| 2| 0| 42| 58| 42| ?| 0|
| 3| 0| 56| 44| 56| ?| 0|
| 4| 0| 45| 55| 45| ?| 0|
---------------------------------------------------------------------------------
I believe that the problem is in SecureXL:
---------------------------- fwaccel stat ----------------
fwaccel stat
Accelerator Status : on
Accept Templates : enabled
Drop Templates : disabled
Accelerator Features : Accounting, NAT, Cryptography, Routing,
HasClock, Templates, Synchronous, IdleDetection,
Sequencing, TcpStateDetect, AutoExpire,
DelayedNotif, TcpStateDetectV2, CPLS, WireMode,
DropTemplates, Streaming, MultiFW, AntiSpoofing,
DoS Defender, Nac
Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL,
3DES, DES, CAST, CAST-40, AES-128, AES-256,
ESP, LinkSelection, DynamicVPN, NatTraversal,
EncRouting, AES-XCBC, SHA256
------------------ fwaccel stats ------------------
fwaccel stats
Name Value Name Value
-------------------- --------------- -------------------- ---------------
conns created 47848743 conns deleted 46594114
temporary conns 152420 templates 5202
nat conns 104 accel packets 1009641643
accel bytes 718151584061 F2F packets 2376587399
ESP enc pkts 0 ESP enc err 0
ESP dec pkts 0 ESP dec err 0
ESP other err 0 espudp enc pkts 0
espudp enc err 0 espudp dec pkts 0
espudp dec err 0 espudp other err 0
AH enc pkts 0 AH enc err 0
AH dec pkts 0 AH dec err 0
AH other err 0 memory used 0
free memory 0 acct update interval 3600
current total conns 153686 TCP violations 15215
conns from templates 3469212 TCP conns 152013
delayed TCP conns 0 non TCP conns 1673
delayed nonTCP conns 0 F2F conns 81233
F2F bytes 1303498849580 crypt conns 0
enc bytes 0 dec bytes 0
partial conns 0 anticipated conns 0
dropped packets 161 dropped bytes 33038
nat templates 0 port alloc templates 0
conns from nat tmpl 0 port alloc conns 0
port alloc f2f 0 PXL templates 148
PXL conns 216 PXL packets 516566685
PXL bytes 285833923732 PXL async packets 516589579
------------------ fwaccel stats -s ----------------
Accelerated conns/Total conns : 71941/154182 (46%)
Accelerated pkts/Total pkts : 1013954607/3913942827 (25%)
F2Fed pkts/Total pkts : 2382649836/3913942827 (60%)
PXL pkts/Total pkts : 517338384/3913942827 (13%)
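For clarity, the percentages in fwaccel stats -s are just quotients of the packet and connection counters, truncated to whole percent. A quick Python sketch recomputing them from the numbers above:

```python
# Counters copied from the 'fwaccel stats -s' output above.
TOTAL_PKTS = 3913942827

def pct(part, whole):
    # fwaccel prints truncated integer percentages
    return part * 100 // whole

print(pct(1013954607, TOTAL_PKTS))  # accelerated pkts  -> 25
print(pct(2382649836, TOTAL_PKTS))  # F2Fed pkts        -> 60
print(pct(517338384,  TOTAL_PKTS))  # PXL pkts          -> 13
print(pct(71941, 154182))           # accelerated conns -> 46
```

Accelerated + F2F + PXL comes to roughly 100%: about two thirds of all packets take the slow (F2F) path through the fw_worker instances, which lines up with the high si time on CPUs 1-3 in the top output.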
It seems that most of the connections are not accelerated, and I don't know why.
Here is part of the SecureXL debug output:
Nov 12 22:08:00 firewall kernel: [fw_1];cphwd_offload_conn: dir=1, cdir=1, vm_conn=<1X8.58.164.136,53420,1X4.142.120.18,53,17 >
Nov 12 22:08:00 firewall kernel: [fw_1];get_conn_flags: no handler for this conn (no sticky F2F)
Nov 12 22:08:00 firewall kernel: [fw_1];get_conn_flags: sticky_f2f=0 for <1X8.58.164.136,53420,1X4.142.120.18,53,17>
Nov 12 22:08:00 firewall kernel: [fw_1];cphwd_offload_conn: calling cphwd_api_add_connection_, flags 0x0, flags_ex 0x0
Nov 12 22:08:00 firewall kernel: [fw_1];cphwd_add_conn_stat_cb: received add status for <1X8.58.164.136,53420,1X4.142.120.18,53,17>(flags= 0x0, cb_flags=0x0): success
Nov 12 22:08:00 firewall kernel: [fw_1];cphwd_add_conn_stat_cb: received add status for <1X8.58.164.136,0,14.142.120.18,53,17>(flags=0x800 , cb_flags=0x0): success
Nov 12 22:08:00 firewall kernel: [fw_1];cphwd_add_conn_stat_cb: CPHWD_F_TEMPLATE
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_offload_conn: dir=1, cdir=1, vm_conn=<1X4.142.120.157,38122,1X4.142.121.92,135, 6>
Nov 12 22:08:00 firewall kernel: [fw_2];get_conn_flags: MORE_INSPECT is on -> F2F
Nov 12 22:08:00 firewall kernel: [fw_2];get_conn_flags: sticky_f2f=1 for <1X4.142.120.157,38122,1X4.142.121.92,135,6>
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_pslglue_provide_conn_opaque: conn is streamed (both sides) -> F2F both dirs
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_offload_conn: pxl - turning on sticky f2f on conn <1X4.142.120.157:38122 -> 1X4.142.121.92:135 IPP 6>
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_offload_conn: conn <1X4.142.120.157,38122,1X4.142.121.92,135,6> has sticky f2f (2)
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_offload_conn: calling cphwd_api_add_connection_, flags 0x20001, flags_ex 0x8
Nov 12 22:08:00 firewall kernel: [fw_2];cphwd_add_conn_stat_cb: received add status for <1X4.142.120.157,38122,1X4.142.121.92,135,6>(flags =0x20001, cb_flags=0x8): success
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: dir=1, cdir=1, vm_conn=<1X4.142.121.165,1757,1X4.142.120.87,4288, 6>
Nov 12 22:08:00 firewall kernel: [fw_0];get_conn_flags: MORE_INSPECT is on -> F2F
Nov 12 22:08:00 firewall kernel: [fw_0];get_conn_flags: sticky_f2f=1 for <1X4.142.121.165,1757,1X4.142.120.87,4288,6>
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_pslglue_provide_conn_opaque: conn is streamed (both sides) -> F2F both dirs
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: pxl - turning on sticky f2f on conn <1X4.142.121.165:1757 -> 1X4.142.120.87:4288 IPP 6>
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: conn <1X4.142.121.165,1757,1X4.142.120.87,4288,6> has sticky f2f (2)
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: calling cphwd_api_add_connection_, flags 0x20001, flags_ex 0x8
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_add_conn_stat_cb: received add status for <1X4.142.121.165,1757,1X4.142.120.87,4288,6>(flags =0x20001, cb_flags=0x8): success
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: dir=1, cdir=1, vm_conn=<1X4.142.120.149,6632,193.189.13.39,22180, 6>
Nov 12 22:08:00 firewall kernel: [fw_0];get_conn_flags: MORE_INSPECT is on -> F2F
Nov 12 22:08:00 firewall kernel: [fw_0];get_conn_flags: sticky_f2f=1 for <1X4.142.120.149,6632,193.189.13.39,22180,6>
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_pslglue_provide_conn_opaque: conn is streamed (both sides) -> F2F both dirs
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: pxl - turning on sticky f2f on conn <1X4.142.120.149:6632 -> 193.189.13.39:22180 IPP 6>
Nov 12 22:08:00 firewall kernel: [fw_0];cphwd_offload_conn: conn <1X4.142.120.149,6632,193.189.13.39,22180,6> has sticky f2f (2)
-------------------------
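When sifting through longer debug captures like the one above, a throwaway parser helps tally how many connections go sticky F2F and for what reason. A minimal Python sketch (the regexes match the exact line formats in the capture above; this is a triage helper I'd write for myself, not a Check Point tool):

```python
import re

# "get_conn_flags: MORE_INSPECT is on -> F2F" -> reason for forcing F2F
reason_re = re.compile(r"get_conn_flags: (.*?) -> F2F")
# "get_conn_flags: sticky_f2f=1 for <src,sport,dst,dport,proto>"
sticky_re = re.compile(r"get_conn_flags: sticky_f2f=(\d) for <([^>]+)>")

def summarize(log_lines):
    """Return (reason counts, per-connection sticky_f2f flag)."""
    reasons, sticky = {}, {}
    for line in log_lines:
        m = reason_re.search(line)
        if m:
            reasons[m.group(1)] = reasons.get(m.group(1), 0) + 1
        m = sticky_re.search(line)
        if m:
            sticky[m.group(2).strip()] = (m.group(1) == "1")
    return reasons, sticky
```

Running this over the full capture should show how dominant the "MORE_INSPECT is on" reason is, i.e. how many connections are being pushed to F2F because an inspection handler (typically a protocol streamed through PXL) is attached.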
Any suggestions are welcome!