GPFS profile

On the SUSE LINUX ES 9 distribution, it is recommended that you adjust the vm.min_free_kbytes kernel tunable. This tunable controls the amount of free memory that the Linux kernel keeps available (i.e., not used in any kernel caches). When vm.min_free_kbytes is left at its default value, some configurations can show memory exhaustion symptoms even though free memory should in fact be available. Setting vm.min_free_kbytes to a higher value, on the order of 5-6% of the total physical memory (the Linux sysctl utility can be used for this purpose), should help to avoid this situation.

`kernel parameter`	`recommended value`	`usage`
`vm/min_free_kbytes`	`5% to 6% of RAM`	`Forces the VM to keep a minimum number of kilobytes free. This number is used to compute the number of reserved free pages for each memory zone in the system.`
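As a rough illustration of the 5-6% guideline, the following Python sketch derives a candidate range from the MemTotal line of /proc/meminfo. The function names are hypothetical helpers introduced here for illustration, and the rounding (integer division, rounding down) is an assumption, not something the references prescribe.

```python
def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16777216 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def min_free_kbytes_range(mem_total_kb):
    """Return (5% of RAM, 6% of RAM) in kB, rounded down."""
    return mem_total_kb * 5 // 100, mem_total_kb * 6 // 100


def suggest_from_proc(path="/proc/meminfo"):
    """Read the local machine's memory size and return the candidate range."""
    with open(path) as f:
        return min_free_kbytes_range(parse_mem_total_kb(f.read()))
```

On a node with 16 GiB of RAM (16777216 kB), `min_free_kbytes_range` yields 838860 to 1006632 kB. A value in that range could then be applied with the sysctl utility.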

Reference [3] suggests the following

Some customers with large GPFS Linux-only clusters (128 or more nodes) have experienced occasional command lockups when starting or stopping the GPFS subsystem. The cause was identified as occasional failures of the Linux TCP layer to correctly handle listen queue overflows on passive TCP sockets. This behavior is detailed below, along with some steps that help to reduce the problem.

As part of the normal GPFS startup procedure, many (or all) nodes in the cluster may try to communicate with the primary or secondary GPFS cluster data server node and attempt to execute a command on that node through a remote shell. This procedure involves establishing a connection to the server TCP socket created by the remote shell daemon on each of these nodes. If the listen queue of the server socket is not large enough, a queue overflow will occur when too many nodes try to initiate remote shell connections simultaneously.

In a large GPFS environment, prolonged listen queue overflows can present a substantial problem. If the listen queue on the remote shell server is kept full for an extended period of time, the server-side TCP stack starts dropping incoming connection requests without notifying clients that they have been dropped. This leaves some of the connections in a half-open state, thereby causing some of the remote shell client processes to hang.

Preventive steps that may reduce the likelihood that prolonged listen queue overflows will occur in your GPFS environment include:

The first solution requires recompiling and reinstalling the ssh package, so it cannot be part of a profile.
If recompilation is not possible in your environment, or if GPFS is configured to use rsh (or any other remote execution command that relies on xinetd), we suggest that you instead increase the number of attempts TCP makes when adding connections to the server's listen queue. This can be done by modifying the value of the tcp_synack_retries kernel tunable on the primary and secondary GPFS cluster servers. We suggest increasing this parameter to at least 10.

`kernel parameter`	`recommended value`	`usage`
`net/ipv4/tcp_synack_retries`	`10`	`Number of times the SYN-ACK is retransmitted for a passive TCP connection that was started by another host.`
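To check whether a cluster data server already meets this recommendation, one could read the tunable from /proc/sys. The sketch below is a minimal illustration; `read_sysctl` and `meets_minimum` are hypothetical helper names, and the `proc_sys` parameter exists only so the lookup can be pointed at a different directory.

```python
from pathlib import Path


def read_sysctl(key, proc_sys="/proc/sys"):
    """Read an integer tunable by its slash-separated key,
    e.g. 'net/ipv4/tcp_synack_retries'."""
    return int((Path(proc_sys) / key).read_text().split()[0])


def meets_minimum(current, minimum=10):
    """True if the current value is at least the recommended minimum."""
    return current >= minimum
```

For example, `meets_minimum(read_sysctl("net/ipv4/tcp_synack_retries"))` returns False on a server where the tunable still needs to be raised.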

`kernel parameter`	`recommended value`	`usage`
`ifconfig <interface> mtu 9000 up`	`9000`	`MTU size for the communication adapter (if GPFS is configured over Gigabit Ethernet, in order to enable Jumbo Frames)`
`net/ipv4/tcp_window_scaling`	`1`	`Enable TCP to negotiate the use of window scaling (> 64K buffers) with the other end during connection setup`
`net/core/rmem_max`	`0x800000 (8 MB)`	`Maximum receive buffer size. Overrides tcp_rmem max value if max(tcp_rmem) > rmem_max`
`net/core/wmem_max`	`0x800000 (8 MB)`	`Maximum send buffer size. Overrides tcp_wmem max value if max(tcp_wmem) > wmem_max`
`net/ipv4/tcp_rmem`	`0x001000 0x040000 0x800000`	`Memory size of the TCP receive buffers: minimum size, default size, maximum size`
`net/ipv4/tcp_wmem`	`0x001000 0x040000 0x800000`	`Memory size of the TCP send buffers: minimum size, default size, maximum size`
`net/core/netdev_max_backlog`	`2500`	`Maximum number of received packets that will be queued for processing before resulting in congestion`
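The table lists keys in /proc/sys path form and several values in hexadecimal. As a sketch of how those rows might be turned into sysctl.conf-style lines, the following Python snippet converts each key to dotted form and each value to decimal. The `RECOMMENDED` dictionary simply restates the table; the helper names are hypothetical, and the conversion to decimal is an assumption made for readability.

```python
# Values copied from the table above (hexadecimal where the table uses hex).
RECOMMENDED = {
    "net/ipv4/tcp_window_scaling": "1",
    "net/core/rmem_max": "0x800000",
    "net/core/wmem_max": "0x800000",
    "net/ipv4/tcp_rmem": "0x001000 0x040000 0x800000",
    "net/ipv4/tcp_wmem": "0x001000 0x040000 0x800000",
    "net/core/netdev_max_backlog": "2500",
}


def to_sysctl_line(key, value):
    """Turn 'net/core/rmem_max', '0x800000' into 'net.core.rmem_max = 8388608'."""
    name = key.replace("/", ".")
    # int(tok, 0) accepts both decimal and 0x-prefixed hexadecimal tokens.
    numbers = [str(int(tok, 0)) for tok in value.split()]
    return f"{name} = {' '.join(numbers)}"


def render(settings):
    """Render all settings as newline-separated sysctl.conf-style lines."""
    return "\n".join(to_sysctl_line(k, v) for k, v in settings.items())
```

Calling `print(render(RECOMMENDED))` produces lines such as `net.ipv4.tcp_rmem = 4096 262144 8388608`. The MTU row is not included because it is set with ifconfig rather than through a kernel tunable.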

Tuning Linux kernel for GPFS

Overview

Reference [1] suggests the following

Reference [2] suggests the following (SLES 9 specific)

Reference [3] suggests the following
