Tuning the Linux kernel for NFSv4

December 2005

Nadia Derbey
Nadia.Derbey@bull.net

Overview

The objective of this paper is to define a profile, in terms of kernel tunables, for Linux machines running NFSv4 over TCP.
The recommendations found in the various documents only identify which kind of system parameters to tune (network buffer sizes); none of them recommends precise values. We therefore ran a benchmark ourselves to determine the most appropriate values for improving NFSv4 performance.

Testing machines

  1. nfs server:
  2. nfs client:

Benchmark

The chosen benchmark is iozone (http://www.iozone.org), since it is known to be an easy-to-use benchmark with wide coverage: it exercises an extensive spread of file sizes and of I/O types (reads and writes, rereads and rewrites, etc.).

Iozone was invoked with the following command line:
iozone -azc -g 2G -i 0 -i 1 -i 6 -i 7 -b result.wks

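The web site gives full documentation of the parameters, but the specific options used above are:

  -a              automatic mode
  -z              used with -a, tests all record sizes, including small records on large files
  -c              includes close() in the timing calculations
  -g 2G           sets the maximum file size for automatic mode to 2 GB
  -i 0            write / re-write test
  -i 1            read / re-read test
  -i 6            fwrite / re-fwrite test
  -i 7            fread / re-fread test
  -b result.wks   writes the results to an Excel-compatible file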

Iozone was the only running application on the client side. Nothing was running on the server side.

Tunables variations

The benchmark was run once for each variation of the following network parameters:

  1. Core network parameters (/proc/sys/net/core)

    parameter      server default   client default   usage
    rmem_max       131071           131071           Maximum receive buffer size. If the ipv4/tcp_rmem maximum is greater, rmem_max overrides it.
    rmem_default   101376           106496           Default receive buffer size. Overridden by the ipv4/tcp_rmem default value.
    wmem_max       131071           131071           Maximum send buffer size. If the ipv4/tcp_wmem maximum is greater, wmem_max overrides it.
    wmem_default   101376           106496           Default send buffer size. Overridden by the ipv4/tcp_wmem default value.

  2. TCP parameters (/proc/sys/net/ipv4)

    parameter   field      server default   client default
    tcp_rmem    minimum    4096             4096
                default    87380            87380
                maximum    174760           174760
    tcp_wmem    minimum    4096             4096
                default    16384            16384
                maximum    131072           131072
    tcp_mem     low        98304            393216
                pressure   131072           524288
                high       196608           786432

    tcp_rmem: memory size of the TCP receive buffers (for a single TCP socket).
    tcp_wmem: memory size of the TCP send buffers (for a single TCP socket).
    tcp_mem: overall TCP memory, in pages:
      low: below this number of pages, TCP does not regulate its memory consumption.
      pressure: when the amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption and enters memory pressure mode, which it exits when memory consumption falls below "low".
      high: number of pages allowed for queuing by all TCP sockets. When this value is reached, TCP streams and packets start getting dropped until memory usage falls again.
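
The current values of these tunables can be collected before a run, for instance to record the defaults on each machine. The following is only a minimal sketch: the helper names `read_tunable` and `read_all` are hypothetical, and the `root` argument is an assumption added so that the code can be pointed at a copy of the directory tree instead of the live /proc/sys.

```python
from pathlib import Path

# Tunables studied in this paper, relative to /proc/sys.
TUNABLES = [
    "net/core/rmem_max",
    "net/core/rmem_default",
    "net/core/wmem_max",
    "net/core/wmem_default",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
    "net/ipv4/tcp_mem",
]


def read_tunable(name, root="/proc/sys"):
    """Return the values of one tunable as a list of integers."""
    text = (Path(root) / name).read_text()
    return [int(field) for field in text.split()]


def read_all(root="/proc/sys"):
    """Return {name: values} for every listed tunable found under root."""
    result = {}
    for name in TUNABLES:
        if (Path(root) / name).is_file():
            result[name] = read_tunable(name, root)
    return result


if __name__ == "__main__":
    for name, values in read_all().items():
        print(name, *values)
```

Single-valued tunables such as rmem_max come back as one-element lists, while tcp_rmem, tcp_wmem and tcp_mem come back as three-element lists.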


Six variations of these parameters were tested; they are named var1 to var6 in the Synthetic results section.

Results analysis

Since the cache size is 1MB, the record sizes that have been studied are 1, 2, 4, 8 and 16 MB. The other record sizes are smaller than the cache size, so we can consider that they only test CPU cache performance.
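
The selection of record sizes can be checked with a short calculation. This sketch assumes that the benchmark's automatic mode doubles the record size from 4 KB up to 16 MB; these bounds are not stated in this paper and should be verified against the iozone documentation. The 1 MB cache size is the one given above.

```python
KB = 1024
MB = 1024 * KB

CACHE_SIZE = 1 * MB     # cache size stated in the text
MIN_RECORD = 4 * KB     # assumption: smallest record size in automatic mode
MAX_RECORD = 16 * MB    # assumption: largest record size in automatic mode


def record_sizes(smallest=MIN_RECORD, largest=MAX_RECORD):
    """Return the record sizes obtained by doubling from smallest to largest."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes


# Keep only the record sizes that do not fit below the cache size.
studied = [size for size in record_sizes() if size >= CACHE_SIZE]
print([size // MB for size in studied])   # [1, 2, 4, 8, 16]
```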

The measurement results (details can be found at http://libtune.sourceforge.net/doc/nfsv4) show that:
These conclusions should be applied with care: nothing but iozone was running on the client side and nothing was running on the server side, so all the memory was dedicated to the benchmark.

Synthetic results

First of all, let's name the different variations:

    name   tunable values
    var1   standard values + 10%
    var2   standard values + 50%
    var3   standard values * 2
    var4   standard values * 2.5
    var5   standard values * 3
    var6   standard values * 4
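
As an illustration of how a variation relates to the standard values, the sketch below applies the factors listed above to a few defaults. The names `FACTORS` and `scale` are hypothetical, and plain multiplication is an assumption: it reproduces the default and maximum values of tcp_rmem and tcp_wmem reported in this paper, but some reported values (for example rmem_max and the tcp_rmem minimum) do not follow it exactly.

```python
from fractions import Fraction

# Variation names and factors as listed above.
FACTORS = {
    "var1": Fraction(11, 10),   # standard values + 10%
    "var2": Fraction(3, 2),     # standard values + 50%
    "var3": Fraction(2),
    "var4": Fraction(5, 2),
    "var5": Fraction(3),
    "var6": Fraction(4),
}


def scale(values, variation):
    """Multiply each standard value by the variation's factor, rounded."""
    factor = FACTORS[variation]
    return [round(value * factor) for value in values]


# Standard tcp_wmem default and maximum, scaled for var5.
print(scale([16384, 131072], "var5"))   # [49152, 393216]
# Standard tcp_rmem default and maximum, scaled for var4.
print(scale([87380, 174760], "var4"))   # [218450, 436900]
```

Exact fractions are used instead of floating-point factors so that a factor such as 1.1 does not introduce rounding noise.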

The following tables summarize what was said in the previous section (only the file / record sizes for which non-standard values give better results are reported):
  1. write / re-write operations:

    Record sizes != 4MB.

                      write             re-write
                      var5              var5
    parameter         client   server   client   server
    rmem_max          393215   393215   393215   393215
    rmem_default      319488   304128   319488   304128
    wmem_max          393215   393215   393215   393215
    wmem_default      319488   304128   319488   304128
    tcp_rmem minimum  4096     4096     4096     4096
    tcp_rmem default  262140   262140   262140   262140
    tcp_rmem maximum  524280   524280   524280   524280
    tcp_wmem minimum  4096     4096     4096     4096
    tcp_wmem default  49152    49152    49152    49152
    tcp_wmem maximum  393216   393216   393215   393216
    tcp_mem low       179648   294912   179648   294912
    tcp_mem pressure  1572864  393216   1572864  393216
    tcp_mem high      2359296  589824   2359296  589824

  2. read / re-read operations:

    read, var2:                         8MB files, 1GB files
    read, var6:                         2GB files
    re-read (!= 16MB records), var2:    8MB files, 1GB files (2 and 4 MB records)
    re-read (!= 16MB records), var3:    2GB files

                      read                                re-read (!= 16MB records)
                      var2              var6              var2              var3
    parameter         client   server   client   server   client   server   client   server
    rmem_max          196607   196607   524287   524287   196607   196607   262143   262143
    rmem_default      159744   152576   425984   405504   159744   152576   212992   202752
    wmem_max          196607   196607   524287   524287   196607   196607   262143   262143
    wmem_default      159744   152576   425984   405504   159744   152576   212992   202752
    tcp_rmem minimum  4096     4096     16384    16384    4096     4096     8192     8192
    tcp_rmem default  131070   131070   349520   349520   131070   131070   174760   174760
    tcp_rmem maximum  262140   262140   699040   699040   262140   262140   349520   349520
    tcp_wmem minimum  4096     4096     16384    16384    4096     4096     8192     8192
    tcp_wmem default  24576    24576    65536    65536    24576    24576    32768    32768
    tcp_wmem maximum  196608   196608   524288   524288   196608   196608   262144   262144
    tcp_mem low       589824   147456   1572864  393216   589824   147456   786432   196608
    tcp_mem pressure  786432   196608   2097152  524288   786432   196608   1048576  262144
    tcp_mem high      1179648  294912   3145728  786432   1179648  294912   1572864  393216

  3. fwrite operations:

    fwrite

    up to 32MB files
    64MB files (4 and 8MB records)
    256MB files (16MB records)
    1GB files
    512MB files (up to 4MB records)
    2GB files

                      var5              var6              var3
    parameter         client   server   client   server   client   server
    rmem_max          393215   393215   524287   524287   262143   262143
    rmem_default      319488   304128   425984   405504   212992   202752
    wmem_max          393215   393215   524287   524287   262143   262143
    wmem_default      319488   304128   425984   405504   212992   202752
    tcp_rmem minimum  4096     4096     16384    16384    8192     8192
    tcp_rmem default  262140   262140   349520   349520   174760   174760
    tcp_rmem maximum  524280   524280   699040   699040   349520   349520
    tcp_wmem minimum  4096     4096     16384    16384    8192     8192
    tcp_wmem default  49152    49152    65536    65536    32768    32768
    tcp_wmem maximum  393215   393216   524288   524288   262144   262144
    tcp_mem low       179648   294912   1572864  393216   786432   196608
    tcp_mem pressure  1572864  393216   2097152  524288   1048576  262144
    tcp_mem high      2359296  589824   3145728  786432   1572864  393216

  4. fread operations:

    fread

    4MB files
    8MB files (4 and 8MB records)
    512MB files (4 and 16MB records)
    8MB files (2MB records)
    1 and 2GB files

                      var5              var6              var2              var4
    parameter         client   server   client   server   client   server   client   server
    rmem_max          393215   393215   524287   524287   262143   262143   327679   327679
    rmem_default      319488   304128   425984   405504   212992   202752   266240   253440
    wmem_max          393215   393215   524287   524287   262143   262143   327679   327679
    wmem_default      319488   304128   425984   405504   212992   202752   266240   253440
    tcp_rmem minimum  4096     4096     16384    16384    8192     8192     10240    10240
    tcp_rmem default  262140   262140   349520   349520   174760   174760   218450   218450
    tcp_rmem maximum  524280   524280   699040   699040   349520   349520   436900   436900
    tcp_wmem minimum  4096     4096     16384    16384    8192     8192     10240    10240
    tcp_wmem default  49152    49152    65536    65536    32768    32768    40960    40960
    tcp_wmem maximum  393215   393216   524288   524288   262144   262144   327680   327680
    tcp_mem low       179648   294912   1572864  393216   786432   196608   983040   245760
    tcp_mem pressure  1572864  393216   2097152  524288   1048576  262144   1310720  327680
    tcp_mem high      2359296  589824   3145728  786432   1572864  393216   1966080  491520
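
To apply one of these profiles, the values can be written out in sysctl.conf syntax. The sketch below is only an illustration: it uses the var5 values reported for the server in the write / re-write table, and the names `VAR5_SERVER` and `sysctl_lines` are hypothetical. It only formats text and does not modify any kernel setting.

```python
# var5 values reported for the server in the write / re-write table.
VAR5_SERVER = {
    "net.core.rmem_max": [393215],
    "net.core.rmem_default": [304128],
    "net.core.wmem_max": [393215],
    "net.core.wmem_default": [304128],
    "net.ipv4.tcp_rmem": [4096, 262140, 524280],
    "net.ipv4.tcp_wmem": [4096, 49152, 393216],
    "net.ipv4.tcp_mem": [294912, 393216, 589824],
}


def sysctl_lines(settings):
    """Render settings as 'key = value ...' lines in sysctl.conf syntax."""
    return [f"{key} = {' '.join(str(v) for v in values)}"
            for key, values in settings.items()]


if __name__ == "__main__":
    print("\n".join(sysctl_lines(VAR5_SERVER)))
```

The output can be saved to a file and loaded with `sysctl -p` followed by the file name, which requires root privileges.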

Note

While running the iozone tests, we noticed that the following scenario doesn't work:
  1. stop nfs on the server
  2. apply some tunable settings on both client and server machines
  3. start nfs on the server
  4. nfs-mount the tested directory
  5. run iozone
  6. unmount the tested directory
  7. stop nfs on the server
  8. apply some other tunable settings
  9. restart nfs on the server
  10. nfs-mount the tested directory
Step 10 leads to a hang.

