
Fix stale NFS mounts on a server with the dmesg error "nfs: server nfs-server not responding, still trying"

Saturday, March 16th, 2019


Today I found that a number of NFS mounts on a server, mounted through /etc/fstab definitions, were hanging. The command

nfs-server:~# df -hT

kept hanging as well, and any attempt to access a mounted NFS directory failed.
The server with the hung Network File System runs SLES (SUSE Linux Enterprise Server 12 SP3). A short investigation of the kernel log (dmesg) and of /var/log/messages revealed the following errors:

 

nfs-server:~# dmesg
[3117414.856995] nfs: server nfs-server OK
[3117595.104058] nfs: server nfs-server not responding, still trying
[3117625.032864] nfs: server nfs-server OK
[3117805.280036] nfs: server nfs-server not responding, still trying
[3117835.209110] nfs: server nfs-server OK
[3118015.456045] nfs: server nfs-server not responding, still trying
[3118045.384930] nfs: server nfs-server OK
[3118225.568029] nfs: server nfs-server not responding, still trying
[3118255.560536] nfs: server nfs-server OK
[3118435.808035] nfs: server nfs-server not responding, still trying
[3118465.736463] nfs: server nfs-server OK
[3118645.984057] nfs: server nfs-server not responding, still trying
[3118675.912595] nfs: server nfs-server OK
[3118886.098614] nfs: server nfs-server OK
[3119066.336035] nfs: server nfs-server not responding, still trying
[3119096.274493] nfs: server nfs-server OK
[3119276.512033] nfs: server nfs-server not responding, still trying
[3119306.440455] nfs: server nfs-server OK
[3119486.688029] nfs: server nfs-server not responding, still trying
[3119516.616622] nfs: server nfs-server OK
[3119696.864032] nfs: server nfs-server not responding, still trying
[3119726.792650] nfs: server nfs-server OK
[3119907.040037] nfs: server nfs-server not responding, still trying
[3119936.968691] nfs: server nfs-server OK
[3120117.216053] nfs: server nfs-server not responding, still trying
[3120147.144476] nfs: server nfs-server OK
[3120328.352037] nfs: server nfs-server not responding, still trying
[3120567.496808] nfs: server nfs-server OK
[3121370.592040] nfs: server nfs-server not responding, still trying
[3121400.520779] nfs: server nfs-server OK
[3121400.520866] nfs: server nfs-server OK
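
To get a quick feeling for how often and how long the mounts stall, the log above can be summarised with a small helper. This is only a sketch: the function name nfs_timeout_summary is my own, and it assumes the log lines start with the usual [seconds.microseconds] dmesg timestamp.

```shell
# Summarise NFS timeout events from kernel log text read on stdin.
# Counts "not responding" events and averages the seconds until the next "OK".
# nfs_timeout_summary is a hypothetical helper name, not a standard tool.
nfs_timeout_summary() {
    awk '
        /nfs: server .* not responding/ {
            # strip "[" and everything after "]" to keep only the timestamp
            ts = $0; sub(/^\[ */, "", ts); sub(/\].*/, "", ts)
            start = ts + 0; pending = 1; events++
            next
        }
        /nfs: server .* OK/ && pending {
            ts = $0; sub(/^\[ */, "", ts); sub(/\].*/, "", ts)
            total += (ts + 0) - start; recovered++; pending = 0
        }
        END {
            avg = recovered ? total / recovered : 0
            printf "events=%d recovered=%d avg_outage=%.1fs\n", events, recovered, avg
        }
    '
}
```

Typical use would be: dmesg | nfs_timeout_summary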


It took me a short while to check the remote NetApp NFS storage and the virtual machine, which runs on top of an OpenXen hypervisor.
The permissions of the exported NFS shares were checked and were in good shape. The NFS share was also re-exported, and on the Linux
host mounting it the following commands were run to remount the hung filesystems:

 

nfs-server:~# umount -f /mnt/nfs_share
nfs-server:~# umount -l /mnt/nfs_share
nfs-server:~# umount -lf /mnt/nfs_share1
nfs-server:~# umount -lf /mnt/nfs_share2
nfs-server:~# mount -t nfs -o remount /mnt/nfs_share


That fixed one of the hung mounts, but as I didn't want to remount each NFS filesystem manually, I remounted them all with:

nfs-server:~# mount -a -t nfs


This solved it, but the fix turned out not to be permanent: after a while the issue started recurring. I spent some time
investigating the weird NFS hangs further, which led me to a blog post describing the same problem. It pointed to the root cause:
the MTU parameter, which was set quite high (MTU 9000). Over the years this has proven to cause problems with NFS, especially because network routers and switches
are often configured to filter by MTU and pass only packets that fit a lower MTU. Combined with custom rsize / wsize NFS mount values in /etc/fstab, this can lead to such strange NFS hangs.

Below is a list of Maximum Transmission Unit (MTU) values for common media, an excerpt taken from Wikipedia as of the time of writing this article.

https://www.pc-freak.net/images/Maximum-Transmission-Unit-for-Media-Transport-diagram-3.png

In my further research on the issue I came across a very interesting article which explains a lot about the "Large Internet" and Internet performance.

I used the tracepath command, which does basically the same as traceroute but can be run without root privileges. It discovers the hops (network routers) and shows the MTU along the path to the destination.

Below is a sample run:

nfs-server:~# tracepath bergon.net
 1?: [LOCALHOST]                      pmtu 1500
 1:  192.168.6.1                                           0.909ms
 1:  192.168.6.1                                           0.966ms
 2:  192.168.222.1                                         0.859ms
 3:  6.192.104.109.bergon.net                              1.138ms reached
     Resume: pmtu 1500 hops 3 back 3

 

The optimal pmtu for this connection is 1500. In some cases traceroute might return hops with 'no reply' if a router along the path filters UDP packets.
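
If you want to use the discovered path MTU in a script, the "Resume:" line of tracepath can be parsed. A minimal sketch, assuming the output format shown above; tracepath_pmtu is a name I made up for the example.

```shell
# Print the path MTU found on the "Resume:" line of tracepath output (stdin).
# tracepath_pmtu is a hypothetical helper name.
tracepath_pmtu() {
    awk '/Resume:/ {
        # the value follows the literal word "pmtu"
        for (i = 1; i < NF; i++) if ($i == "pmtu") { print $(i + 1); exit }
    }'
}
```

It would be used like: tracepath bergon.net | tracepath_pmtu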

The high MTU value for the Storage network connection interface on eth1 was evident with a simple:

 

 nfs-server:~# /sbin/ifconfig |grep -i eth -A 2
eth0      Link encap:Ethernet  HWaddr 00:16:3E:5C:65:74
          inet addr:100.127.108.56  Bcast:100.127.109.255  Mask:255.255.254.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:16:3E:5C:65:76
          inet addr:100.96.80.94  Bcast:100.96.83.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
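
On hosts where ifconfig is not installed, the same information can be read from sysfs. A small sketch, assuming a Linux kernel that exposes /sys/class/net/<interface>/mtu; the function name list_mtus is hypothetical.

```shell
# Print "interface mtu" for every interface under a sysfs-style directory.
# The directory argument defaults to /sys/class/net (Linux only).
# list_mtus is a hypothetical helper name.
list_mtus() {
    netdir="${1:-/sys/class/net}"
    for dev in "$netdir"/*; do
        [ -r "$dev/mtu" ] || continue          # skip entries without an mtu file
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/mtu")"
    done
}
```

Calling list_mtus without arguments lists every interface of the running system, so an interface left at 9000 is easy to spot.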


The fix was as simple as lowering the MTU of the eth1 Ethernet interface to 1500, which is the value most network routers are configured to.

To apply the new MTU to the eth1 interface without restarting SLES networking, I first ran ifconfig once:

 

 nfs-server:~# /sbin/ifconfig eth1 mtu 1500
 nfs-server:~# ip addr show
 …


To make the setting permanent across SUSE reboots, I had to set MTU=1500 in the file

 

/etc/sysconfig/network/ifcfg-eth1

and then check the interface with:

nfs-server:~#  ip address show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 8c:89:a5:f2:e8:d8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/24 brd 192.168.0.255 scope global eth1
       valid_lft forever preferred_lft forever
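
Editing the ifcfg file by hand is fine for one host; for several hosts a small helper saves typing. This is a sketch only: set_ifcfg_mtu is a hypothetical function, and it assumes the file uses the shell-style MTU='value' line that SUSE ifcfg files normally contain.

```shell
# Set MTU='<value>' in an ifcfg-style file: replace an existing MTU= line,
# or append one if the file has none.
# set_ifcfg_mtu is a hypothetical helper name.
set_ifcfg_mtu() {
    file="$1"; mtu="$2"
    if grep -q '^MTU=' "$file"; then
        tmpfile="$file.tmp.$$"
        # write back with cat so the original file keeps its owner and mode
        sed "s/^MTU=.*/MTU='$mtu'/" "$file" > "$tmpfile" && cat "$tmpfile" > "$file"
        rm -f "$tmpfile"
    else
        printf "MTU='%s'\n" "$mtu" >> "$file"
    fi
}
```

Example call: set_ifcfg_mtu /etc/sysconfig/network/ifcfg-eth1 1500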

 


Then, to remount the hung NFS filesystems once again, I ran:
 

nfs-server:~# mount -a -t nfs


Many network routers keep the MTU as low as 1500 also because higher values cause IP packet fragmentation when using NFS over UDP, and IP packet fragmentation and
reassembly require a significant amount of CPU at both ends of the network connection.
Packet fragmentation also exposes network traffic to greater unreliability, since a complete RPC request must be retransmitted if a UDP packet fragment is dropped for any reason.
Any increase in RPC retransmissions, along with the possibility of increased timeouts, is the single worst impediment to performance for NFS over UDP.
This and much more is very well explained on the Optimizing NFS Performance page, which is a must-read for any sysadmin who plans to use NFS frequently.
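
To see why fragmentation matters so much for NFS over UDP, it helps to count the fragments one read or write turns into. The sketch below is my own illustration: udp_fragments is a hypothetical helper, it assumes IPv4 with a 20-byte header and no options, and it ignores the RPC / NFS headers that ride on top of the rsize / wsize payload.

```shell
# Number of IPv4 fragments needed to carry one UDP datagram.
# $1 = UDP payload size in bytes, $2 = link MTU in bytes.
# udp_fragments is a hypothetical helper name.
udp_fragments() {
    payload="$1"; mtu="$2"
    datagram=$((payload + 8))               # UDP header is 8 bytes
    per_frag=$(( (mtu - 20) / 8 * 8 ))      # 20-byte IPv4 header; fragment data is a multiple of 8
    echo $(( (datagram + per_frag - 1) / per_frag ))   # round up
}
```

With an illustrative 32768-byte payload, udp_fragments 32768 1500 gives 23 fragments and udp_fragments 32768 9000 gives 4. Losing any single one of them forces the whole RPC request to be sent again.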

Even though lowering the MTU (Maximum Transmission Unit) value solved my problem, in some cases, especially on modern local LANs with Jumbo Frames support, raising the MTU to 9000 bytes
might be a good idea, as this increases the packet size and can raise network performance. However, on distant networks with many router hops, keeping the MTU as low as 1492 / 1500 is the safer choice.

 

FreeBSD Jumbo Frames network configuration short how to

Wednesday, March 14th, 2012


Recently I wrote a post on how to enable Jumbo Frames on GNU / Linux, so I thought it would be useful to write how the Jumbo Frames network boost can be achieved on FreeBSD too.

I will skip the details of what Jumbo Frames are, as I explained them thoroughly in the previous article. Just a short reminder of what Jumbo Frames are and why you might need them: they are a way to increase the network MTU from the standard 1500 bytes to 9000 bytes.

It is interesting to mention that, according to specifications, the maximum MTU that can be assigned for Jumbo Frames is MTU=16128.
Just like on Linux, to take advantage of the increase in network throughput that Jumbo Frames bring, you need gigabit NIC card/s on the router / server.

1. Increasing MTU to 9000 to enable Jumbo Frames "manually"

Just like on Linux, the network tool to use is ifconfig. For those who don't know, ifconfig on Linux is part of the net-tools package and was rewritten from scratch especially for the GNU / Linux OS, whereas BSD's ifconfig is based on source code taken from 4.2BSD UNIX.

As you know, network interface naming on FreeBSD is different. There is no generic naming like on Linux (eth0, eth1, eth2); instead the interfaces are named after the NIC driver, which usually reflects the card vendor: for instance an Intel(R) PRO/1000 NIC is em0, a RealTek one is rl0, etc.

To set Jumbo Frames with a Maximum Transmission Unit of 9000 on a FreeBSD host with Realtek and Intel gigabit ethernet cards use:

freebsd# /sbin/ifconfig em0 192.168.1.2 mtu 9000
freebsd# /sbin/ifconfig rl0 192.168.2.2 mtu 9000

!! Be very cautious here: if you're connected to the system remotely over ssh, you might lose the connection because of broken routing.

To prevent losing routing, if you're executing the above two commands remotely, you had better run them in a GNU screen session:

freebsd# screen
freebsd# /sbin/ifconfig em0 192.168.1.2 mtu 9000; /sbin/ifconfig rl0 192.168.2.2 mtu 9000; \
/etc/rc.d/netif restart; /etc/rc.d/routing restart

2. Check MTU settings are set to 9000

If everything is fine, the commands will return no output. To check further that the MTU is properly set to 9000, issue:

freebsd# /sbin/ifconfig -a|grep -i em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
freebsd# /sbin/ifconfig -a|grep -i rl0
rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
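
With more than a couple of NICs, grepping one interface at a time gets tedious. Here is a sketch that prints the MTU of every interface in one go; ifconfig_mtus is a hypothetical helper name, and it assumes the FreeBSD style header line "name: flags=... mtu N" shown above.

```shell
# Read ifconfig output on stdin and print "interface mtu" for each
# interface header line of the form "em0: flags=... mtu 9000".
# ifconfig_mtus is a hypothetical helper name.
ifconfig_mtus() {
    awk '/^[a-z0-9]+: flags=/ {
        name = $1; sub(/:$/, "", name)        # drop the trailing colon
        for (i = 2; i < NF; i++) if ($i == "mtu") print name, $(i + 1)
    }'
}
```

It would be used like: /sbin/ifconfig -a | ifconfig_mtus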

3. Reset routing for default gateway

If you have some kind of routing assigned through the em0 and rl0 network interfaces, it will be affected by the MTU change and the routes will be gone. To restore the previously assigned routing, restart the BSD init script that takes care of assigning routes at system boot time:

freebsd# /etc/rc.d/routing restart
default 192.168.1.1 done
add net default: gateway 192.168.1.1
Additional routing options: IP gateway=YES.

4. Change MTU settings for NIC card with route command

There is also a way to assign a higher MTU without "breaking" the working routing, i.e. avoiding network downtime, with the BSD route command:

freebsd# grep -i defaultrouter /etc/rc.conf
defaultrouter="192.168.1.1"
freebsd# /sbin/route change 192.168.1.1 -mtu 9000
change host 192.168.1.1

5. Finding the new MTU NIC settings on the FreeBSD host

freebsd# /sbin/route -n get 192.168.1.1
route to: 192.168.1.1
destination: 192.168.1.1
interface: em0
flags: <UP,HOST,DONE,LLINFO,WASCLONED>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 9000 1009

6. Set Jumbo Frames to load automatically on system load

To make the increased MTU of 9000 for Jumbo Frames support permanent on a FreeBSD system, the /etc/rc.conf file is used.

The variables for the em0 and rl0 NICs are ifconfig_em0 and ifconfig_rl0.
The lines to place in /etc/rc.conf should be similar to:

ifconfig_em0="inet 192.168.1.1 netmask 255.255.255.0 media 1000baseTX mediaopt half-duplex mtu 9000"
ifconfig_rl0="inet 192.168.2.1 netmask 255.255.255.0 media 1000baseTX mediaopt half-duplex mtu 9000"

In the above lines, change the IP addresses (192.168.1.1 in the example) and the netmask 255.255.255.0 to your own address and netmask.
Also, in the above example the half-duplex ifconfig option is set instead of full-duplex in order to prevent duplex mismatches. Full-duplex can be used instead if you're completely sure the other side of the link is configured to support full-duplex connections. Otherwise, if you set full-duplex while the other side is set to half-duplex or auto-duplex, a duplex mismatch will occur. If this happens, instead of benefiting from the increased Jumbo Frames MTU, the network connection could become slower than it was with the standard ethernet MTU of 1500. Another bad side effect of a duplex mismatch could be a high number of lost packets and degraded throughput …

7. Setting Jumbo Frames for interfaces assigning dynamic IP via DHCP

If you need to assign an MTU of 9000 to gigabit network interfaces which receive their TCP/IP network configuration from a DHCP server, proceed as follows.
First, tell the em0 and rl0 network interfaces to obtain IP addresses dynamically via DHCP by adding to /etc/rc.conf:

ifconfig_em0="DHCP"
ifconfig_rl0="DHCP"

Secondly, create two files, /etc/start_if.em0 and /etc/start_if.rl0, and put the line for the matching interface in each file:

ifconfig em0 media 1000baseTX mediaopt full-duplex mtu 9000
ifconfig rl0 media 1000baseTX mediaopt full-duplex mtu 9000

Copy / paste in root console:

echo 'ifconfig em0 media 1000baseTX mediaopt full-duplex mtu 9000' >> /etc/start_if.em0
echo 'ifconfig rl0 media 1000baseTX mediaopt full-duplex mtu 9000' >> /etc/start_if.rl0
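
Note that echo with >> appends, so pasting the lines above twice leaves duplicate entries in the files. If you prefer something that is safe to re-run, a small helper that overwrites the file can be used. This is only a sketch; write_start_if is a hypothetical name and the optional third argument (target directory, default /etc) exists mainly so it can be tried out in a scratch directory.

```shell
# Write (overwrite) a start_if.<iface> file that sets media options and MTU.
# $1 = interface, $2 = MTU, $3 = target directory (defaults to /etc).
# write_start_if is a hypothetical helper name.
write_start_if() {
    iface="$1"; mtu="$2"; dir="${3:-/etc}"
    printf 'ifconfig %s media 1000baseTX mediaopt full-duplex mtu %s\n' \
        "$iface" "$mtu" > "$dir/start_if.$iface"
}
```

Example: write_start_if em0 9000; write_start_if rl0 9000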

Finally, to load the new MTU for both interfaces, reload the IPs with the increased MTUs:

freebsd# /etc/rc.d/routing restart
default 192.168.1.1 done
add net default: gateway 192.168.1.1

8. Testing if Jumbo Frames is working correctly

To test whether packets of the larger MTU are transferred correctly through the network, you can use ping or tcpdump.

a.) Testing Jumbo Frames enabled packet transfers with tcpdump

freebsd# tcpdump -vvn | grep -i 'length 9000'

You should get output like:

16:40:07.432370 IP (tos 0x0, ttl 50, id 63903, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 85825:87285(1460) ack 668 win 14343
16:40:07.432588 IP (tos 0x0, ttl 50, id 63904, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 87285:88745(1460) ack 668 win 14343
16:40:07.433091 IP (tos 0x0, ttl 50, id 63905, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 23153:24613(1460) ack 668 win 14343
16:40:07.568388 IP (tos 0x0, ttl 50, id 63907, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 88745:90205(1460) ack 668 win 14343
16:40:07.568636 IP (tos 0x0, ttl 50, id 63908, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 90205:91665(1460) ack 668 win 14343
16:40:07.569012 IP (tos 0x0, ttl 50, id 63909, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 91665:93125(1460) ack 668 win 14343
16:40:07.569888 IP (tos 0x0, ttl 50, id 63910, offset 0, flags [DF], proto TCP (6), length 9000) 192.168.1.2.80 > 192.168.1.1.60213: . 93125:94585(1460) ack 668 win 14343

b.) Testing if Jumbo Frames are enabled with ping

Testing Jumbo Frames with ping command on Linux

linux:~# ping 192.168.1.1 -M do -s 8972
PING 192.168.1.1 (192.168.1.1) 8972(9000) bytes of data.
9000 bytes from 192.168.1.1: icmp_req=1 ttl=52 time=43.7 ms
9000 bytes from 192.168.1.1: icmp_req=2 ttl=52 time=43.3 ms
9000 bytes from 192.168.1.1: icmp_req=3 ttl=52 time=43.5 ms
9000 bytes from 192.168.1.1: icmp_req=4 ttl=52 time=44.6 ms
--- 192.168.1.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 2.397/2.841/4.066/0.708 ms

If instead you get output like:

From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 192.168.1.2 icmp_seq=1 Frag needed and DF set (mtu = 1500)

--- 192.168.1.1 ping statistics ---
0 packets transmitted, 0 received, +4 errors

This means only packets with a maximum MTU of 1500 could be transmitted, and hence something is not okay with the Jumbo Frames config.
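
The MTU that the refusing hop reports can be pulled out of such ping errors in a script. A minimal sketch, assuming the Linux ping message format shown above; frag_needed_mtu is a name I picked for the example.

```shell
# Read ping output on stdin and print the MTU from the first
# "Frag needed and DF set (mtu = N)" error line, if any.
# frag_needed_mtu is a hypothetical helper name.
frag_needed_mtu() {
    sed -n 's/.*Frag needed and DF set (mtu = \([0-9][0-9]*\)).*/\1/p' | head -n 1
}
```

It prints nothing when ping reported no such error, which makes it easy to test in an if statement.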
Another helpful command for debugging MTU and showing which hosts along the hop path support jumbo frames is Linux's traceroute.

To debug a path between host and target, you can use:

linux:~# traceroute --mtu www.google.com
...

If you want to test the Jumbo Frames configuration from a Windows host, use the MS Windows ping command like so:

C:\>ping 192.168.1.2 -f -l 8972
Pinging 192.168.1.2 with 8972 bytes of data:
Reply from 192.168.1.2: bytes=8972 time=2ms TTL=255
Reply from 192.168.1.2: bytes=8972 time=2ms TTL=255
Reply from 192.168.1.2: bytes=8972 time=2ms TTL=255
Reply from 192.168.1.2: bytes=8972 time=2ms TTL=255
Ping statistics for 192.168.1.2:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 2ms, Maximum = 2ms, Average = 2ms

Here the -l 8972 value actually corresponds to an MTU of 9000: 8972 = 9000 - 20 (IP header) - 8 (ICMP header).
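
The same arithmetic works for any MTU you want to probe. A tiny sketch, assuming IPv4 with a 20-byte header and no IP options; ping_payload is a hypothetical helper name.

```shell
# Largest ICMP echo payload that fits into one unfragmented IPv4 packet.
# $1 = MTU in bytes; subtracts the 20-byte IP header and 8-byte ICMP header.
# ping_payload is a hypothetical helper name.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}
```

For example, on Linux: ping 192.168.1.1 -M do -s "$(ping_payload 9000)"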