Why Linux is not responding – it’s the flaming fire-wall!

I could not ping my Linux Server, and could not issue a traceroute command. It turns out the firewall was blocking the traceroute flow.

This blog post describes how I checked this and fixed the firewall problem.

By default, traceroute sends a UDP packet to a port in the range 33434-33523. The node that receives it usually replies with an ICMP message such as “time exceeded” or “port unreachable”. If there is no response, there is a good chance that the packet is being dropped by a firewall.

See Understanding traceroute (or tracerte).

Using Wireshark I could see UDP packets going in to my Linux machine, but there was no corresponding reply being returned.

When traceroute worked I got the inbound UDP packet, and the outbound response with “destination unreachable” (which looks like a problem but actually shows normal behaviour) as shown in the data below. Wireshark highlights it with a black background, because it thinks it is a problem.

Source        Destination   Dst Port  Src Port  Protocol  Info
2001:db8::2   2001:db8::7   33434     52119     UDP       52119 → 33434 Len=24
2001:db8::7   2001:db8::7   33434     52119     ICMPv6    Destination Unreachable (Port unreachable)

When traceroute failed I only got the inbound UDP packet

Source        Destination   Dst Port  Src Port  Protocol  Info
2001:db8::2   2001:db8::7   33434     52119     UDP       52119 → 33434 Len=24

If the packets are blocked by a firewall, then the traceroute output will have “*” as the node name.

Uncomplicated Firewall (ufw) is documented here.

I was on Ubuntu Linux 20.04.

Display the status of the firewall

sudo ufw status verbose

This gave me

Status: active
Logging: off (low)
Default: deny (incoming), allow (outgoing), deny (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
Anywhere                   ALLOW IN    10.1.1.2                  
22/tcp                     ALLOW IN    Anywhere                  
20,21,10000:10100/tcp      ALLOW IN    Anywhere                  
21/tcp                     ALLOW IN    Anywhere                  
20/tcp                     ALLOW IN    Anywhere                  
22/tcp (v6)                ALLOW IN    Anywhere (v6)             
20,21,10000:10100/tcp (v6) ALLOW IN    Anywhere (v6)             
21/tcp (v6)                ALLOW IN    Anywhere (v6)             
20/tcp (v6)                ALLOW IN    Anywhere (v6)         

By default,

  • incoming data is blocked
  • outbound data is allowed
  • routed data is blocked.

Logging is off, and problems are not reported.

The display shows there are no rules for UDP, so any incoming UDP request is blocked (quietly dropped = dropped without telling anyone).

You may want to issue the command and pipe the output to a file, ufw.txt, to keep a record of the status before you make any changes. If you make any changes, they persist – even across reboot.

Enable logging to see what is being blocked

sudo ufw logging on

Rerun your traceroute or other failing command.

At the bottom of /var/log/ufw.log I had (this has been reformatted to make it display better)

Nov 28 12:27:43 colinpaice kernel: [ 3317.641508] [UFW BLOCK] IN=enp0s31f6 OUT= MAC=8c:16:45:36:f4:8a:00:d8:61:e9:31:2a:86:dd SRC=2001:0db8:0000:0000:0000:0000:0000:0002 DST=2001:0db8:0000:0000:0000:0000:0000:0007
LEN=80
TC=0
HOPLIMIT=1
FLOWLBL=924186
PROTO=UDP
SPT=48582
DPT=33434
LEN=40

Wireshark gave me

Frame 4: 94 bytes on wire (752 bits), 94 bytes captured (752 bits) on interface enp0s31f6, id 0   
Ethernet II, Src: Micro-St_e9:31:2a (00:d8:61:e9:31:2a), Dst: LCFCHeFe_36:f4:8a (8c:16:45:36:f4:8a)
Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::7
    0110 .... = Version: 6
    .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 
    .... .... .... 1110 0001 1010 0001 1010 = Flow Label: 0xe1a1a
    Payload Length: 40
    Next Header: UDP (17)
    Hop Limit: 1
    Source: 2001:db8::2
    Destination: 2001:db8::7
User Datagram Protocol, Src Port: 48582, Dst Port: 33434
    Source Port: 48582
    Destination Port: 33434
    Length: 40
    Checksum: 0x6ebd [unverified]
    [Checksum Status: Unverified]
    [Stream index: 0]
    [Timestamps]
Data (32 bytes)

From this, we can see the fields match up

  • flow label (0xe1a1a = 924186)
  • source 2001:db8::2 = 2001:0db8:0000:0000:0000:0000:0000:0002
  • destination 2001:db8::7 = 2001:0db8:0000:0000:0000:0000:0000:0007
  • source port 48582
  • destination port 33434.
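The match between the two displays can also be checked mechanically. This is a small sketch using only Python's standard library; the variable and function names are mine, chosen for illustration.

```python
import ipaddress

# Flow label: Wireshark shows hexadecimal, the ufw log shows decimal.
wireshark_flow_label = "0xe1a1a"
log_flow_label = 924186
flow_labels_match = int(wireshark_flow_label, 16) == log_flow_label

def same_address(a: str, b: str) -> bool:
    """Compare two IPv6 address strings regardless of how they are written."""
    return ipaddress.IPv6Address(a) == ipaddress.IPv6Address(b)

# Addresses: Wireshark shows the compressed form, the ufw log the full form.
source_matches = same_address(
    "2001:db8::2", "2001:0db8:0000:0000:0000:0000:0000:0002")
destination_matches = same_address(
    "2001:db8::7", "2001:0db8:0000:0000:0000:0000:0000:0007")

print(flow_labels_match, source_matches, destination_matches)

# The long form used in the log can be produced with .exploded
print(ipaddress.IPv6Address("2001:db8::2").exploded)
```

Comparing parsed addresses rather than strings avoids being caught out by the compressed (::) notation.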

Port 33434 is used by traceroute, so this is a good clue this is a traceroute packet.

The reason the record was written to the log is [UFW BLOCK]. The firewall blocked it.

The request came in over interface enp0s31f6.
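If there are many records in the log, a short script can pick out the ones that look like traceroute. The following is only a sketch: the helper names parse_ufw_line and looks_like_traceroute are hypothetical, and it assumes the record is on one line in the KEY=value layout shown above.

```python
TRACEROUTE_PORTS = range(33434, 33524)  # 33434-33523 inclusive

def parse_ufw_line(line: str):
    """Return (action, fields) for one ufw kernel log line.

    fields is a list of (key, value) pairs rather than a dict,
    because LEN appears twice (IP length, then UDP length).
    """
    start = line.index("[UFW ")
    end = line.index("]", start)
    action = line[start + 5:end]          # for example "BLOCK" or "ALLOW"
    fields = []
    for token in line[end + 1:].split():
        key, _, value = token.partition("=")
        fields.append((key, value))
    return action, fields

def looks_like_traceroute(fields) -> bool:
    """True for a UDP packet aimed at traceroute's default port range."""
    d = dict(fields)                      # later duplicates win; fine for PROTO and DPT
    return d.get("PROTO") == "UDP" and int(d.get("DPT", 0)) in TRACEROUTE_PORTS

line = ("Nov 28 12:27:43 colinpaice kernel: [ 3317.641508] [UFW BLOCK] "
        "IN=enp0s31f6 OUT= MAC=8c:16:45:36:f4:8a:00:d8:61:e9:31:2a:86:dd "
        "SRC=2001:0db8:0000:0000:0000:0000:0000:0002 "
        "DST=2001:0db8:0000:0000:0000:0000:0000:0007 "
        "LEN=80 TC=0 HOPLIMIT=1 FLOWLBL=924186 PROTO=UDP SPT=48582 DPT=33434 LEN=40")

action, fields = parse_ufw_line(line)
print(action, dict(fields)["IN"], looks_like_traceroute(fields))
```

A destination port in this range is only a hint, because any application could use the same ports.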

How to enable it.

You can specify different filters, and granularity of parameters.

For example

  • sudo ufw rule allow log proto udp from 2001:db8::2
  • sudo ufw rule allow in on enp0s31f6 log comment 'colin-ethernet'
  • sudo ufw rule allow log proto udp to 2001:db8::7 port 33434:33523 from 2001:db8::2

Where enp0s31f6 is the name of the ethernet link where the traffic comes from.

When running with any of these, I had in the log file

Nov 28 17:03:12 colinpaice kernel: [19847.112045] 
[UFW ALLOW] 
IN=enp0s31f6 OUT= MAC=8c:16:45:36:f4:8a:00:d8:61:e9:31:2a:86:dd SRC=2001:0db8:0001:0000:0000:0000:0000:0009 
DST=2001:0db8:0000:0000:0000:0000:0000:0007 
LEN=60 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=UDP SPT=33434 DPT=33440 LEN=20

and the traceroute worked.

Note: The comment ‘…’ is an administration aid to give a description. It does not come out in the logs.

Display the rules

sudo ufw status numbered

gave

Status: active

     To                         Action      From
     --                         ------      ----
[ 1] Anywhere                   ALLOW IN    10.1.1.2                  
[ 2] 22/tcp                     ALLOW IN    Anywhere                  
[ 3] 20,21,10000:10100/tcp      ALLOW IN    Anywhere                  
[ 4] 21/tcp                     ALLOW IN    Anywhere                  
[ 5] 20/tcp                     ALLOW IN    Anywhere                  
[ 6] Anywhere on enp0s31f6      ALLOW IN    Anywhere                   (log)
[ 7] 22/tcp (v6)                ALLOW IN    Anywhere (v6)             
[ 8] 20,21,10000:10100/tcp (v6) ALLOW IN    Anywhere (v6)             
[ 9] 21/tcp (v6)                ALLOW IN    Anywhere (v6)             
[10] 20/tcp (v6)                ALLOW IN    Anywhere (v6)             
[11] Anywhere (v6) on enp0s31f6 ALLOW IN    Anywhere (v6)              (log)
[12] 2001:db8::7 33434/udp      ALLOW IN    2001:db8::2                (log)

There is now a rule [12] for UDP to 2001:db8::7 port 33434.

You can use commands like

sudo ufw delete 6

to delete a rule.

Note: Always display the rules before you delete one. After you delete rule 6, rule 7 becomes rule 6, and so on.
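The renumbering is easy to get wrong when deleting more than one rule. The sketch below models the rule list as a plain Python list (it is not how ufw is implemented, and delete_rule is a made-up helper) to show what happens if you delete rules 6 and 7 in ascending versus descending order.

```python
def delete_rule(rules, number):
    """Delete by 1-based position, the way `ufw delete NUM` addresses rules."""
    remaining = list(rules)
    del remaining[number - 1]
    return remaining

# 12 rules, as in the numbered listing above.
original = [f"original rule {n}" for n in range(1, 13)]

# Ascending order: after deleting 6, the old rule 7 has become rule 6,
# so "delete 7" now removes what was originally rule 8.
ascending = delete_rule(delete_rule(original, 6), 7)

# Descending order: deleting 7 first leaves rule 6 where it was.
descending = delete_rule(delete_rule(original, 7), 6)

print("original rule 7" in ascending)    # still present: the wrong rule was deleted
print("original rule 7" in descending)   # gone, as intended
```

Deleting the highest number first, or redisplaying the numbered list after every delete, avoids removing the wrong rule.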

Now that it works…

Any changes to ufw are remembered across reboots.

You may want to turn off the logging, until the next problem

sudo ufw logging off

and remove the log option from the firewall rules, by deleting and re-adding the rule.

sudo ufw rule delete allow log proto udp from 2001:db8::2

sudo ufw rule allow proto udp from 2001:db8::2

Understanding traceroute (or tracerte)

I was trying to use traceroute to find the route between two nodes and I did not understand the output. Like many things, once you understand it, it is obvious.

This is another of the little topics which I thought I understood, and found I did not.

For example

traceroute ibm.com

produces

traceroute to ibm.com (23.39.199.16), 30 hops max, 60 byte packets
 1  bthub.home (192.168.1.254)  4.139 ms  7.528 ms  10.629 ms
 2  * * *
 3  * * *

What does the “*” mean – and why are there 3 “*”?

Traceroute logic

At a conceptual level, the logic of traceroute is:

At the origin

  • Send a UDP packet over a link towards the remote node, with hop limit = 1.
  • Set a timer
  • Wait for the reply (with time out)

At an intermediate node

  • Set hop limit = hop limit -1
  • If hop limit = 0
    • then send an ICMP packet back to the originator, giving the IP address of the intermediate node, and information like “Time exceeded”. (The final destination replies with “Destination unreachable (Port unreachable)” instead.)
    • else send the packet over a link towards the remote destination.

Back at the origin

  • Wait for the response. When the response arrives, stop the timer and calculate the duration.
  • Lookup the IP address of the intermediate node, to find the node name.
  • Display original hop count, the name of the intermediate node, IP address of the intermediate node, and duration of the request.

For example

1 colin.Linux.Server (2001:db8::2) 0.988 ms

This gives you information on the first hop in the chain.

Go to the next level.

You can take this further.

  • Repeat the operation multiple times. This allows you to get multiple response times, so you can see the range of response times, and get an idea of the variation (or consistency) of the response time.
  • Repeat it with hop count = 2, 3, 4, 5… When the hop count is 1, you get information about the first hop; when the hop count is 2, you get information about the next hop, and so on.

For example, on Linux

traceroute to 2001:db8:1::9 (2001:db8:1::9), 30 hops max, 80 byte packets
1 colin.Linux.Server (2001:db8::2) 0.267 ms 0.207 ms 0.140 ms
2 Colin.zOS (2001:db8:1::9) 3.794 ms 6.215 ms 6.920 ms
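The logic described above can be sketched as a small simulation. No packets are sent: the network is a made-up list of node addresses, and the functions probe and trace are hypothetical names for illustration. Timing and name lookup are left out.

```python
def probe(path, hop_limit):
    """Follow one simulated packet along `path`.

    `path` is a list of node addresses; the last entry is the destination.
    Returns (replying_node, message).
    """
    for index, node in enumerate(path):
        if index == len(path) - 1:
            # The destination has nothing listening on the traceroute port.
            return node, "Destination unreachable (Port unreachable)"
        hop_limit -= 1                     # an intermediate node decrements the limit
        if hop_limit == 0:
            return node, "Time exceeded"   # ICMP reply sent back to the originator
    raise ValueError("path must not be empty")

def trace(path, max_hops=30, probes_per_hop=3):
    """Repeat the probe with hop limit 1, 2, 3... until the destination replies."""
    results = []
    for hop_limit in range(1, max_hops + 1):
        replies = [probe(path, hop_limit) for _ in range(probes_per_hop)]
        results.append((hop_limit, replies))
        if replies[0][1].startswith("Destination unreachable"):
            break                          # reached the destination
    return results

# The two-hop route from the example output above.
path = ["2001:db8::2", "2001:db8:1::9"]
for hop, replies in trace(path):
    print(hop, replies[0][0], replies[0][1])
```

Three probes per hop correspond to the three response times on each line of the real output.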

It gets more interesting.

If you send multiple requests, a node may decide to route a request down a different link, so you may get multiple IP addresses for each hop.

What if there is a problem?

Unknown address

Traceroute will report as much as it can. For example 2001:db8:1::10 does not exist.

traceroute to 2001:db8:1::10 (2001:db8:1::10), 30 hops max, 80 byte packets
1 colinpaice (2001:db8::7) 3053.091 ms !H 3052.807 ms !H

This reports as far as it got (colinpaice, 2001:db8::7), followed by the annotation !H. On Linux, traceroute can add an annotation like this after a response time:

  • !H host unreachable
  • !N network unreachable
  • !P protocol unreachable
  • !S source route failed
  • !F fragmentation needed
  • !X communication administratively prohibited
  • !V host precedence violation
  • !C precedence cutoff in effect
  • !<num> ICMP unreachable code <num>
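As a quick aid when reading output, the list above can be turned into a lookup table. This is a sketch only: the helper names are mine, and it assumes annotations appear as separate whitespace-delimited tokens, as in the output line shown earlier.

```python
ANNOTATIONS = {
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!S": "source route failed",
    "!F": "fragmentation needed",
    "!X": "communication administratively prohibited",
    "!V": "host precedence violation",
    "!C": "precedence cutoff in effect",
}

def explain(token: str) -> str:
    """Translate one annotation such as !H or !<10> into words."""
    if token in ANNOTATIONS:
        return ANNOTATIONS[token]
    number = token[1:].strip("<>")         # accept both !10 and !<10>
    if number.isdigit():
        return f"ICMP unreachable code {number}"
    return "unknown annotation"

def annotations_in(line: str):
    """Return the meaning of every !-annotation on one line of output."""
    return [explain(token) for token in line.split() if token.startswith("!")]

line = "1 colinpaice (2001:db8::7) 3053.091 ms !H 3052.807 ms !H"
print(annotations_in(line))
```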

Lost or dropped packets

An intermediate node may not be able to send the response back; for example, a firewall may block (and drop) any UDP packets. The originator times out waiting for the reply. In this case it reports “*” as the IP address, and cannot provide the duration of the request. This can occur if the router does not support traceroute, there is no link back to the originator, or there is a firewall which drops packets (going out, or coming back).

More advanced requests

Specify a different home

By default traceroute uses the IP address of the connection it will use to send the packet.

For example I have a system with two interfaces

  • tap1, my end of the connection is 2001:db8:1::3 with the remote end having 2001:db8:1::9 (my z/OS)
  • eno1, my end of the connection is 2001:db8::2 with the remote end having 2001:db8::7 (my laptop)

If I use traceroute to my laptop, I can issue

traceroute 2001:db8::7

By default traceroute uses 2001:db8::2 as its starting point (the IP address of the direct connection). I can see this in the Wireshark trace.

I can use traceroute to my laptop, and tell it to start from the IP address of the connection to z/OS:

traceroute6 -s 2001:db8:1::3 2001:db8::7

-s tells traceroute to use a different source address, here 2001:db8:1::3, which is the address of my end of the link to z/OS.

When I used this command, my request was blocked, because my firewall was configured to accept traffic from 2001:db8::2, and not from 2001:db8:1::3.

Make traceroute fail quicker

The traceroute defaults are to try a maximum of 30 hops and wait 5 seconds, so you could wait for over 2 minutes if there was a problem. If you know your network is small (at most 3 hops) and responds in under a second, you can use

traceroute -6 -w 2 -q 1 -m 3 2001:db8::7

  • -6 for IP V6 (or just use traceroute6)
  • -w 2 wait for up to 2 seconds
  • -q 1 send out 1 UDP request on each hop
  • -m 3 a maximum of 3 hops.
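The arithmetic behind "over 2 minutes" is simply hops multiplied by the wait time. The sketch below makes the simplifying assumption that each hop costs exactly one timeout, one after another; a real traceroute sends several probes per hop and may send some of them at the same time, so treat the result as a rough estimate. The function name is made up for illustration.

```python
def worst_case_seconds(max_hops: int, wait_seconds: float) -> float:
    """Rough upper bound if every hop times out once, one after another."""
    return max_hops * wait_seconds

default_wait = worst_case_seconds(30, 5)   # the defaults quoted above
tuned_wait = worst_case_seconds(3, 2)      # with -m 3 -w 2

print(default_wait / 60, "minutes with the defaults")
print(tuned_wait, "seconds with -m 3 -w 2")
```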

On z/OS use the tracerte command

tso tracerte 2001:db8::7 (try 1 wait 1 max 2

I’m bored with giving the same reply to messages on the console.

When using z/OS I have to reply to messages, for example at startup and shutdown. After several months of this I was getting bored, and found z/OS has an auto reply capability.

In the SYS1.PARMLIB concatenation you can have AUTORxx members.

For example in SYS1.PARMLIB(AUTOR00) is

/* ARC0380A RECALL WAITING FOR VOLUME volser IN USE BY HOST procid, */ 
/*          FUNCTION function. REPLY WAIT, CANCEL, OR MOUNT         */ 
/*                                                                  */ 
/* Rule: 1                                                          */ 
/*                                                                  */ 
   Msgid(ARC0380A)   Delay(60S) Reply(CANCEL)                          

So if you get message ARC0380A, after 60 seconds it will reply CANCEL. If you are quick you could reply with something else.

If you always reply with the same value you could specify DELAY(0S)… but this means you cannot reply to the message with a different value… DELAY(5S) may be better.

You can specify multiple values in the REPLY(a,b,c), and can use system symbolics such as &SYSNAME.

REPLY('system=&SYSNAME.',',option1,option2')

My parmlib member IEASYS00 includes AUTOR=(00,DT), so members AUTOR00 and AUTORDT are used.