How to change and delete network routing

This blog post arose when trying to document IP V6 routing. The behaviour did not work as expected.

Static routes

Linux

On Linux, you can use command like

sudo ip -6 route add 2001:0db8:1::9/64 dev tap1
sudo ip -6 route del 2001:0db8:1::9/64 dev tap1
sudo ip -6 route flush 2001:0db8:1::9/64 dev tap1
sudo ip -6 -statistics route flush 2001:0db8:1::9/64 dev tap1

Some information may be stored in the neighbour cache so you may need to use commands like:

sudo ip -6 neigh flush to 2001:db8:1::9
sudo ip -6 neigh flush dev tap1

z/OS

You define static routes between BEGINRoutes and ENDRoutes. If you want to change one entry, you have to replace all entries. You cannot add or remove individual entries.

You cannot have an empty BEGINRoute… ENDRoute. If used, it has to have at least one entry. You can create a dummy entry that will never be used.

You can change this file, and use the OBEY command to activate it

v tcpip,tcpip,obeyfile,USER.Z24C.TCPPARMS(routefc)

To delete an entry, remove it from the file, and activate the file.

Dynamic routes

Dynamic routes are created from facilities like radvd on Linux. This defines capability available on an interface.

For example

interface  tap1
{
   AdvSendAdvert on; 
   AdvDefaultLifetime 60;

   prefix 2001:db8:1::/64
   {
   };
   route 2001:db8::/64 {AdvRouteLifetime 1800};
   route 2008::/64{ AdvRouteLifetime 600};
};

At the Linx(sender)

  • it creates an internal address(fe80::…) for the tap1 interface
  • it create a route saying to get to 2001:db8::1/64 route it over the tap1 interface.

At the remote end, it receives a Router Advertisement packet. This includes

  • The IP address of tap1 at the Linux end (created above)
  • Route statements which have
    • The address range
    • The route lifetime.

From this, the remote end creates a dynamic route for every route in the packet.

How to change a dynamic route

Change the radvd configuration file, then either

  • cancel and restart ravd
    • ps ax |grep radvd |grep -v grep |awk ‘{print $1 }’|sudo xargs kill -9
    • sudo radvd -d 5 -l /var/log/radvd.log -m logfile -C radvd.conf
  • send a SIGHUP (sudo kill -s HUP <pid> )

The route expires after AdvRouteLifetime seconds. For the route to remain current, there needs to be a regular Router Advertisement message.

Displaying a dynamic route

On Linux ip -6 route gave me (for one of the dynamic routes) for one entry

2008::/64 dev eno1 proto ra metric 100 pref medium
2008::/64 dev eno1 proto kernel metric 256 expires 86293sec pref medium

Where 86293sec is just under 1 day.

On z/OS you can display this dynamic routing information using the TSO NETSTAT ROUTE RADV DETAIL command.

Example output

DestIP:   2008::/64 
  Gw:     fe80::8024:bff:fe45:840c 
  Intf:   IFPORTCP6  MTU:  0 
  Metric: 00000002   LifetimeExp: 12/20/2022 12:32 
  GwReachable:  Yes  IntfActive:  Yes 

This shows there is an entry for 2008::/64, and it is due to expire at 20th December 2022 at 12:32.

How do I delete a dynamically created route?

You have several ways

  • Remove it from the radvd configuraton file. Restart radvd, and let the definition expire – possibly hours later.
  • On Linux, remove the entry from the config file, restart radvd, use route delete….
  • Cause it to expire earlier
    • Edit the configuration file
    • Set AdvRouteLifetime to a low value like 10 seconds,
    • Restart the radvd agent. This sends the RA message to the remote system, and sets the expiry time of the one of interest,
    • Delete the route from the config file,
    • Restart the radvd agent again. This sends the RA which will not have the route.

The information may still be in the neighbourhood cache, and this may need to be flushed.

Creating a default router

For the interface statement, set a default life time > 0. A value of 0 says this is not a default router.

interface  tap1
{  
   AdvSendAdvert on;
   AdvDefaultLifetime 0;
...

To remove the default router, set AdvDefaultLifetime to 0; and redeploy.

If there is a static definition for the default route, this will be used in preference to the dynamically defined router.

What does tso netstat neighbour give you?

The command TSO NETSTAT ND gave me

Query Neighbor cache for 2001:db8:1:0:8024:bff:fe45:840c 
  IntfName: IFPORTCP6          IntfType: IPAQENET6 
  LinkLayerAddr: 82240B45840C  State: Reachable 
  Type: Router                 AdvDfltRtr: No 

Query Neighbor cache for fe80::8024:bff:fe45:840c 
  IntfName: IFPORTCP6          IntfType: IPAQENET6 
  LinkLayerAddr: 82240B45840C  State: Reachable 
  Type: Router                 AdvDfltRtr: No 

Query Neighbor cache for fe80::9863:1eff:fe13:1408 
  IntfName: JFPORTCP6          IntfType: IPAQENET6 
  LinkLayerAddr: 9A631E131408  State: Reachable 
  Type: Router                 AdvDfltRtr: No 

On Linux the

ip -6 addr

command gave me

tap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
    inet6 2001:db8:1:0:b0fd:f92b:8362:577b/64 ...
    inet6 2001:db8:1:0:8024:bff:fe45:840c/64 ...
    inet6 fe80::8024:bff:fe45:840c/64 ...

tap2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
    inet6 fe80::9863:1eff:fe13:1408/64 ...

The TSO output means

  • Query Neighbor cache for 2001:db8:1:0:8024:bff:fe45:840c. The address is one of the addresses on the remote end of the connection. There is an entry because some traffic came via the address.
  • IntfName: IFPORTCP6 The z/OS Interface name used to create the defintion
  • IntfType: IPAQENET6 the OSA-Express QDIO interfaces statement
  • LinkLayerAddr: 82240B45840C
  • State: Reachable Other options can include stale, which means z/OS has not heard anything from this address for a while
  • Type: Router
  • AdvDfltRtr: No. The information passed in the Router Advertisement, said this was connection does not Advertise a Default Router(AdvDfltRtr).

From the NETSTAT ND output we can see data has been received from

  • IFPORTCP6:2001:db8:1:0:8024:bff:fe45:840c
  • IFPORTCP6:fe80::8024:bff:fe45:840c
  • JFPORTCP6:fe80::9863:1eff:fe13:1408

To get data to flow down the 2001…. address I had to use

ping -I 2001:db8:1:0:8024:bff:fe45:840c 2001:db8:1::9

Where the -I says use the interface address.

You can get information about bytes processed by interface (not by address) using the TSO NETSTAT DEVLINKS command.

Why has my packet suddenly decided to go over there? Grrr

As one of the many problems I had trying to get IPV6 routing to work, I found that I could run a configuration script, and it would all work successfully (including a ping) – then a few seconds later, a manual ping would not work.

I had a shell script to display all my IP configuration information, to display the route information all in one line… including

option="-6 -o"
echo "==ROUTE"
ip $option route  |awk '{ print "ROUTE", $0 } '

When it worked, my route was

ROUTE ::1 dev lo proto kernel metric 256 pref medium
ROUTE 2001:db8::/64 dev enp0s31f6 proto ra metric 100 pref medium
ROUTE 2001:db8::/64 dev enp0s31f6 proto kernel metric 256 expires 86395sec pref medium
ROUTE 2001:db99::/64 dev enp0s31f6 proto ra metric 100 pref medium
ROUTE 2a00:23c5:978f:6e01::/64 dev wlp4s0 proto ra metric 600 pref medium
ROUTE fe80::/64 dev enp0s31f6 proto kernel metric 100 pref medium
ROUTE fe80::/64 dev wlp4s0 proto kernel metric 600 pref medium
ROUTE default via fe80::966a:b0ff:fe85:54a7 dev wlp4s0 proto ra metric 600 pref medium
ROUTE default via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 proto ra metric 1024 expires 132sec hoplimit 64 pref medium
ROUTE default via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 proto ra metric 20100 pref medium

Some interesting information in this display (see the man page here)

ROUTE 2001:db8::/64 dev enp0s31f6 proto ra metric 100 pref medium
  • 2001:db8::/64, this is the prefix of length 64 bits so 2001:db8:0:0. It is the address range 2001:0db8:0000:0000…. where …. is 0000:0000:0000:0000 to ffff:ffff:ffff:ffff
  • dev enp0s31f6 is the device (also known as the interface)
  • proto ra. The protocol was installed by Router Discovery protocol
  • metric 100. When there is a choice of valid routes, the lower the metric, the more it is favoured.
  • pref medium. Preference medium (out of high, medium, low).

Another interesting one is

ROUTE default via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 proto ra metric 20100 pref medium
  • If no other routes are found use the default, route, via enp0s31f6, installed by router discovery protocol(ra).
  • The metric is 20100 – so a low priority value.

A short while later, when ping failed, there was an additional route

ROUTE default via fe80::966a:b0ff:fe85:54a7 dev wlp4s0 proto ra metric 600 pref medium

With this the metric is 600 – which is lower than 20100 from before, so packets were sent to the wireless interface – which did not know what to do with them, and dropped them!

Solution

I used

sudo ip -6 route replace default via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 proto ra metric 200 pref medium

where the metric value was lower than the metric value for the wireless connection, and ping worked.

The above solution worked, but the IP v6 address changed from day to day. The following worked better as it has a permanent global address.

sudo ip -6 route replace default via 2001:db8::2 dev enp0s31f6 proto ra metric 200  pref medium

where 2001:db8::2 is the IP address of the connection on the remote, server, machine. This was done using

sudo ip addr add 2001:db8::2/64 dev eno1

Getting IP v6 static routing from Linux to/from z/OS

For me this was an epic journey, taking weeks to get working. It was like a magical combination lock, which will not open unless all of the parameters are correct, today has an ‘r’ in the month, and you are standing on one leg. Once you know the secrets, it is easy.

With IP V6 there is a technology called dynamic discovery which is meant to make configuring your IP network much easier. Each node asks the adjacent nodes what IP addresses they have, and so your connection to the next box magically works. I could not get this to work, and thought I would do the simpler task of static configuration – this had similar problems – but they were smaller problems.

There were two three four five six seven key things that were needed to get ping to work in my setup:

The key things

Allow forwarding between interfaces

On Linux

sudo sysctl -w net.ipv6.conf.all.forwarding=1

The documentation says “… conf/all/forwarding – Enable global IPv6 forwarding between all interfaces”.

Clearing the cache

Routing and neighbourhood definitions are cached for a period. If you change a definition, and activate it, an old definition may still be used. I found I got different results if I rebooted, re-ipled, or went for a cup of tea; it worked – then next time I tried it with the same definitions, it did not work. Clearing the routing and neighbourhood cache made it more consistent.

On z/OS use V TCPIP,,PURGECACHE,IFPORTCP6

On Linux use sudo ip -6 neigh flush all

Put a delay between creating definitions and using them.

I had a 2 second delay between creating a definition, and using it, which helped getting it to work. I think data is propagated between the system, and issuing a ping or other command immediately after a definition, was too fast for it,

A timing window

I had scripts to clear and redefine the definitions. Some times if I ran the laptop script then the server script, then ping would not work. If I reran the laptop script, then usually ping worked. Sometimes I had to rerun the server script.

The default route would often change.

The wireless connection to the server was unreliable. There would be a route from my laptop to the server via the wireless. Then a few minutes later the connection to the server would stop, and so alternate routes had to be used, because traffic via the wireless would be dropped.

I got around this problem, by explicit coding of the routes and not needing to use the default definitions. (Also disabled the wireless connection while debugging)

The correct route syntax

I found I was getting “Neigbor Solicitation” instead of the static routing. To prevent this the route on the laptop needed the via…

sudo ip -6 route add 2001:db8:1::9/128 via 2001:db8::2 dev enp0s31f6

and not

sudo ip -6 route add 2001:db8:1::9/128                dev enp0s31f6

See Is “via” needed when creating a Linux IP route?

The z/OS IP address kept changing across IPLs

Why is my z/OS IP address changing when using zPDT, and routing does not work?

Configuration

  • The laptop had an Ethernet connection to the server.
  • The server had an Ethernet like connection to z/OS. This was a tunnel(tap1), looking like an OSA to z/OS

The addresses:

Laptop Ethernet (enp0s31f6)2001:db8:::7
Server Ethernet (eno1)2001:db8:::2
Server Tunnel (tap1)2001:db8:1::3
Z/OS interface (ifacecp6)2001:db1::9

The Laptop side had prefix 2001:db8:0::/64, the z/OS side had prefix 2001:db8:1::/64 . See One minute topic: Understanding IP V6 addressing and routing if these numbers look strange.

Definitions

z/OS routing definitions

BEGINRoutes 
;     Destination      FirstHop          LinkName   Size 
ROUTE default6         2001:db8:99::3    IF2        MTU  1492
ROUTE 2001:db8:99::/64 2001:db8:99::3    IF2        MTU 5000 

ROUTE 2001:db8::/64    2001:db8:1::3     IFPORTCP6  MTU 5000 
ROUTE 2001:db8:1::/64  2001:db8:1::3     IFPORTCP6  MTU 5000 
                                                                              
ENDRoutes 

Where

  • default6 says if no other routes match, then send the traffic down IF2 connection. At the remote end of the IF2 connection, it has IP address 2001:db8:99::3.
  • Traffic for 2001:db8:99::/64 should be sent down interface IF2 – which has an address 2001:db8:99::3 at the remote end
  • Traffic for 2001:db8::/64 (2001:db8:0::/64) should be sent down interface IFPORTCP6 which has address 2001:db8:1::3 at the remote end.
  • Traffic for 2001:db8:1::/64 should be sent down interface IFPORTCP6 which has address 2001:db8:1::3 at the remote end.

I needed a route for both 2001:db8::/64 and 2001:db8:1::/64 as one was the route to the laptop, the other was the route to the Linux server.

Linux Server machine

On my Linux machine I had

from ip -6 addr

tap1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
 inet6 2001:db8:1::3/64 scope global 
    valid_lft forever preferred_lft forever
 inet6 2001:db8::3/64 scope global 
    valid_lft forever preferred_lft forever
 inet6 fe80::e852:31ff:fe0f:81da/64 scope link 
    valid_lft forever preferred_lft forever

I used the global address 2001:db8:1::3 in my z/OS routing statement.

The documentation implies I should use the link-local address fe80::e852:31ff:fe0f:81da in my static z/OS definitions, but I could not see how to use this, as it changed every time I ipled my z/OS. This means I need to explicitly define an address on Linux for this connection ( 2001:db8:1::3).

Linux Server definitions

On my Linux server I defined static definitions.

sudo sysctl -w net.ipv6.conf.all.forwarding=1

# clear the state every time
sudo ip -6 route flush root 2001:db8:1::/64
sudo ip -6 route flush root 2001:db8::/64
sudo ip -6 neigh flush all 

# define the interface to z/OS
sudo ip -6 addr del 2001:db8:1::3/64 dev tap1
sudo ip -6 addr add 2001:db8:1::3/64 dev tap1

sudo ip -6 addr del 2001:db8::2/64 dev eno1
sudo ip -6 addr add 2001:db8::2/64 dev eno1


sudo ip -6 route del 2001:db8::/64 dev eno1
sudo ip -6 route add 2001:db8::/64 dev eno1

sudo ip -6 route del 2001:db8:1::9 dev tap1
sudo ip -6 route add  2001:db8:1::/64   dev tap1

# sudo traceroute -d -m 2 -n -q 1 -I    2001:db8::7 
# ping 2001:db8::7 -c 1 -r
# ping 2001:db8:1::9 -c 1 -r

This script grew as I added all of the options to get it to work.

The statements are

sudo sysctl -w net.ipv6.conf.all.forwarding=1

This enables the cross interface traffic.

sudo ip -6 route flush root 2001:db8:1::/64
sudo ip -6 route flush root 2001:db8::/64
sudo ip -6 neigh flush all

These clear the routing for the two addresses, and for the neighbourhood cache. I do not know if these are required, without them the results were not consistent.

#give the interface to z/OS an explicit address
sudo ip -6 addr del 2001:db8:1::3/64 dev tap1
sudo ip -6 addr add 2001:db8:1::3/64 dev tap1


#give the connection to the Laptop an explicit address
sudo ip -6 addr del 2001:db8::2/64 dev eno1
sudo ip -6 addr add 2001:db8::2/64 dev eno1

These deleted then created global addresses for the server end of the interfaces.

sudo ip -6 route del 2001:db8::/64 dev eno1
sudo ip -6 route add 2001:db8::/64 dev eno1


sudo ip -6 route del 2001:db8:1:: dev tap1
sudo ip -6 route add 2001:db8:1:: dev tap1

These deleted and created routes the traffic to the interfaces. I could have used route rep…

Linux Laptop definitions

#Give the ethernet connection to the server an explicit address
sudo ip -6 addr add 2001:db8::19 dev enp0s31f6

#create the route to the server using the via
sudo ip -6 route del 2001:db8:1::/64 dev enp0s31f6
sudo ip -6 route add 2001:db8:1::/64 via 2001:db8::2 dev enp0s31f6

I needed to specify

  • an explicit to the address of the interface to the server, so it could be used as a destination from z/OS.
  • the route to get to the server. I needed to specify the via, so the static route was used directly. Without the via, it tried to use Neighbourhood discovery.

Pinging

For “ping” to work, the packet has to reach the destination and the reply get back to the originator. See Understanding ping and why it does not answer.

If I pinged 2001:db8:1::9 (z/OS) from the Linux server (the end of the IFPORTCP6 connection) the traffic came from address 2001:db8:1::3, The reply was sent back using the matching 2001:db8:1::/64 definitions.

If I pinged 2001:db8:1::9 (z/OS) from my laptop, through the Linux server to z/OS, the traffic came from address 2001:db8::7. The reply was sent back using the matching 2001:db8::/64 definitions.

If I pinged 2001:db8::7 (laptop) from z/OS it was sent back using the matching 2001:db8::/64 definitions.

One minute topic: Understanding IP V6 addressing and routing

Understanding IP addressing and routing is not difficult, but there are some subtleties you need to be aware of.

This is a good place to start.

IP V4 addressing

An IP V4 address is like 192.6.24.56, where each number is between 0 and 255 inclusive (8 bits). You see routing statements like 192.6.24.9/24 which means the left 24 bits are significant for routing. 192.6.24.99/24 is routed the same as 192.6.24.22/24 because 192.6.24.n/24 refers to the range 192.6.24.0 to 192.6.24.255.

IP V6 addressing

IP V6 addresses are like abcd:efgh:ijkl:mnop:qrst:uvwx:yzab:cdef – or 8 groups of 4 hex digits.

Within each group leading zeros can be dropped.

The longest sequence of consecutive all-zero fields is replaced with two colons (::).

fe80:0000:0000:0000:11ad:b884:0000:0084 can be written fe80:0:0:0:11ad:b884:0:84 which can be written fe80::11ad:b884:0:84, which is a more manageable number to use.

I tend to use addresses like fe00::4 because they are short!

IP V6 prefixes

An Internet Service Provider (ISP) provides connectivity to its users. Each enterprise customer, or end user, is allocated a prefix, usually 48 digits long, and you have 16 digits for routers (the subnet) within your organisation. Normally the total prefix length is 64.

At home with a wireless router, my laptops address is 2a00:dddd:ffff:1111:65fa:229:f923:84b8. 2a00:dddd:ffff from my ISP and my subnet is 1111 within my organisation.

An address like 2001:db8::/64 is the range 2001:db8:0:0:0:0:0:0 to 2001:db8:0:0:ffff:ffff:ffff:ffff.

An address like 2001:db8::9/128 is the single address 2001:db8:0:0:0:0:0:9, because all digits are significant.

There are different levels of IP V6 addresses

  • Addresses starting with fe80::, called link-local addresses, are assigned to interfaces for communication on the attached link. If you think of lots of machines on an Ethernet connection, they have a fe80… address. They tend to be used internally by Dynamic Routing. I haven’t explicitly used one.
  • “global” addresses – or not on an Ethernet cable.
    • fc00::/7 Unique Local Addresses (ULA) – also known as “Private” IPv6 addresses. They are only valid within an enterprise.
    • 2…::/16 Global Unique Addresses (GUA) – Routable IPv6 addresses. These addresses allow you to access resources, such as web sites, outside of your domain. My ISP provides me with an address 2a00:abcd:….

Reserved addresses

Some addresses are reserved, for example

  • 2001:db8::/32 is reserved for documentation, these addresses do not leave your enterprise.
  • fe80::/10 Addresses in the link-local prefix. These are allocated to the “cable” or connection between two nodes. Two different “network cables” can have the same fe80… address because they are on different cables.
  • fc00/12 are addresses which are within your enterprise. Routers will not send these addresses out of its domain.
  • ff02::1 Multicast, all nodes in the link-local
  • ff02::2 Muticast, all routers in the link-local
  • ff05::2 All routers in the site-local (in your machine)

See here for a more complete list.

Defining an address

If I define an address for connection (on Linux) I can use

  • sudo ip -6 addr add 2001::99 dev tap1, this is one address. When displayed this gives 2001::99/128
  • sudo ip -6 addr add 2001::999/64 dev tap1, this is an address, and when used in routing, use the left 64 bits. When displayed this gives 2001::999/64

Routing

It is important to understand how the prefix affects the routing behaviour.

If I have two Ethernet connections(interfaces) into my laptop. I want traffic for 2001::a:0:0:0 to go via interface A, and traffic for 2001::b:0:0:0 to go via interface B.

If I use

sudo ip -6 route add 2001::a:0:0:0/64 dev A
sudo ip -6 route add 2001::b:0:0:0/64 dev B

then this will not always work. With 2001::a:0:0:0/64 the prefix is 2001:0:0:0:a:0:0:0/64. When comparing a packet with address 2001::b:0:0:0 with each route; both routes are available, because 2001:0:0:0 matches both, and if the packet gets sent to 2001::b:0:0:0 it will be lost.

You either need to move the a/b to make them significant, 2001:0:0:a::/64 and 2001:0:0:b::/64 or use 2001:0:0:0:a::/80 and 2001:0:0:0:b::/80 in the routing statements.

The system needs to be able to route traffic to the correct interface, so you need to be careful how you set up the routing.

Does this make sense?

Specifying

sudo ip -6 route add 2001:db8::99/64 dev eno1 metric 1024 pref medium

is the same as

sudo ip -6 route add 2001:db8::/64 dev eno1 metric 1024 pref medium

because the routing only looks at the left 64 bits. The ::99 is ignored.

Having the :99 makes it a bit more confusing for those stumbling about trying to understand this topic. I’ve had to rewrite some of my blog posts where I use 2001:db8::99/64 in the routing.

In the case of

sudo ip -6 route add 2001:db8::99/128 dev eno1 metric 1024 pref medium

the ::99 is relevant. A packet for 2001:db8::98 would not be routed down this definition because all 128 bits of the route definition are relevant.

Understanding radvd with IPV6 on Linux.

My two day project to deploy IP V6 dynamic routing, turned into an eight week project before I got it to work.

I am documenting a lot of what I learned – today’s experience is understanding what radvd is and what the configuration options mean. I found a lot of documentation – most of which either assumed you know a lot, or only provided an incomplete description.

High level view of what radvd provides

There are several ways of providing configuration to TCPIP. One way is using radvd (on Linux).

In a configuration file you give

  • An interface name
    • Which IP addresses (and address ranges) are available at the remote end of the connection.
    • Which IP addresses (and address ranges) are available at the local end of the connection

From the information about the local end, the remote end can build its routing tables.

Background IP routing

With IP you can have

  • static routing, where you explicitly give the routes to a destination – such as to get to a.b.c.d – go via xyz interface. You have to do a lot of administration defining the addresses of each end of an interface (think Ethernet cable)
  • dynamic routing, and neighbourhood discovery, where the system automatically finds the neighbours and there is less work for an administrator to do.

With IPv6 you have

  • global addresses with IP addresses like 2001:db8:1:16…..
  • link-local addresses. These are specific to an interface. An Ethernet connection can have many terminals hanging off ‘the bus’. The link-local address is only used within the connection. A different Ethernet cable can use the same address. There is no problem as the addresses are only used with the cable.
  • Neighbour Discovery. Rather than specify every thing as you do with static routing, IP V6 supports Neighbour Discovery, where each node can tell connected nodes, what routes and IP addresses it knows about. This is documented in the specification. This supports
    • Router Advertisement (RA), (“Hello, I’m a router, I know about the following addresses and routes”),
    • Router Solicitation (RS), (“Hello, I’ve just started up, are there any routers out there?”),
    • Neighbour Solicitation(NS) (“Does anyone have this IP address?”), and
    • Neighbour Advertisement (Usually in response to a Neighbour Solicitation, “I have this address”).

What is radvd?

The radvd program is a Router Advertisment Daemon (RADvd) which provides fakes router information – but is not a router.

You specify a configuration file. The syntax is defined here.

You can specify that this interface provides a “default” route. See here.

Example definition

I have a Linux Server, and a laptop running Linux connected by an Ethernet cable.

For the server, the radvd.conf file has

# define the ethernet connection
interface eno1
{
AdvSendAdvert on;
MaxRtrAdvInterval 60;
MinDelayBetweenRAs 60;

prefix 2001:ccc::/64
{
# AdvOnLink on;
# AdvAutonomous on;
AdvRouterAddr on;
};

# route 2001:ff99::/64
# {
# # AdvRoutePreference medium;
# # 3600 = 1 hour
# AdvRouteLifetime 3600;
#
# };
};

The key information is

Comments

Data following #

The name of the interface

interface eno1{….};

prefix statement

prefix 2001:ccc::/64{…}

2001:ccc::/64 is the ipv6 address range or, to say it another way, ipv6 addresses with the left 64 bits beginning with 2001:0ccc:0000:0000. In IP V6 this is known as the prefix.

Basically this prefix statement means “this interface is a route to the specified prefix”.

This creates some addresses on the server for the connection.

eno1    inet6 2001:ccc::e02a:943b:3642:1d73/64 scope global temporary dynamic...       
eno1    inet6 2001:ccc::dbf:5c90:61a6:20ae/64 scope global dynamic mngtmpaddr...
eno1    inet6 2001:ccc::c48b:e8f1:495c:5b52/64 scope global temporary dynamic ...       
eno1    inet6 2001:ccc::2d8:61ff:fee9:312a/64 scope global dynamic mngtmpaddr ...

create routes

The prefix statement creates a route on the server to get to the laptop

2001:ccc::/64 dev eno1 proto ra metric 100 pref medium
2001:ccc::/64 dev eno1 proto kernel metric 256 ...

This says that on the server, if there is a request for an address in the range 2001:ccc::/64 send it via device eno1.

route 2001:ff99::/64 statement.

If present this passes information to the remote end of the connection, see below. It says “I (the server) know how to route to 2001:ff99::/64”.

It the routing address does not show up in any ip -6 commands on the server.

I could see the Route Information data flowing across the network within a Router Advertisment packet. But I could not see it being used at the remote end.

I think the best thing to do is ignore the route statement.

Polling

The radvd code periodically sends information along the connection to the other end, at an interval you specify. See radvdump below on how to display it.

At the other end of the connection…

At the remote (laptop) end of the connection, using WiresShark to display the data received, shows a Router Advertisement with

Internet Protocol Version 6, Src: fe80::a2f0:9936:ddfd:95fa, Dst: ff02::1
Internet Control Message Protocol v6
    Type: Router Advertisement (134)
    ...
    ICMPv6 Option (Prefix information : 2001:ccc::/64)
    ICMPv6 Option (Route Information : Medium 2001:ff99::/64)
    ICMPv6 Option (Source link-layer address : 00:d8:61:e9:31:2a)

Where

  • fe80::a2f0:9936:ddfd:95fa is the link-local address on the server machine
  • ff02::1 the multicast address “All nodes” on a link (link-local scope)”
  • Prefix information : 2001:ccc::7/64 the prefix of the IP address 2001:ccc:0:0…
  • Route Information : Medium 2001:ff99::/64 This is from the “route” in the radvd configuration file on the Linux Server. It tells the laptop that this connection knows about routing to 2001:ccc::/64 .

From the route information it dynamically creates a route on the laptop to the server.

2001:ff99::/64 via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 proto ra metric 100 ...

This creates a route to 2001:ff99::/64 via the IP address fe80::….95fa, from the laptop end of the Ethernet connection with name enp0s31f6 to where-ever the connection goes to (in this example it goes to my server).

If I ping 2001:ffcc::9 from the laptop, it will use this route on its way to the z/OS server.

Connection to z/OS

Within the radvd.conf file is the definition to get to z/OS. This interface looks like an Ethernet (the ip -6 link command gives link/ether). This is for a different radvd configuration file to the earlier example.

interface  tap1
{
   AdvSendAdvert on; 
   MaxRtrAdvInterval 60;
   MinDelayBetweenRAs  60; 
   AdvManagedFlag  on;
   AdvOtherConfigFlag on;
   
   prefix 2001:db8:1::/64
   {
     AdvOnLink on;
     AdvAutonomous on;
     AdvRouterAddr on;
    
   };
   prefix 2001:db8:1::99/128
   {
     AdvOnLink on;
     AdvAutonomous on;
     AdvRouterAddr on;
    
   };

   route 2001:db8::/64
   {
     AdvRoutePreference medium;
     AdvRouteLifetime 3100;
   
   };
};

This says there are IP addresses in the range 2001:db8:1::/64 via this connection. z/OS reads the Router Advertisement and creates a route for them. TSO Netstat route gives

DestIP:   2001:db8:1::/64 
  Gw:     :: 
  Intf:   IFPORTCP6         Refcnt:  0000000000 
  Flgs:   UD                MTU:     9000 

The explicit IP address, with the 128 to specify use the whole value, rather than just the prefix,

 prefix 2001:db8:1::99/128
   {
     AdvOnLink on;
     AdvAutonomous on;
     AdvRouterAddr on;    
   };

creates a route in z/OS; from TSO NETSTAT ROUTE

DestIP:   2001:db8:1::99/128 
  Gw:     :: 
  Intf:   IFPORTCP6         Refcnt:  0000000000 
  Flgs:   UHD               MTU:     9000 

In UHD, the U says the interface is Up, the H says this is for a host (a specific end point), and the D says this is dynamically created.

If you try to ping 2001:db8:1::99 from the server, a Neighbour Solicitation request is sent from the server to z/OS, asking “do you have 2001:db8:1:99?”. My z/OS did not have that defined and so did not respond.

When I defined this address on z/OS TCPIP using the obeyfile

INTERFACE IFPORTCP6  DELETE 
INTERFACE IFPORTCP6 
    DEFINE IPAQENET6 
    CHPIDTYPE OSD 
    PORTNAME PORTCP 
    INTFID 7:7:7:7 
                                             
INTERFACE IFPORTCP6 
    ADDADDR 2001:DB8:1::9 
INTERFACE IFPORTCP6 
    ADDADDR 2001:DB8:9::9 
INTERFACE IFPORTCP6 
    ADDADDR 2001:DB8:1::99 

(And used

  • v tcpip,,sto,ifportcp6
  • v tcpip,,obeyfile,USER.Z24C.TCPPARMS(IFACE61)
  • v tcpip,,sta,ifportcp6

to activate it)

After this, the ping was successful because there was a neighbour solicitation for 2001:db8:1::99, and z/OS replied with Neighbour Advertisement of 2001:db8:1::9 saying I have it.

Create a default route

If you specify AdvDefaultLifetime 0 on the interface, this indicates that the router is not a default router and should not appear on the default router list in the Router Advertiser broadcasts. If the value is non zero, the recipient, can use this connection as a default route, for example there is no existing default route. A statically defined default will be used in preference to a dynamically defined one.

Use radvdump to display what it sent in the RA

You can use sudo radvdump to display what is being sent in the Router Advertiser message broadcast over multicast. It looks just like the radvd.conf file, but with all of the values filled in, so you can see the defaults.

New (well, old) job name symbol simplifies JCL!

This function has been around since z/OS 2.3, so is not that new!

There are JCL symbols

&SYSJOBNM

You can use it in started tasks and Jobs

S OMPROUTE.OMP4 JCL symbol &SYSJOBNM gives OMROUTE

S OMPROUTE,JOBNAME=OMP4  JCL symbol &SYSJOBNM gives OMP4

You can use this like

//PARM1 DD DISP=SHR,DSN=USER.PARMLIB(X&SYSJOBNM)
//PARM2 DD DISP=SHR,DSN=USER.PARMLIB(Y&SYSJOBNM)  

and so have data sets based on the invoked job name.  This means you can simplify your started task procedures, and may not need the user to specify overrides on the start command.  

Note: If you move to using JOBNAME= instead of specifying a .identifier, you may need to change other definitions. For example TCPIP defines port 520 with jobname OMPROUTE. If you switch to using OMPROUTE,JOBNAME=OMP1 this jobname will need to change to OMP1.

&SYSJOBID

gives a value like STC08460 for a started task.

&SYSUID

gives the userid like IBMUSER

&JOBNAME

gives the job assigned to the address space in which the JCL is converted, so JES2 for me.

How do I see what’s changing in my Linux network configuration?

I was trying to find out what changes were being made to my IP V6 network.

The short answer is the command

sudo nmcli general logging level DEBUG domains ALL

and reset it to the default with

sudo nmcli general logging level INFO domains ALL

You see the last 5 minutes worth of trace using

journalctl -u NetworkManager -S -5m >nw.txt

See man NetworkManager.conf.

Debian Man gives slightly different information including

ALL : all log domains
DEFAULT : default log domains
DHCP : shortcut for “DHCP4,DHCP6”
IP : shortcut for “IP4,IP6”

ADSL : ADSL device operations
AGENTS : Secret agents operations and communication
AUDIT : Audit records
AUTOIP4 : AutoIP operations
BOND : Bonding operations
BRIDGE : Bridging operations
BT : Bluetooth operations
CONCHECK : Connectivity check
CORE : Core daemon and policy operations
DBUS_PROPS : D-Bus property changes
DCB : Data Center Bridging (DCB) operations
DEVICE : Activation and general interface operations
DHCP4 : DHCP for IPv4
DHCP6 : DHCP for IPv6
DISPATCH : Dispatcher scripts
DNS : Domain Name System related operations
ETHER : Ethernet device operations
FIREWALL : FirewallD related operations
INFINIBAND : InfiniBand device operations
IP4 : IPv4-related operations
IP6 : IPv6-related operations
MB : Mobile broadband operations
NONE : when given by itself logging is disabled
OLPC : OLPC Mesh device operations
PLATFORM : OS (platform) operations
PPP : Point-to-point protocol operations
PROXY : logging messages for proxy handling
RFKILL : RFKill subsystem operations
SETTINGS : Settings/config service operations
SHARING : Connection sharing. With TRACE level log queries for dnsmasq instance
SUPPLICANT : WPA supplicant related operations
SUSPEND : Suspend/resume
SYSTEMD : Messages from internal libsystemd
TEAM : Teaming operations
VLAN : VLAN operations
VPN : Virtual Private Network connections and operations
VPN_PLUGIN : logging messages from VPN plugins
WIFI : Wi-Fi device operations
WIFI_SCAN : Wi-Fi scanning operations
WIMAX : WiMAX device operations

it produces data like

<debug> [747.1872] ndisc-lndp[...,"eno1"]: received router advertisement at 17121
<debug> ndisc[...,"eno1"]: scheduling next now/lifetime check: 163 seconds
<debug> ndisc[...,"eno1"]: neighbor discovery configuration changed [GAR]:
<debug> ndisc[...,"eno1"]:   dhcp-level none
<debug> ndisc[...,"eno1"]:   hop limit      : 64
<debug> ndisc[...,"eno1"]:   gateway fe80::a2f0:9936:ddfd:95fa pref medium exp 179.8153
<debug> ndisc[...,"eno1"]:   gateway fe80::9b07:33a1:aa30:e272 pref medium exp 162.8153
<debug> ndisc[...,"eno1"]:   address 2001:bbb::573e:5c69:2ab3:4ae6 exp 81751.8153
<debug> ndisc[...,"eno1"]:   address 2001:ccc::dbf:5c90:61a6:20ae exp 86399.8153
<debug> ndisc[...,"eno1"]:   route 2001:db8::99/128 via :: pref medium exp 86399.8153
<debug> ndisc[...,"eno1"]:   route 2001:ff99::/64 via fe80::a2f0:9936:ddfd:95fa pref medium exp 3099.8153
<debug> ndisc[...,"eno1"]:   route 2001:ccc::/64 via :: pref medium exp 86399.8153
<debug> ndisc[...,"eno1"]:   route 2001:bbb::/64 via :: pref medium exp 81751.8153
<debug>  platform: (eno1) route: append     IPv6 route: 2001:db8::/80 via :: dev 2 metric 100 mss 0 rt-src ndisc
<debug>  platform: (eno1) signal: route   6   added: 2001:db8::/80 via :: dev 2 metric 100 mss 0 rt-src rt-ra

Is “via” needed when creating a Linux IP route?

To get static routing working I needed a route like one of

# specific destination
sudo ip -6 route add fc:1::9/128 via fc::2 dev enp0s31f6r
sudo ip -6 route add fc:1::9/128  via fc::2 
#range of addresses
sudo ip -6 route add fc:1::/64 via fc::2  dev enp0s31f6
sudo ip -6 route add fc:1::/64 via fc::2 

If I a route without the via

sudo ip -6 route add fc:1::9/128 dev enp0s31f6

then it ignored my static routing and did Neighbor Solicitation; it asked adjacent systems if they had knew about the IP address fc:1::9. This is an IP V6 Neighbour Discovery facility.

There were hints around the internet that if the next hop address is not specified, then the “next hop router” will try to locate the passed address.

So the short answer to the question is: “yes. You should specify it when using static routing”.

Understanding ping and why it does not answer.

I’m sure every one reading this post has the kindergarten level of knowledge of ping (when it works), the hard part is when ping does not work. Ping can do so much more.

Pinging 101

If you successfully ping an IP address you get a response like

PING 2001:db8::7(2001:db8::7) 56 data bytes
64 bytes from 2001:db8::7: icmp_seq=1 ttl=64 time=0.705 ms
64 bytes from 2001:db8::7: icmp_seq=2 ttl=64 time=0.409 ms

First steps

For the ping to be successful the ping has to get to the remote end, and the response needs to get back to the originator. This has two implications (which are obvious once you understand)

  • At each hop the node needs to know how to get to the destination.
  • At each hop the node needs to know how to get to the originator.

If the remote end does not have a routing definition to get to the originator, the response will get thrown away, and your ping will time out.

Did it leave/arrive in mybox?

Depending on how heavily used your system is displaying the number of bytes and packets sent over an interface may be of some help. If the number is zero, then the interace was not used. If the number is non zero, this could be caused by your ping, or by some other traffic.

Using TSO NETSTAT DEVLINKS and a ping -c1 192.168.1.74 (for one ping)

The statistics for the interface showed a change

BytesIn                           = 116
Inbound Packets                   = 1 
...
BytesOut                          = 116
...

Forwarding

If the route is through servers, then the servers need to be enabled for forwarding. For example

  • Linux: sudo sysctl -w net.ipv6.conf.all.forwarding=1
  • z/OS: IPCONFIG DATAGRAMFWD

If forwarding is not specified, the ping request will come in on one interface and be thrown away.

Pinging to a multicast address

With multicast you can send the same data to multiple destinations on a connection(interface), or on a host.

You can issue

ping ff02::1%tap1

where

  • ff02::1 is an IP V6 multi cast address – ff02 is for everything on this link
  • %tap1 says use the interface tap1. Without it, ping does not know which link to send it to.

Wireshark shows the source was fe80::5460:31ff:fed4:4587 which is the address of the interface used to send out the request.

The output was

PING ff02::1%tap1(ff02::1%tap1) 56 data bytes
64 bytes from fe80::5460:31ff:fed4:4587%tap1: icmp_seq=1 ttl=64 time=0.082 ms
64 bytes from fe80::7:7:7:7%tap1: icmp_seq=1 ttl=255 time=3.36 ms (DUP!)
64 bytes from fe80::5460:31ff:fed4:4587%tap1: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from fe80::7:7:7:7%tap1: icmp_seq=2 ttl=255 time=3.01 ms (DUP!)
64 bytes from fe80::5460:31ff:fed4:4587%tap1: icmp_seq=3 ttl=64 time=0.083 ms
64 bytes from fe80::7:7:7:7%tap1: icmp_seq=3 ttl=255 time=3.22 ms (DUP!)

The z/OS host, has two IP addresses for the interface – and both of them replied.

Pinging from a different address on the machine

I had a server where there as

  • an Ethernet connection to my laptop. The server end of the connection had address 2001:db8::2
  • an Ethernet like connection to z/OS running through a tunnel. The device (interface) was called tap1.

To ping to the multicast address, as if it came from 2001:db8::2, the address of an Ethernet connection on the same machine, you can use

ping -I 2001:db8::2 ff02::1%tap1

Wireshark shows the source was 2001:db8::2.

The output was

PING ff02::1%tap1(ff02::1%tap1) from 2001:db8::2 : 56 data bytes
64 bytes from 2001:db8:1::9: icmp_seq=1 ttl=255 time=3.15 ms
64 bytes from 2001:db8:1::9: icmp_seq=2 ttl=255 time=1.22 ms
64 bytes from 2001:db8:1::9: icmp_seq=3 ttl=255 time=3.21 ms

without the duplicate responses (I do not know why). (It may be due to the global address 2001… compare with the link-local address 9e80…)

You might use this ping from a different address when checking a firewall. The firewall may be restricting the source of a packet.

The problems of ping using a different address on the machine

I had a wireless connection, and an Ethernet connection to my laptop. If I pinged through my server to z/OS, the “return address” was from the wireless connection. z/OS was not configured for this, so the reply to the ping was lost.

Even trying to force the interface id to use with

ping -I enp0s31f6 2001:db8:1::9

The wireless connection was chosen, and ping gave a message

ping: Warning: source address might be selected on device other than: enp0s31f6

I had to give my Ethenet connection an address, and change the route to add the src

sudo ip -6 addr add 2001:db8::7 dev enp0s31f6

sudo ip route replace 2001:db8:1::/64 via fe80::a2f0:9936:ddfd:95fa dev enp0s31f6 … src 2001:db8::7

Only then did the ping request get to z/OS – but z/OS did not know how to get back to my laptop!

A normal Wireshark trace

This shows the request and the reply.

Why can ping fail?

If you only get the request data in the Wireshark trace, this means no reply was sent back.

This could be for many reasons including

  • The IP address (2001:db8::1:0:0:9 in the wireshark output above) could not be reached. This could be due to
    • A fire wall dropped it
    • It could not be routed on
    • The address did not exist
  • The response could not be sent back
    • A firewall blocked it
    • There is no routing from the destination back to the originator
    • There is no routing on an intermediate hop

Example of failure

I had a radvd configuration which included

prefix 2001:db8:0:0:1::/64

   {
     AdvOnLink on;
     AdvAutonomous on;
     AdvRouterAddr on;

   };  

   route 2001:db8::/64
   {
     AdvRoutePreference medium;
     AdvRouteLifetime 3100;
   
   };

The 2001:db8:0:0:1::/64 says traffic for 2001:db8:0:0 is on this system, and traffic for 2001:db8::/64 is off this host.

When ping tried to reply – it tried to send the packet to 2001:db8::/64 – which was routed to the same host and so IP just dropped the packet.

I needed 2001:db8:0:0:1::/80. This says traffic for 2001:db8:0:0:1 is on this system. I also used 2001:db8::/80 which is 2001:db8:0:0:0/80 is off this host. The /80 gave the finer granularity.

Once you know these things, it is obvious. This is called experience.

Another example of a failure

As part of writing up another blog post, I created my network to use only address fc00:…

With this, ping failed to work.

The reason for this was that at the back-end, I could see the source was an 2001:db8:… address, which was not configured in my back-end.

On my front end system my Ethernet device had

2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fc::7/128 scope global 
       valid_lft forever preferred_lft forever
    inet6 2001:db8::4cca:6215:5c30:4f5e/64 scope global temporary dynamic 
       valid_lft 84274sec preferred_lft 12274sec
    inet6 2001:db8::51d8:9a9f:784:3684/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 84274sec preferred_lft 12274sec
    inet6 fe80::9b07:33a1:aa30:e272/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

I deleted this using

sudo ip -6 addr del 2001:db8::4cca:6215:5c30:4f5e/64 dev enp0s31f6

and ping worked!

When I added it back in, ping continued to work. I cannot find which interface address ping uses.

Of course I could have used

ping -I fc::7  fc:1::9

to which interface address to use!

A failure with a hint

I had a WiresShark output

The destination Unreachable had

Internet Control Message Protocol v6
    Type: Destination Unreachable (1)
    Code: 3 (Address unreachable)
    ...
    Internet Protocol Version 6, Src: fc:1::9, Dst: fc::a
    Internet Control Message Protocol v6

This is saying that at the server end of the link to z/OS, where the server end had address fc:1::3 ( see the data at the start of the black line) was unable to deliver the packet to dst: fc::a. This shows the problem is with the server in the middle rather than z/OS.

The solution turned out to be more complex than I first though.

I tried

sudo ip -6 route add fc::/64 dev eno1 via fc::7

but this gave

RTNETLINK answers: No route to host

On the laptop I did

ip -6 addr

which gave me

enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 fc::7/128 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::9b07:33a1:aa30:e272/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

back on the server I replaced fc::7 with fe80::9b07:33a1:aa30:e272

sudo ip -6 route add fc::/64 dev eno1 via fe80::9b07:33a1:aa30:e272

and then ping worked!

Digging into this I found the documentation on Neighbourhood discovery section 8 says

For static routing, this requirement implies that the next- hop router’s address should be specified using the link-local address of the router.

Sometimes

sudo ip -6 route add fc::/64 dev eno1 via fc::7

worked fine. ip -6 route gave

fc::7 dev eno1 metric 1024 pref medium
fc::/64 via fc::7 dev eno1 metric 1024 pref medium
fc:1::/64 dev tap1 metric 1024 pref medium

I think this just goes to show that this is a complex area, and there are things happening which I do not understand.