One minute networking: getting your data to flow around the corner; IP tunnelling

This is another of those little bits of networking knowledge which, once you understand it, is obvious! Some of the documentation on the web is either wrong or missing information.

The original problem

I wanted to use a route management protocol (OSPF) to manage the routing information known by each router. OSPF uses its own packet format, and not every device or router supports these packets.

You configure the interface name, and the OSPF data flows through the interface.

When the connection is a direct line, the data is passed to the remote system, which can use it. When the connection is indirect, for example via a wireless router, the wireless router does not know how to handle the OSPF packets and throws them away. The result is that my remote machine does not get the OSPF packets.

The solution – use a tunnel

One solution is to wrap the packets of data, so they get passed up to the router, round the corner, and back down to the remote system.

When I was employed, we had an internal mail system for paper correspondence. If we wanted to send a letter to a different site, we took the piece of internal mail, put it in an envelope and sent it through the national mail to the remote site. At the remote site, the mail room removed the external envelope, and sent the internal letter on to the recipient. It is a similar process with IP tunnelling.

I have a laptop with IP address A.B.C.D and a server with address W.X.Y.Z. I can ping from A.B.C.D to W.X.Y.Z, so there is an existing path between the machines.

You define a tunnel to W.X.Y.Z (the external envelope) and specify which interface address on your system it should use. (Think of having two mail boxes for your letter: one for Royal Mail, another for FedEx.)

You define a route which says: to get to address p.q.r.s, use tunnel ….

The definitions

The wireless interface for my laptop was 192.168.1.222. The wireless address of my server was 192.168.1.230.

I defined a tunnel from Laptop to Server called LS

sudo ip tunnel add LS mode gre local 192.168.1.222 remote 192.168.1.230 

Make it active and define a route to the address 192.168.3.3, which is defined on the server.

sudo ip link set LS  up
sudo ip route add 192.168.3.3 dev LS

If I ping 192.168.3.3, the enveloped packet goes to the server machine 192.168.1.230. If this address is defined on the server, the server sends a response – and the ping worked!

Except it didn’t quite. The packet got there, but the response did not get back to my laptop.

At the server, the ping’s “from” IP address was 10.1.0.2, an address attached to my laptop’s Ethernet interface. This address was not known to the server.

I had three choices

  • Define a tunnel back from the server to the laptop.
  • Use ping -I 192.168.1.222 192.168.3.3, which says send the ping request to 192.168.3.3 and set the originator address to 192.168.1.222. The server knows how to route a response to this address.
  • Define a route from the server back to my laptop.

The simplest option was to use ping -I … because no additional definitions are required.

This does not solve my problem

To get OSPF data from the server to my laptop, I need a tunnel from the server to my laptop; so I need a tunnel in each direction.

Different sorts of data are used in an IP network

  • IPV6 and IPV4 – different network addressing schemes
  • unicast and multicast.
    • Unicast – has one destination address, for example ping or ftp.
    • Multicast – often used by routers and switches. A router can send a multicast to all nodes on the local network, for example "does any node have IP address a.b.c.d?". The data is cast to multiple nodes.

When I defined the tunnel above I initially specified mode ipip. There are different tunnel modes; ipip is just one. The list includes:

  • ipip – Virtual tunnel interface, IPv4 over IPv4. This can send unicast traffic, but not multicast.
  • sit – Virtual tunnel interface IPv6 over IPv4.
  • ip6tnl – Virtual tunnel interface IPv4 or IPv6 over IPv6.
  • gre – Virtual tunnel interface GRE over IPv4. This supports IPv6 and IPv4, unicast and multicast.
  • ip6gre – Virtual tunnel interface GRE over IPv6. This supports IPv6 and IPv4, unicast and multicast.

The mode ipip did not work for the OSPF data.

I guess that the best protocol is gre.

Setting up a gre tunnel

You may need to load the gre functionality

sudo modprobe ip_gre
lsmod | grep gre
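
On my system, lsmod then showed lines like the following (the module sizes and use counts will differ on your system):

ip_gre                 32768  0
ip_tunnel              32768  1 ip_gre
gre                    16384  1 ip_gre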

create your tunnel

sudo ip tunnel add GRE mode gre local 192.168.1.222 remote 192.168.1.230 
sudo ip link set GRE up
sudo ip route add 192.168.3.3 dev GRE

and you will need a matching definition, with the same mode, at the remote end.
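
As a sketch of the matching remote-end (server) definitions – assuming the laptop’s end of the tunnel is given the address 192.168.3.4, which is purely illustrative – note that local and remote are swapped relative to the laptop’s definition (the tunnel name is local to each machine):

sudo ip tunnel add GRE mode gre local 192.168.1.230 remote 192.168.1.222
sudo ip link set GRE up
sudo ip route add 192.168.3.4 dev GRE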

Displaying the tunnel

The command (here the tunnel is called AB)

ip link show dev AB 

gives information like

9: AB@NONE: mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre 192.168.1.222 peer 192.168.1.230

where

  • link/gre this was defined using mode gre
  • 192.168.1.222 the local interface to be used to send the traffic
  • peer 192.168.1.230 the IP address for the far end

The command

ip route 

gave me

192.168.3.3 dev AB scope link

so we can see it gets routed over the link (tunnel AB).

Using the tunnel

I could use the tunnel name in my definitions, for example for OSPF:

interface AB
area 0.0.0.0
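
As a fuller sketch, assuming the FRR ospf6d daemon is being used (FRR appears again in the routing post below); the router-id value is purely illustrative:

router ospf6
 ospf6 router-id 192.168.1.222
 interface AB area 0.0.0.0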

One minute networking: TCP buffer sizes

When data flows over a TCPIP connection there are several factors which control the rate at which data can be sent. You can influence some of these factors.

Data is sent as packets, typically of about 1440 bytes – because old hardware could only support this. You could use larger packets, but you may hit a router which chops them into smaller blocks.

The basic TCPIP flow

Consider a Client Server connection. The client application wants to send some data to a server application

  • The client uses send() to put some data into a TCPIP buffer and returns.
  • TCPIP sends some data (a packet) from this buffer, sets a timer and waits.
  • The server receives the data, and sends back an ACK saying so far I have received this many bytes from you.
  • The application on the server does a receive (if there is no data, the application is suspended until data arrives). If there is enough data to satisfy the receive, the application returns, otherwise it is suspended.
  • At the client end, when TCPIP has received the ACK, it no longer needs the data which has just been acknowledged, and it can send more data.
  • If no ACK was received and the timer has timed out, TCPIP resends the data.

There are several parts to this:

  • Putting things into the pipe – the send buffer
  • The pipe
  • Getting things from the pipe, the receive buffer

The send buffer

  • TCPIP has a buffer for its use.
  • The application
    • An application does a send() and passes data to TCPIP.
    • If there is space in the TCPIP buffer, the data is moved into the buffer, and the application returns.
    • If there is not enough space for all of the data, enough data is moved to fill the buffer, and the application waits until more space is available in the buffer.
    • When all of the data has been passed to the TCPIP buffer, the application returns, and can do more application work.
  • TCPIP
    • TCPIP takes a chunk of the buffer (a packet), sends it over the network, and sets a timer.
    • It can then process another chunk of data, and send it over the network, so there are multiple packets in flight.
    • When the far end has passed the data to the application, it sends the ACK back.
    • The local end, when it has received the ACK for a chunk of data, knows the data has been received by TCPIP at the remote end; it no longer needs to keep a copy of the data, and frees up the space in the buffer.

How big a buffer is needed to get good throughput?

Data is held in the TCPIP buffer while it waits to be sent, and for the round trip time: from when the data was put into the TCPIP buffer to when the ACK comes back. This could be tens of milliseconds. Multiple packets can be in flight (perhaps tens or hundreds), which improves the throughput. So send 10 packets and wait; when the first ACK is received, send another packet, and so on, so there are always 10 packets in flight.

If the buffer is too small, the application has to wait. Increasing the send buffer size will increase throughput up to the point where the application no longer has to wait; after this point, making it larger may make no difference.

As more data is in flight, the connection needs a bigger send buffer.
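
As a rough worked example (the numbers are illustrative): on a 100 Mbit/s path with a 20 ms round trip time, up to 100,000,000 bits/s × 0.02 s = 2,000,000 bits, about 250,000 bytes (roughly 250KB), can be in flight, so a send buffer much smaller than 250KB would limit throughput on that path.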

An application can set the send buffer size using the SETSOCKOPT call. If this is not used, then there will be a TCPIP default send buffer size. On z/OS this is the system wide TCPCONFIG TCPSENDBFRSIZE …. parameter.

The default used to be 16KB, and currently it is typically 64KB. There is a TCPIP enhancement which says if the send buffer size is larger than 64KB, then TCPIP can dynamically increase it if that will improve performance. See Outbound Right Sizing (ORS).

Note: If you change the system wide send buffer size (TCPCONFIG TCPSENDBFRSIZE on z/OS), this will affect all applications that do not set the size using SETSOCKOPT. You should test this before putting it into production because it may affect many applications.
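
For example, a send buffer size could be set system wide in the TCP/IP profile like this (the value is just an illustration, not a recommendation):

TCPCONFIG TCPSENDBFRSIZE 131072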

The receive buffer

At the receiving end, TCPIP has a buffer. Data from the network is put into this buffer. After the data has been put into the buffer, TCPIP sends back an ACK with three fields saying

  • so far I’ve received this many bytes from you
  • I’ve sent you this many bytes
  • my buffer has space for this many bytes

An application does a receive to get the data. If there is insufficient data to satisfy the receive, the application can wait, or return just the data in the buffer, depending on the options.

If the receive buffer is full, any incoming data will be thrown away. If the application receives some data, then does lots of processing on it before receiving more data, the receive buffer may fill up. Some applications receive the data, give it to a subtask to process, and immediately do another receive, and so try to keep the receive buffer empty.

If the amount of arriving data is larger than the free space in the buffer, TCPIP will return “no space left in the buffer” as part of an ACK. The sender then knows to wait. When the application receives the data, and makes space, “x bytes are available in the buffer” is sent as part of the ACK, and the sender can start sending data again. This “space available” is known as the Window Size, and helps regulate the flow of data.

If you think about this for several minutes, you will realise that there is a time lag between the available receive buffer space going to zero, and the sender receiving the ACK saying there is no space in the receive buffer. Any in-flight packets may get thrown away – or, by the time they arrive, the application may have taken data from the buffer so there is room for them. The “no space left in receive buffer” ACK tells the sender to stop sending data until there is space in the buffer, and the sender may then reduce the amount of in-flight data.

A zero sized window indicates a problem: the application is not taking data from the buffer fast enough.

How big a receive buffer is needed to get good throughput?

If the buffer is too small the application has to wait, and packets may be thrown away.

An application can set the receive buffer size using the SETSOCKOPT call. If this is not used, then there will be a TCPIP default receive buffer size. On z/OS this is the TCPCONFIG TCPRCVBUFRSIZE …. parameter.

The maximum receive buffer size is specified in TCPMAXRCVBUFRSIZE.

If the receive buffer size is greater than 64KB, then a performance enhancement called Dynamic Right Sizing (DRS) can come into action, which automatically increases the buffer size up to 2MB.
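
Similarly, an illustrative (not recommended) receive buffer setting in the TCP/IP profile, together with the maximum size an application may request, might look like:

TCPCONFIG TCPRCVBUFRSIZE 131072 TCPMAXRCVBUFRSIZE 262144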

Inside the pipe

I have described the sender side filling the send buffer for the connection, and the application on the receiver side taking data from the connection’s receive buffer. I’ll look at the pipe in between.

Data is sent across the network in packets. The packets are usually small – for example 1500 bytes for Ethernet. Some protocols support larger packet sizes; data sent within a z/OS image can have 56KB packet sizes. The Maximum Segment Size (MSS) is the maximum size of the user data in a packet.

If a packet is too large for a device, it may be cut into smaller chunks and then passed on – or the packet may just be dropped.

The simplest and slowest transmission is send one packet and wait for the ACK, then send another packet.

It is much more efficient to send multiple packets. For example send 10 packets, when the first ACK comes back (saying the first packet has been received), send the next packet and so on, so there are always 10 packets (or less) in the pipe.

The amount of data on the network is limited by the smaller of the send buffer size and the receive window size. This means you need both a big send buffer, and a big receive buffer to get maximum throughput.

The TCP window is the maximum number of bytes that can be sent before the ACK must be received. If the network is unreliable it is better to keep the window small to reduce the amount of data that needs to be resent after a missing ACK.
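
As an illustrative calculation: with a 64KB window and a 20 ms round trip time, at most 65,536 bytes can be sent per round trip, so throughput is limited to about 65,536 / 0.02 ≈ 3.3 MB per second, no matter how fast the link is. Bigger buffers (and hence a bigger window) are needed to go faster.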

Where can I get more information?

I wrote a blog post about tuning MQ channels which gives additional information.

How do I display this buffer information?

On z/OS you can use

  • TSO NETSTAT CONFIG command reports the default receive buffer size, the default send buffer size, and the default maximum receive buffer size
  • TSO NETSTAT ALL (IPPORT nnnn, where nnnn is the port number.
  • TCPMON on GitHub to monitor the buffer and window sizes in near real time.

On Linux

You can use the following commands; example output is sketched below.

  • ss -im -at '( dport = :21 )' which displays information about connections with a destination port of 21.
  • ss -im -at '( dst = 10.1.1.2 )' which displays information about connections with a destination IP address of 10.1.1.2.
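
With the -i and -m options the output includes a skmem:(…) section, where rb is the receive buffer size in bytes and tb is the send buffer size, plus fields such as rtt (the round trip time in milliseconds) and cwnd (the congestion window). The values below are illustrative, and the exact fields vary by kernel version:

skmem:(r0,rb131072,t0,tb87040,f0,w0,o0,bl0,d0) ... rtt:25/12 cwnd:10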

Is there more information available about buffers and windows?

There is a lot of information on the web, but it is not usually easy to digest.

I thought this article was clear about the different buffers and windows.

How do I change the buffer sizes?

An application can change them using the SETSOCKOPT call (see here), with the options SO_RCVBUF and SO_SNDBUF.

Some applications have their own way of setting the buffer sizes:

  • MQ for midrange RcvBuffSize etc
  • MQ on z/OS use +cpf RECOVER QMGR(TUNE CHINTCPRBDYNSZ nnnnn)
    +cpf RECOVER QMGR(TUNE CHINTCPSBDYNSZ nnnnn)
  • FTP on Linux -x option

Otherwise the system defaults are used.
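
On Linux, the system-wide TCP defaults can be displayed (and changed) with sysctl; the three values are the minimum, default and maximum buffer sizes in bytes, and the numbers below are from my system – yours will differ:

sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
net.ipv4.tcp_rmem = 4096 131072 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304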

Other information provided with display commands

Commands like netstat provide other information

For example

  • round trip time – the average time, in milliseconds, from when a packet is sent over the network to when its ACK is received.
  • RoundTripVariance – this gives the spread of the response times. It is the sum of the squares of the response times. A measure of the spread is the standard deviation = sqrt(variance/N – (average round trip time)²), where N is the number of packets sent. If all the packets have the same round trip time, this will be close to zero.
  • Local 0 window count – the number of times there was 0 space in the receive buffer
  • Remote 0 window count – the number of times the remote end had 0 space in its receive buffer.

One minute MVS: Networking subnets.

I’ve understood and used subnets (in a hand-waving way), but found it hard to write down what they are good for, and why we have them. There are many explanations on the web, but they all seem to describe how to use subnets, not why we need them. Some of what I say below may not be strictly true, but I hope it gives the right concepts.

  • You can use an Ethernet cable to join two machines. This is not very interesting.
  • You can have an Ethernet router. You plug the Ethernet cable from your machine to one of the ports on the Ethernet router.
  • The Ethernet router has devices attached to it with addresses such as 192.168.1.160, 192.168.1.24, 192.168.1.74. The router handles traffic for 192.168.1.*. The IP address is 32 bits long, and the router is configured so that if the top 24 bits of an address are 192.168.1, the traffic is passed to the router. This is written as 192.168.1.0/24. The remaining 8 bits can be used for devices attached to the router, so almost 256 devices (192.168.1.0 and 192.168.1.255 are reserved). See the worked example after this list.
  • If you had a large building you could configure a router with address 192.168.0.0/16 and have 65,000+ devices attached to it. This may not be a good idea.
    • The router sends out management packets to all devices in the subnet saying, for example “does anyone have this IP address”. With many devices the router could spend all its time processing these management packets, and not handling user data
    • You may want to segregate different areas, so addresses 192.168.1.* are for the first floor, and 192.168.2.* are for the second floor. If you want a firewall for the first floor, it is much easier to configure it for all traffic going to 192.168.1.* than for some machines within 192.168.*, where all users would be going through the firewall – which may not be what you want.
    • Each floor has a confidential printer. It is easier to configure the printer on 192.168.1.22 so that only machines in the same subnet, with IP addresses 192.168.1.*, can send print files to it, rather than filter out users on the second floor.
  • With IP V6 there are 128 bits available for subnetting. Mostly a subnet of /64 is used. I have an address 2a00:9999:8888:7777:a0cd:ec92:bceb:91ab/64 so 2a00:9999:8888:7777 is the address of my router (64 bits), and the device on the router is currently a0cd:ec92:bceb:91ab (64 bits).
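
A worked example using the addresses above: for 192.168.1.74/24 the mask is 255.255.255.0. ANDing the address with the mask gives 192.168.1.74 AND 255.255.255.0 = 192.168.1.0, so the device is in subnet 192.168.1.0/24, and the low 8 bits (.74) identify the device within that subnet. An address such as 192.168.2.9 ANDs to 192.168.2.0, a different subnet, so traffic to it has to go via a router.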

Basic connectivity

Single point to point cable

My laptop is connected to my server by an Ethernet cable.

I’ve defined an address at each end: 10.1.0.2/24 at the laptop and 10.1.0.3/24 at the server. I can ping between the two machines. When I changed the server to have 12.1.0.3/24 there was no connectivity – because the two ends were then in different subnets.
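
As a sketch of how such addresses can be assigned (the interface names enp0s31f6 and eno1 are the ones used elsewhere in these posts; yours will differ):

On the laptop: sudo ip addr add 10.1.0.2/24 dev enp0s31f6
On the server: sudo ip addr add 10.1.0.3/24 dev eno1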

Wireless connection – IPV4

My systems were configured automatically: the laptop as 192.168.1.222/24 and the server as 192.168.1.230/24. These are in the same subnet, so traffic goes from my laptop up the wireless connection to the wireless router, and down to the server over its wireless connection.

Wireless connection – IPV6

My systems were configured automatically: the laptop had a prefix (subnet) of fe80 and a specific address of c82d:b94c:21fa:3d1c within this subnet. The server had a prefix (subnet) of fe80 and its own specific address.

The default route is via a device (the wireless router) with prefix (subnet) fe80 and address c82d:b94c:21fa:3d1. These fe80 addresses are all “internal to the router” addresses.

Today my laptop also has IP address 2a00:9999:8888:7777:a0cd:ec92:bceb:91ab/64 and my server has address 2a00:9999:8888:7777:605a:2d22:5daf:53d7/64. These can be used to contact sites on the internet, because they are external addresses.

Getting out of the subnet.

My server has a connection over virtual Ethernet to z/OS. The server end of this link has address 10.1.3.1/24. If I use the wireless connection from my laptop to the server, I cannot easily access this link, because the wireless router does not know about 10.1.3.1 – and I have no way of configuring the router.

On Linux I can configure the server to be a software router (using radvd to advertise routes), and have a physical Ethernet cable to it from my laptop. This way I can control the IP routing to and from the server.
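
A minimal sketch of a radvd configuration (/etc/radvd.conf), assuming the laptop-facing interface is eno1 and using an illustrative prefix; a real configuration will differ:

interface eno1
{
    AdvSendAdvert on;
    prefix 2001:db8:1::/64
    {
    };
};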

You can also use a bridge … but that is an advanced topic.

Setting up for PING can be difficult!

  • Setting up ping within one machine is trivial
  • Setting up ping between two machines is relatively easy
  • Setting up ping with 3 machines can be hard to get right and to get working.

Most documentation describes how to set up PING between two machines, and does not mention anything more complex.

Ancient history (well, the 1980s)

To understand modern TCPIP networks, it helps to know about the origins of Ethernet.

Originally with Ethernet there was a bus: a very long cable. You plugged your computer into this bus. Each computer on the bus had a unique 48 bit MAC address. You could send a request to another computer on this bus using its MAC address, or you could send a request to ALL computers on the bus. These days this might be, send to all computers: "does anyone have this IP address…?"

An Ethernet bus is connected to an Ethernet router which can route packets to other routers.

Ethernet has evolved and, instead of a long bus shared by all users, you have a switch device, and you plug the Ethernet cable from your computer into the switch. Conceptually it is the same, but with the switch you get much better performance and usability. You still have routers to get between different Ethernet environments. The request sent to all computers, "does anyone have this IP address…?", goes to the switch, and the switch sends it to all plugged-in computers.

When you start TCPIP, it configures the Ethernet hardware adapter with the addresses it should listen for.

Other terminology and concepts

  • A router is a networking device which forwards packets between different networks. Packets of data get sent from the originator through zero or more routers to the final destination.
  • A gateway has several meanings
    • It can connect different network types, for example acting as a protocol translator; it can have a built-in firewall
    • It can route IP traffic between different IP subnets
    • It can loosely mean a router

Usually a router or gateway is a dedicated hardware device.

You can have a computer act as a router or gateway; it takes data from one interface and routes it to a different interface. The computer can pass the data through a firewall, or transform it, for example converting internal IP addresses to external IP addresses (Network Address Translation, NAT).

At a conceptual level, a computer’s network adapter is configured to listen only for packets which match the IP addresses of the interface, and ignore the rest.

If you want a computer to act as a router or gateway, you need to configure the network adapter to listen to all traffic, and to process the traffic. This is important when routing traffic from computer A through computer B to computer C.

A simple one hop ping, first time

I have set up my laptop to talk to a server over Ethernet.

I have configured my laptop using

sudo ip -6 addr add 2001:7::/64 dev enp0s31f6

This says: for any IP V6 address starting with 2001:0007:0000:0000, try sending it down the Ethernet cable with interface name enp0s31f6.

Ping a non-existent address

If I try to ping an address, say 2001:7::99, a packet is sent to all computers on the Ethernet bus:

from myipAddress to everyone on the Ethernet bus, does anyone have IP address 2001:7::99?

There is no reply because no computer on the bus has the address.

Ping the adjacent box

On the server, the IP address of the Ethernet cable is 2001:7::2.

If I ping this address, there are the following flows

From myipAddress (and myMAC) to everyone on the Ethernet bus, does anyone have IP address 2001:7::2?
From 2001:7::2 to myipAddress (MAC), yes I have that IP address. My Ethernet MAC address is …

My laptop puts the remote IP address and its MAC address into its neighbour cache, then issues the ping request.

The ping request looks in the neighbour cache, finds the IP address and MAC address, and issues:

From myipAddress to 2001:7::2 at macAddress .. ping request.

The second and later times I issue a ping

The ping request looks in the neighbour cache, finds the IP address and MAC address, and issues the ping:

From myipAddress to 2001:7::2 at macAddress .. ping request.

So, while the neighbour cache still has the IP address of the target and its MAC address, the ping can be routed to the next hop.
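
You can see the neighbour cache with the ip -6 neigh command; the MAC address below is illustrative:

ip -6 neigh show
2001:7::2 dev enp0s31f6 lladdr 52:54:00:12:34:56 REACHABLE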

Setting up a multi hop ping

I have my laptop connected via a physical Ethernet cable to my server. The server is connected to z/OS via a virtual Ethernet connection. The IP address of the z/OS end of the virtual Ethernet cable is 2001:7::4.

The ping 2001:7::4 request on my laptop does not work (as we saw above), because TCPIP asked everyone on the Ethernet bus if it has address 2001:7::4 – and no machine replied.

You need to define the link to the server machine as a gateway or router-like device which can handle IP addresses which are not on the Ethernet bus. You define it like this:

sudo ip -6 route add 2001:7::/64 via 2001:7::2

This says: for any traffic with an IP address of 2001:0007:0000:0000:*, send it via address 2001:7::2.

This requires 2001:7::2 to be known about, and so needs the following to be configured first

sudo ip -6 route add 2001:7::2 dev enp0s31f6

so that TCPIP knows where to send the traffic.

The route add 2001:7::2 dev enp0s31f6 command sends a broadcast on the Ethernet bus asking “does anyone have 2001:7::2?”. My server replies saying “yes, I have it”. This is the same as for the ping above.

In summary

To send traffic on the same Ethernet bus you use

sudo ip -6 route add 2001:7::2 dev enp0s31f6

To route it via a router, switch, or computer acting as a router or switch, you need one of:

sudo ip -6 route add 2001:7::/64 via 2001:7::2
or
sudo ip -6 route add 2001:7::/64 via 2001:7::2 dev enp0s31f6

The computer acting as a router or switch may need additional configuration. For example to allow it to route traffic from one Ethernet bus to another Ethernet bus you need to enable packet forwarding. On Linux to enable forwarding for all interfaces use

sudo sysctl -w net.ipv6.conf.all.forwarding=1
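
To check the current setting, or to enable forwarding on just one interface (the interface name is an example), you can use:

sysctl net.ipv6.conf.all.forwarding
sudo sysctl -w net.ipv6.conf.eno1.forwarding=1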

On z/OS you use a static route such as

ROUTE 2001:7::/64 2001:7::3 JFPORTCP6

Where 2001:7::/64 is the range of addresses, 2001:7::3 is (one of) the addresses of the interface at the remote end of the cable, and JFPORTCP6 is the interface name. This is similar to the Linux route statement above.

You might need to set up the firewall. On my server I needed

sudo ufw route allow in on eno1 out on tap2

What IP Address does the sender have?

This is where it starts to get more complex.

Every network connection has at least one IP address.

With IP V6

  • Each interface gets an “internal” IP address such as fe80::9b07:33a1:aa30:e272
  • You can allocate an external address using the Linux command sudo ip -6 addr add 2001:7::1 dev enp0s31f6
  • If the interface has one external IP address configured, then this will be used.
  • If the interface has more than one external address configured then the first in the list may be used (not always).
  • If the interface does not have an external IP address, then TCPIP will find one from another interface, or allocate one for the duration of the request, such as 2a00:23c5:978f:9999:210a:1e9b:94a4:c8e. This address comes from my wireless connection

When my laptop is started, only the wireless connection has an IPV6 address. A ping request to 2001:7::4 had the origin IP address of 2a00:23c5:978f:9999:210a:1e9b:94a4:c8e which is the first address in the list for the wireless connection.
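
You can list the IPV6 addresses available on an interface (and so see which addresses TCPIP could pick as the origin) with:

ip -6 addr show dev enp0s31f6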

You can tell ping which address to use, for example

ping -I 2a00:23c5:978f:9999:cff1:dc13:4fc6:f21b 2001:7::4 (this fails because the server does not know how to send the response back to the requester)

I defined a new address for the interface using

sudo ip -6 addr rep 2002:7::1 dev enp0s31f6

I could issue ping -I 2002:7::1 2001:7::4, but this failed to get a response, because the back-end and intermediate nodes did not know how to get the response back to 2002:7::1.

Does this sender address matter?

Yes, because the remote end needs to have a definition to be able to send the response back.

I had my laptop connected to a Linux server over Ethernet, which in turn was connected to z/OS over a virtual Ethernet.

On z/OS I could see the ping request arrive, but it could not send the response back because it did not know how to get to 2a00:23c5:978f:9999….

I configured the laptop end of the Ethernet to give it an IP address 2001:7::1

sudo ip -6 addr add 2001:7::1/64 dev enp0s31f6

I configured the server end of the laptop connection (address 2001:7::2) with a route back to the laptop

sudo ip -6 route add 2001:7::1/128 dev eno1

I configured the server’s connection to z/OS with an IP address and a route

sudo ip -6 addr add 2001:7::3/128 dev tap2
sudo ip -6 route add 2001:7::4/128 dev tap2

Now when I did the ping, the originator was 2001:7::1.

I configured the z/OS interface to send stuff back

INTERFACE JFPORTCP6 
DEFINE IPAQENET6
CHPIDTYPE OSD
PORTNAME PORTC

INTERFACE JFPORTCP6
ADDADDR 2001:DB8:8::9
INTERFACE JFPORTCP6
ADDADDR 2001:DB8::9
INTERFACE JFPORTCP6
ADDADDR 2001:7::4

START JFPORTCP6

and the routing

BEGINRoutes 
; Destination Gateway LinkName Size
ROUTE 2001:7::/64 2001:7::3 JFPORTCP6 MTU 5000
...
ENDRoutes

This says that all traffic with destination address 2001:0007:0000:000…. is sent to interface JFPORTCP6. This interface is connected to a gateway – the remote end of the Ethernet (on the server) – with address 2001:7::3.

The server machine needs to act as a router between the different Ethernet buses. You can display and configure this using

sysctl -a |grep forwarding
sudo sysctl -w net.ipv6.conf.all.forwarding=1

Packet forwarding on z/OS

By default the z/OS interface only listens for packets addressed to one of the IP addresses of the interface. For z/OS to be able to act as a router – accepting all packets and routing them to other interfaces – you need:

  • IPCONFIG DATAGRAMFWD in the TCP/IP Profile
  • PRIROUTER on the INTERFACE definition. This configures the Ethernet adapter (hardware) so that if a datagram is received at this device for an unknown IP address, the datagram is routed to this TCP/IP instance.

But normally, if you are running z/OS, you use a cheaper physical router rather than using z/OS to do your routing. It might only be people like me, who run z/OS on their laptop, who want to try routing through z/OS.

Real time monitor for z/OS TCPIP interface statistics

I’ve put a program up on GitHub which provides real-time statistics for z/OS TCPIP interfaces. For example:

Interface       ,HH:MM:SS,IBytes,OBytes,IB/sec,OB/sec,,IUP   ,OUP   ,,...
TAP0 ,09:11:12, 0, 0, , ,, 0, 0,,...
TAP0 ,09:11:22, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:11:32, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:11:42, 348, 348, 34, 34,, 3, 3,,
TAP0 ,09:11:52, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:12:02, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:12:12, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:12:22, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:12:32, 0, 0, 0, 0,, 0, 0,,
TAP0 ,09:12:42, 0, 0, 0, 0,, 0, 0,,

Where the values are deltas from the previous time interval

  • IBytes is input bytes during that interval
  • OBytes is output bytes
  • IB/Sec is rate of input bytes
  • OB/Sec is rate of output bytes
  • IUP is count of Input Unicast packets
  • OUP is count of Output Unicast packets.

You get one output file per interface.

The output file can be downloaded and used as input to a spreadsheet, to graphically display the data.

I’m working on some code to display the global TCPIP stats, but I do not think they are very interesting.  If they are interesting to you, please let me know and I’ll add my code.

One minute: Understanding TCPIP routing: Static, RIP, OSPF

This is another blog post in the series “One minute…” which gives the basic concepts of a topic, with enough information so that you can read other documentation, but without going too deeply.

IP networks can range in size from 2 nodes (machines) to millions of nodes, and a packet can go from my machine to any available machine – and it arrives! How does this miracle work?

I’ll work with IP V6 to make it more interesting (and there is already a lot of documentation for IP V4)

I have an old laptop, connected by Ethernet to my new laptop. My new laptop is connected by wireless to my server, which is connected to z/OS. I can ping from the old laptop to z/OS.

  • Each machine needs connectivity for example wireless, Ethernet, or both.
  • Each machine has one or more interfaces where the connectivity comes in (think Ethernet port, and Wireless connection). This is sometimes known as a device.
  • Each interface has one or more IP addresses.
  • You can have hardware routers, or can route through software, without a hardware router. A hardware router can do more than route.
  • Each machine can route traffic over an interface (or throw away the packet).
    • If there is only one interface this is easy – all traffic goes down it.
    • If there is more than one interface you can specify which address ranges go to which interface.
    • You can have a default catch-all if none of the definitions match
    • You can have the same address using different interfaces, and the system can exploit metrics to decide which will be used.
    • You can have policy based routing. For example
      • packets from this premier user, going to a specific IP address should use the high performance (and more expensive) interface,
      • people using the free service, use the slower(and cheaper) interface.

Modern routing uses the network topology to manage the routing tables and metrics in each machine.

Static

The administrator defines a table of "if you want to get to… then use this interface; the default is to send the packet using this … interface". For example, with z/OS:

BEGINRoutes 
;     Destination   SubnetMask    FirstHop    LinkName  Size 
; ROUTE 192.168.0.0 255.255.255.0       =     ETH2 MTU 1492 
ROUTE 10.0.0.0      255.0.0.0           =     ETH1 MTU 1492 
ROUTE DEFAULT                     10.1.1.1    ETH1 MTU 1492 
ROUTE 10.1.0.0      255.255.255.0   10.1.1.1  ETH1 MTU 1492 

ROUTE 2001:db8::/64 fe80::f8b5:3466:aa53:2f56 JFPORTCP2 MTU 5000 
ROUTE fe80::17      HOST =                    IFPORTCP6 MTU 5000 
ROUTE default6      fe80::f8b5:e4ff:fe59:2e51 IFPORTCP6 MTU 5000
                                                                      
ENDRoutes 

Says

  • All traffic for 10.*.*.* goes via interface ETH1.
  • If no rule matches (for IP V4) use the DEFAULT route via ETH1. The remote end of the connection has IP address 10.1.1.1
  • All traffic for IPV6 address 2001:db8:0:* goes via interface JFPORTCP2
  • If no rule matches (for IP V6) use the DEFAULT6 route via IFPORTCP6. The remote end of the connection has IP address fe80::f8b5:e4ff:fe59:2e51.

On Linux the ip route command gave

default via 192.168.1.254 dev wlxd037450ab7ac proto dhcp metric 600 
10.1.0.0/24 dev eno1 proto kernel scope link src 10.1.0.3 metric 100 
10.1.1.0/24 dev tap0 proto kernel scope link src 10.1.1.1 

This says

  • The default is to send any traffic via device wlxd037450ab7ac.
  • Any traffic for 10.1.0.* goes via device eno1
  • Any traffic for 10.1.1.* goes via device tap0.
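
You can add a static route with an explicit metric by hand; for example (the destination, next hop and metric are illustrative):

sudo ip route add 10.1.2.0/24 via 10.1.0.2 metric 200

but, as the next section describes, doing this by hand does not scale.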

Routing Information Protocol(RIP)

Manually assigning metrics (priorities) to hint at which routes are best quickly becomes unmanageable when the number of nodes (hosts) increases.

In the 1980s the first attempt to solve this was RIP. It uses the "hop count" of the destination from the machine as a metric. A route with a small hop count will be selected over a route with a large hop count. Of course this means that each machine needs to know the topology. RIP can support at most 15 hops.

Each node participating in RIP learns about all other nodes participating in RIP.

Every 30 seconds each node sends to adjacent nodes “I know about the following nodes and their route statements”. Given this, eventually all nodes connected to the network will know the complete topology.
For example, from the frr (Free Range Routing) trace on Linux:

RIPng update timer expired!
RIPng update routes on interface tap1
  send interface tap1
  SEND response version 1 packet size 144
   2001:db8::/64 metric 1 tag 0
    2001:db8:1::/64 metric 1 tag 0
   2002::/64 metric 2 tag 0
    2002:2::/64 metric 2 tag 0
   2008::/64 metric 3 tag 0
    2009::/64 metric 1 tag 0
    2a00:23c5:978f:6e01::/64 metric 1 tag 0

This says

  • The 30 second timer woke up
  • It sent information to interface tap1
  • 2001:db8::/64 metric 1 this is on my host(1 hop)
  • 2002::/64 metric 2 this is from a router directly connected to me (2 hops).
  • 2008::/64 metric 3 is connected to a router connected to a router directly connected to me (3 hops.)

On z/OS the command F OMP1,RT6TABLE gave me message EZZ7979I. See OMPROUTE IPv6 main routing table for more information.

DESTINATION: 2002::/64 
  NEXT HOP: FE80::E42D:73FF:FEB1:1AB8 
  TYPE:  RIP           COST:  3         AGE: 10 
DESTINATION: 2001:DB8::/64 
  NEXT HOP: FE80::E42D:73FF:FEB1:1AB8 
  TYPE:  RIP*          COST:  2         AGE: 0 

This says

  • To get to 2002::/64 go down interface with the IP address FE80::E42D:73FF:FEB1:1AB8.
  • This route has been provided by the RIP code.
  • The destination is 3 hops away (in the information sent from the server it was 2 hops away)

The fields are

  • RIP – Indicates a route that was learned through the IPv6 RIP protocol.
  • * An asterisk (*) after the route type indicates that the route has a directly connected backup.
  • Cost 3 – this route is 3 hops away.
  • Age 10 – indicates the time that has elapsed since the routing table entry was last refreshed.

OSPF (Open Shortest Path First)

OSPF was developed after RIP, as RIP had limitations – the maximum number of hops was 15, and every 30 seconds there was a deluge of information being sent around. The OSPF standard came out in 1998, 10 years after RIP.

Using OSPF, when a system starts up it sends to the neighbouring systems "Hello, my router id is 9.3.4.66, and I have the following IP addresses and routes." This information is propagated to all nodes in the OSPF area. When a node receives this information it updates its internal map (database) with it. Every 10 seconds or so each node sends a "Hello packet" to the adjacent nodes to say "I'm still here". If this packet is not received, a node can broadcast "This node …. is not responsive/dead", and all other nodes can then update their maps.

If the configuration changes, for example an IP address is added to an interface, the node’s information is propagated throughout the network. In a stable network, the network traffic is just the “Hello packet” sent to the next node, and any configuration changes propagated.

One of the pieces of information sent out about a node's routes is the metric or "cost". When a node is deciding which interface to route a packet to, OSPF can calculate the overall "cost", and if there is a choice of routes to the destination it can decide which interface gives the best cost.

To make it easier to administer, you can have areas, so you might have an area being the UK, another area being Denmark, and another area being the USA.