Installing Data Gatherer on z/OS

OpenTelemetry is a technology for providing trace data as work flows between systems. It is often called OpenTel or OTel.

As work flows around a network, status information (“where the work is”) is sent to a central location. Tools at this central location can stitch the data together and produce visualisations of the flow of work, showing where the work was delayed.

The data is normally sent over TCP/IP.

Products and programs can emit their own data in the required format, and send it over TCP/IP to the server. On z/OS, data can be written to in-memory SMF, and a data collector reads the SMF records and sends the data over TCP/IP to the central site.

IBM provides a Data Gatherer. This post is about configuring it and getting it working. It feels a bit rough around the edges, and I would have designed some of it differently.

Where is it?

The code has been forced into the RMF data collector. The data collector code is usually installed in the Unix directory /usr/lpp/grb.

Where is the documentation?

Chapter 3 of the z/OS Data Gatherer User’s Guide (SC31-5703-70) has instructions on creating security profiles.

The first sentence in this chapter is:

Instructions for setting up the z/OS OpenTelemetry Emitter are available in the file system in your z/OS installation at /usr/lpp/grb/opentelemetry_emitter/dt/README.

These instructions are not clear, and are sometimes wrong. (For example, to disable TLS, you set a flag and then define a certificate!) Thanks to Joern Thyssen from Rocket Software for his help in getting it working.

I created some JCL called MYDG and submitted it. You could use a started task instead. The userid running the data gatherer needs access to the SMF data. If access to the SMF data is restricted, you may want to use a started task with its own restricted userid. Instructions for this are in Chapter 3 of the User’s Guide mentioned above.

My JCL

//IBMOT32  JOB  (OTEL),MSGLEVEL=(1,1),NOTIFY=&SYSUID
// EXPORT SYMLIST=*
// SET JARFILE='zos-otel-emitter-dt.jar'
// SET $OTLENDP='http://10.1.0.2:4317'
// SET $OTLPRT='grpc'
// SET REGSIZE='0M'
// SET $OTLEXPC='false'
// SET $TLSENBL='false'
// SET $MTLS='false'
// SET $TLSCERT=''
// SET $TLSCLKY=''
// SET $TLSCLCR=''
// SET $SMFRDBS=''
//*
// SET VERSION='21'
// SET JAVADIR='/usr/lpp/java/java21/current_64'
// SET APPHOME='/usr/lpp/grb/opentelemetry_emitter/dt'
// SET $INMRESL='IFASMF.MQOTEL'
// SET $SMFDUMP='false'
// SET $SMFDUMP='true' dump SMF record binary
// SET $SMFRDFL='0'
//JAVAJVM EXEC PGM=JVMLDM&VERSION,REGION=&REGSIZE,PARM='/+I'
//STEPLIB DD DISP=SHR,DSN=JAVA.V21R0M0.SIEALNKE
...

Java 21 and higher

If you are using Java 21 or higher, some of the output from Java comes out by default in ASCII (and so is not easily readable). You need to specify

IJO="$IJO -Dfile.encoding=IBM-1047" 

Identifying the SMF data

The JCL above reads SMF data from the in-memory resource IFASMF.MQOTEL, and sends it over HTTP (not HTTPS) to http://10.1.0.2:4317.
MQ writes its OpenTel data to SMF records type 1158.

My SMFPRMxx in parmlib has

RECORDING(LOGSTREAM) 
...
INMEM(IFASMF.MQOTEL,RESSIZMAX(128M),TYPE(1158))

This gives the user-specified name IFASMF.MQOTEL to records of type 1158. You protect the name IFA.IFASMF.MQOTEL with your security manager. (The Chapter 3 documentation is confusing.)

The Data Gatherer accesses the data through the label IFASMF.MQOTEL.

Collecting data

Once Java has started (it takes about 15 seconds to start on my baby zD&T machine), it will listen for new records sent to the SMF resource (IFASMF.MQOTEL). It does not drain existing records.

You can send the data over TCP/IP or write it locally.

If you are sending the data over TCP/IP, the data collector starts a TCP/IP session to the remote collector only after the first record has been read from SMF. If the IP address is not active (or is misconfigured), it can take many seconds (more than 15 seconds for me) before the UnknownHostException is thrown.

Personally I would have connected at startup, so you know if you have a configuration error. It is not good when you start the server at midnight, but only find there is a problem at 0800 when the work starts. It would be better to report the error when the server is started, because it gives you more time to fix any problems.
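
Until the emitter does this itself, you can run your own check before starting the job. Below is a minimal Python sketch of such a pre-flight check, assuming you have Python available on a machine that can reach the endpoint. The function name endpoint_reachable is my own and is not part of the Data Gatherer. It only proves that something accepts a TCP connection at the address; it does not prove that the listener speaks gRPC.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds.

    endpoint is a URL such as the value given to $OTLENDP,
    for example 'http://10.1.0.2:4317'.
    """
    parsed = urlparse(endpoint)
    host, port = parsed.hostname, parsed.port
    if host is None or port is None:
        raise ValueError(f"endpoint needs an explicit host and port: {endpoint!r}")
    try:
        # create_connection resolves the name and connects, honouring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling endpoint_reachable('http://10.1.0.2:4317') returns within the timeout, so a wrong address shows up when you start the server rather than when the first record arrives.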

If the connection is successful, there is no notification.
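
One way to see that a connection was made is to point the emitter at a throwaway listener instead of the real backend. The Python sketch below is an assumption-laden debugging aid, not anything supplied by IBM: wait_for_connection is a hypothetical helper that accepts one TCP connection on the port (4317 to match my JCL) and reports the caller's address. It does not speak gRPC, so the emitter will fail after connecting; the point is only to confirm that the connection attempt reaches the machine.

```python
import socket
from typing import Optional


def wait_for_connection(port: int = 4317,
                        timeout: float = 60.0,
                        host: str = "0.0.0.0") -> Optional[str]:
    """Accept one TCP connection and return the peer's IP address.

    Returns None if nothing connects within the timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # Allow a quick restart of the listener on the same port.
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        srv.settimeout(timeout)
        try:
            conn, peer = srv.accept()
        except socket.timeout:
            return None
        conn.close()
        return peer[0]
```

Run it on the target machine with the real backend stopped, then cause an SMF record to be written; the returned address should be that of the z/OS system.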

What is sent?

You can specify the option

// SET $SMFDUMP='true'

and it dumps each SMF record, in hex and EBCDIC, to //STDOUT:

00000000 13 60 00 00 7E 7E 00 61 5A DC 01 26 17 7F E5 E2 .-..==..!...."VS
00000010 F0 F1 00 00 00 00 00 01 00 20 01 00 00 E2 E3 BE 01...........ST.
00000020 DD C1 37 4E 40 00 00 00 01 00 00 01 00 00 00 00 .A.+ ...........
00000030 00 00 00 00 04 86 00 00 00 00 00 40 00 00 00 02 .....f..... ....
00000040 00 01 0A 00 E2 D7 C1 D5 00 E2 E3 BE DD C0 C0 60 ....SPAN.ST..{{-
00000050 80 00 00 00 00 00 00 00 00 E2 E3 BE DD C0 CC 83 .........ST..{.c
00000060 80 00 00 00 00 00 00 00 F0 81 86 F7 F6 F5 F1 F9 ........0af76519
00000070 F1 F6 83 84 F4 F3 84 84 F8 F4 F4 F8 85 82 F2 F1 16cd43dd8448eb21
00000080 F1 83 F8 F0 F3 F1 F9 83 85 F2 85 F3 82 85 84 84 1c80319ce2e3bedd
00000090 83 F0 83 F0 F6 F0 F8 F0 85 F2 85 F3 82 85 84 84 c0c06080e2e3bedd
000000A0 83 F0 F1 86 F9 F7 83 F0 00 04 00 2F 00 18 0C 01 c01f97c0........
000000B0 A2 85 99 A5 89 83 85 4B 95 81 94 85 00 04 01 F4 service.name...4
000000C0 C3 E2 D8 F9 00 20 09 01 A2 97 81 95 4B 95 81 94 CSQ9....span.nam
000000D0 85 00 00 00 00 0B 01 F4 D4 D8 C7 C5 E3 40 C3 D6 e......4MQGET CO
000000E0 D3 C9 D5 00 00 28 18 01 94 85 A2 A2 81 87 89 95 LIN.....messagin
...
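
If you copy the hex off the mainframe, you can decode parts of it elsewhere. The Python sketch below decodes bytes taken from the dump above. It makes two assumptions of mine. First, that the record starts with the standard SMF header (length, segment, flag, record type, time in hundredths of a second since midnight, packed date 0cyydddF, system id). Second, that Python's cp037 codec is an acceptable stand-in for IBM-1047, which is not in the Python standard library; the two code pages agree for letters, digits, space and period. The function names are hypothetical.

```python
import datetime

# First 18 bytes of the dump above (offsets 0x00 to 0x11).
HEADER = bytes.fromhex("136000007E7E00615ADC0126177FE5E2F0F1")
# Bytes at offsets 0xB0 to 0xBB, and 0xD8 to 0xE2, of the dump above.
ATTR_KEY = bytes.fromhex("A28599A58983854B95819485")
SPAN_NAME = bytes.fromhex("D4D8C7C5E340C3D6D3C9D5")


def ebcdic(data: bytes) -> str:
    """Decode EBCDIC text (cp037 used as a stand-in for IBM-1047)."""
    return data.decode("cp037")


def smf_time(raw: bytes) -> datetime.time:
    """Convert hundredths of a second since midnight to a time of day."""
    seconds, hundredths = divmod(int.from_bytes(raw, "big"), 100)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, second, hundredths * 10_000)


def smf_date(raw: bytes) -> datetime.date:
    """Convert a packed 0cyydddF date to a calendar date."""
    digits = raw.hex()                      # for example "0126177f"
    year = 1900 + int(digits[1]) * 100 + int(digits[2:4])
    day_of_year = int(digits[4:7])
    return datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


record_time = smf_time(HEADER[6:10])        # time the record was written
record_date = smf_date(HEADER[10:14])       # date the record was written
system_id = ebcdic(HEADER[14:18])           # SMF system id
print(record_date, record_time, system_id, ebcdic(ATTR_KEY), ebcdic(SPAN_NAME))
```

The text fields match the right-hand column of the dump, which is a quick check that the bytes were copied correctly.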

Once you have proved it is working, I suggest you set $SMFDUMP='false'.

TLS support

The Data Gatherer has support for TLS, but the backend I was using, Jaeger, does not have TLS support. The documentation says to install the Cassandra product; I could not install this on my Linux machine.

Debugging the JCL

I had various configuration problems. I found that specifying PARM='/+I'

//JAVAJVM  EXEC PGM=JVMLDM&VERSION,REGION=&REGSIZE,PARM='/+I' 

showed what configuration parameters were used.

Debugging TLS

You can get a Java trace for TLS by specifying

IJO="$IJO  -Djavax.net.debug=all " 

You may not want to specify “all”, because it produces a lot of output. See the Java documentation for more information on javax.net.debug.