Understanding the TLS trace, for example when trying to get a client to connect to a web server over TLS, is a difficulty-squared problem.
In this post, I give some tools to reduce the amount of trace data, and provide an annotated trace so you can understand what is going on and spot the errors. I’ve also annotated the trace with common user errors, and links to possible causes.
This post, and the referenced pages, are still a work in progress while I sort out some of the little problems that creep in. Please send me comments on any mistakes, or suggestions for improvements (or additional reasons why a handshake fails).
Understanding the TLS trace is hard.
It is hard enough to understand what the trace is showing, and several things make it even harder to use:
- When concurrent threads are running, the trace records are interleaved, and it can be hard to tell which data belongs to which thread.
- The formatting is sometimes poor. Instead of giving one line with a list of 50 comma-separated numbers, it gives you 100 lines, each holding either a number or a comma. When two threads are doing this, it is a nightmare trying to work out which data belongs to which thread. (I usually ignore these records.)
- The trace tends to “provide all information that might possibly be useful”, rather than the information most people need to find out why a client cannot connect to the server. For example, the trace prints out the encrypted data, in hex!
- You turn the trace on with a java -D… parameter. Other applications, such as web servers, have different ways of turning the trace on. I could not find a way of turning it on and off dynamically, so if you have to run for a long period you can get a lot of output. The output may go into trace files which wrap.
- Different implementations have slightly different trace formats.
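The “one number or comma per line” problem can be tackled mechanically. Below is a minimal Python sketch (not the author’s tls.py) that joins runs of such lines back into a single comma-separated line. It assumes a fragment line contains only a decimal number or a lone comma; `collapse_fragments` is a hypothetical name.

```python
import re

# Assumption: a "fragment" line holds only a decimal number or a lone comma.
_FRAGMENT = re.compile(r"^\s*(?:-?\d+|,)\s*$")


def collapse_fragments(lines):
    """Join consecutive number-only / comma-only lines into a single line."""
    out, run = [], []

    def flush():
        if run:
            # "".join gives "1,2,3"; add a space after each comma for readability
            out.append("".join(run).replace(",", ", ").rstrip())
            run.clear()

    for line in lines:
        if _FRAGMENT.match(line):
            run.append(line.strip())
        else:
            flush()          # end of a run: emit the joined line first
            out.append(line)
    flush()                  # the file may end in the middle of a run
    return out
```

If two threads interleave their fragments, this merges them into one line, so it makes the output shorter and easier to read but does not untangle the threads.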
All these factors make it very difficult to understand the trace and find the cause of your problems.
What can you do to understand it?
Do not despair!
- On z/OS I created an edit macro, which I use to delete or transform data. It reduces an 8000-line spool file down to 800 lines. See ztrace.rexx.
- On Linux I have a Python script which does the same. See tls.py.
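As an illustration of the kind of filtering such a script can do (a hypothetical sketch, not the contents of tls.py), the following drops the hex dump lines of the encrypted data. The regular expression assumes dump lines begin with a four-digit hex offset and a colon, such as `0000: 16 03 03 00 5A`; check it against your own trace before relying on it.

```python
import re

# Assumption: hex dump lines look like "0000: 16 03 03 00 5A ...": a 4-digit hex
# offset, a colon, whitespace, then hex bytes. Verify against your own trace.
_HEX_DUMP = re.compile(r"^\s*[0-9A-Fa-f]{4}:\s+[0-9A-Fa-f]{2}")


def drop_hex_dumps(lines):
    """Return (kept_lines, removed_count) with hex dump lines filtered out."""
    kept = [line for line in lines if not _HEX_DUMP.match(line)]
    return kept, len(lines) - len(kept)


def reduce_file(path):
    """Hypothetical helper: read a trace file and print the reduced version."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    kept, removed = drop_hex_dumps(lines)
    print("\n".join(kept))
    print(f"# removed {removed} of {len(lines)} lines")
```

Printing the count of removed lines makes it easy to see how much of the trace was dump data.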
In some traces, sections are delimited by ***… *** to make the structure easier to see.
To find problems, look for lines starting with ***, or containing “exception”.
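That search is easy to automate. A small sketch follows; matching “exception” case-insensitively is my assumption, so that both `Exception` and `exception` are caught.

```python
def find_problems(lines):
    """Yield (line_number, text) for lines that start with *** or mention an exception."""
    for number, line in enumerate(lines, start=1):
        # "***" must be at the very start of the line; "exception" can be anywhere
        if line.startswith("***") or "exception" in line.lower():
            yield number, line
```

Keeping the line numbers means you can jump straight to the surrounding context in the full trace.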
You may need to look at both ends of the trace to understand the problem. One end may receive a response such as “TLSv1.2 ALERT: warning, description = …”, and you need to examine the trace at the other end to find out why that alert was sent.
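When comparing the two ends, it helps to pull out every alert first. This sketch uses a regular expression built from the message shape quoted above; the exact wording may vary between implementations, so treat the pattern as an assumption.

```python
import re

# Built from the shape "TLSv1.2 ALERT: warning, description = ...".
# Anything before "TLSv" on the line (for example a thread name) is ignored.
_ALERT = re.compile(
    r"(?P<version>TLSv[\d.]+)\s+ALERT:\s*(?P<level>\w+),\s*"
    r"description\s*=\s*(?P<desc>.+?)\s*$"
)


def find_alerts(lines):
    """Return a list of (line_number, version, level, description) tuples."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        m = _ALERT.search(line)
        if m:
            alerts.append((number, m["version"], m["level"], m["desc"]))
    return alerts
```

Run it over the traces from both ends: an alert received at one end tells you roughly where to look in the other end’s trace for the cause.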
Annotated trace file
I have taken a trace file from each end and annotated them, so you can see what the flow is, and how and where the data is used. I have colour-coded some of the flows, and included some common errors (in red), with a link to possible solutions.
Some lines have hover text.
If you have suggestions for additional information – or reasons why things do not work – please tell me and I’ll update the documentation.
The trace is from a Linux client connecting to a Liberty server on z/OS.
- Server starts up and waits for a connection from a client.
- Client starts up, and sends a “Client Hello” request to the server.
- Server wakes up and processes the request.
- Server sends “ServerHello” to the client.
- Optional. If the server wants client authentication, the server sends the “client Authentication request”.
- *** ServerHelloDone. The server has finished its processing, sends the data, and waits for the reply.
- Client wakes up and processes the “ServerHello”, optionally sends back the “CertificationResponse”, and sends the verify.
- Server processes the verify and ends the handshake.
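The flow above can be checked against a trace by listing the `***` message headers and finding the first expected step that never appears. This is a simplified, hypothetical model: the step names follow common TLS 1.2 message names, it leaves out messages such as ClientKeyExchange and ChangeCipherSpec, and it assumes headers look like `*** ServerHelloDone`. Adjust `HANDSHAKE_STEPS` to match the names your trace uses.

```python
# Hypothetical, simplified model of the flow: (message name, optional?).
# The names are assumptions about how the trace labels each message.
HANDSHAKE_STEPS = [
    ("ClientHello", False),
    ("ServerHello", False),
    ("CertificateRequest", True),   # only if the server wants client authentication
    ("ServerHelloDone", False),
    ("CertificateVerify", True),    # only sent when the client authenticates
    ("Finished", False),
]


def messages_from_trace(lines):
    """Collect names from lines starting with '***', e.g. '*** ServerHelloDone'."""
    names = []
    for line in lines:
        if line.startswith("***"):
            words = line.strip("* \t").split()
            if words:
                names.append(words[0].rstrip(",."))  # "ClientHello," -> "ClientHello"
    return names


def first_missing_step(observed, steps=HANDSHAKE_STEPS):
    """Return the first required step not seen (in order), or None if all are present.

    Optional steps are skipped when absent; found steps must appear in order.
    """
    observed = list(observed)
    position = 0
    for name, optional in steps:
        try:
            position = observed.index(name, position) + 1
        except ValueError:
            if not optional:
                return name
    return None
```

For example, if the server trace shows `ServerHelloDone` but never `Finished`, the problem probably lies after the server sent its data, so the client trace is the next place to look.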