Setting up JES2 input NJE node (server) and AT-TLS

I got this working in response to a question about AT-TLS and JES2.

You need to configure the port and IP address of the destination node using AT-TLS.

I created the socket definition

$ADDSOCKET(TLS),NODE=1,IPADDR=10.1.1.2,NETSRV=1,PORT=2275

Before you start

Get a working JES2 NJE environment and a working AT-TLS environment first. Trying to configure AT-TLS at the same time as getting NJE to work makes both harder.

JES2 NJE needs a Netserver (NETSRV) to do the TCP/IP communication.

When you configure AT-TLS, it intercepts the traffic to the IP address and port and does the TLS magic. This means you need a different netserver, a TLS-specific port, and a TLS-specific socket. It looks like the default TLS port is 2252. The documentation says

SECURE=OPTIONAL|REQUIRED|USE_SOCKET
Indicates whether the NETSERV should accept only connection requests with a secure protocol in use such as TLS/SSL. When SECURE=REQUIRED is specified, the NETSERV rejects all connection requests that do not specify a secure protocol is to be used for the connection. When SECURE=OPTIONAL is specified, the NETSERV allows connections with or without a secure protocol in use.
The default, USE_SOCKET, inherits the SECURE setting from the SOCKET statement associated with the NETSERV. If the SOCKET says SECURE=YES, then processing is the same as specifying
SECURE=REQUIRED on the NETSERV.
To specify that the NETSERV should use NJENET-SSL (2252) as the PORT it is listening on and the default port for outgoing connections, but not require all connections to use TLS/SSL, you must specify SOCKET SECURE=YES on the socket that is associated with the NETSERV and set the NETSERV to SECURE=OPTIONAL.

I do not understand this because AT-TLS will try to do a TLS handshake and fail if the session is not a TLS session.

It feels like the easiest way is to have a netserver just for TLS with its own port. I may be wrong.

In my PAGENT configuration, I took a working TLSrule and created

TTLSRule CPJES2IN 
{
LocalAddr ALL
RemoteAddr ALL
LocalPortRange 2252
Direction Inbound
Priority 255
TTLSGroupActionRef AZFGroupAction1
TTLSEnvironmentActionRef AZFEnvAction1
TTLSConnectionActionRef AZFConnAction1
}

This is for the inbound traffic on port 2252.

I defined the JES2 node

$TSOCKET(TLS),NODE=1,IPADDR=10.1.1.2,NETSRV=1,PORT=2252 

with the matching port=2252

I assigned this socket to netsrv1, and started it

$TNETSRV1,SOCKET=TLS
$SNETSRV1

I used a Python NJE client to connect to z/OS. It was a modified version of the python NJE client (njelib), in which I defined a certfile, keyfile and cafile.

I used

nje = njelib.NJE("N50","S0W1")
nje.set_debuglevel(1)
# nje.setTLS is code I added to the client
#nje.setTLS(certfile="/home/colinpaice/ssl/ssl2/jun24/docec521june.pem",
# keyfile="/home/colinpaice/ssl/ssl2/jun24/docec521june.key.pem",
# cafile="/home/colinpaice/ssl/ssl2/jun24/docca256.pem")
connected = nje.session(host="10.1.1.2",port=2252,timeout=1)

Where the JES2 system is called S0W1, the node used is N50.

The z/OS IP address is 10.1.1.2, and the port is 2252.

There were no helpful messages to say the session was using TLS. I used Wireshark on the connection, and AT-TLS trace to check the TLS calls.

If I used a non-TLS connection to the z/OS node I got

EZD1287I TTLS Error RC: 5003 Data Decryption    
LOCAL: ::FFFF:10.1.1.2..2252
REMOTE: ::FFFF:10.1.0.2..41288
JOBNAME: JES2S001 RULE: CPJES2IN

showing the AT-TLS definition was CPJES2IN.

RC 5003 will occur when the AT-TLS process is expecting a TLS message but receives a clear-text message – so no TLS request was coming in.

Setting up JES2 NJE using TCP/IP

I was trying to test TLS and JES2 NJE, and first needed to get JES2 NJE working. I did not have a remote system to use, so I used the Python NJE client. I also used openssl s_server to act as a server, just for the connection.

For more information on setting up JES 2 NJE with TLS see:

Setting up NJE on JES2

You can use static definitions (in the JES2PARM member) or define them dynamically using commands.

The bits you need

TCP/IP work is done in a net server (NETSRV) task. You can define more than one of these to partition the work.

The net server needs a SOCKET definition. This socket definition needs the IP address on the local system, and the port the net server listens on. If you let the IP address default, it may not pick the IP address you want to use.

You need a NODE definition for the remote end.

You need a TCP/IP LINE definition for the connection to the remote system.

You need a SOCKET for the remote connection, giving the IP address of the remote end, the port to be used at the remote end, the LINE definition to be used, and the NODE to be used.

These have to be started before they can be used.

I had firewall problems on my Linux server, where it was not forwarding packets to the remote system. Once I fixed this, the connection was easy.

Static definition

The address of my z/OS is 10.1.1.2. The address of the remote end is 10.1.0.2.

In the JES2 parmlib members I added

NODE(2)     NAME=LAPTOP    
SOCKET(LOC) NODE=1,IPADDR=10.1.1.2,netsrv=1,PORT=175
NETSRV(1) SOCKET=LOC
SOCKET(LAPTOP) NODE=50,IPADDR=10.1.0.2,LINE=2,NETSRV=1,port=22
LINE(2) UNIT=TCP

Dynamic definitions

I used the following operator commands to define the resources, rather than define them statically

$ADDSOCKET(LOC),NODE=1,IPADDR=10.1.1.2,netsrv=1,PORT=175
$Addnetsrv(1),socket=LOC
$addline(2),unit=tcp
$ADDSOCKET(LAPTOP),IPADDR=10.1.0.2,line=2,netsrv=1,node=50

You need to use a statically defined NODE.

Starting them up

I then issued

  • $SNetsrv1. This starts an address space with name JES2S001.
  • $SLNE2 to start the line.
  • $Sn,socket=LAPTOP to start the connection to the remote node.

Other useful commands

  • $DNETSRV1
  • $DNetsrv1,sessions. This gave output like
    • $HASP898 NETSRV1 SESSIONS=(LNE2/LAPTOP/S6)
  • $DNetsrv1,socket this displays which socket the net server is using.
  • $DSOCKET to display all sockets
  • $DSOCKET(LAPTOP4)
  • $TSOCKET(LOC),SECURE=YES,PORT=2275

z/OS systems-ssl strange behaviour with environment variables

I was trying to use System SSL to write a program using the native z/OS TLS facilities. I wasted a couple of hours because it said it could not find my keyring. Then, when I collected a trace, it sometimes did not find the file, which did exist, as I could list it.

If I used

//START1   EXEC PGM=GSKMAIN,REGION=0M, 
//* PARM='4000'
// PARM=('ENVAR("_CEE_ENVFILE=DD:STDENV")/4000')
//STDENV DD PATH='/u/ibmuser/gskparms'

When the USS file had

GSK_TRACE_FILE=/tmp/zzztrace.file 
GSK_TRACE=0xff
GSK_KEYRING_FILE=START1/TN3270

This worked fine.

When I used

//START1   EXEC PGM=GSKMAIN,REGION=0M, 
//* PARM='4000'
// PARM=('ENVAR("_CEE_ENVFILE=DD:STDENV")/4000')
//STDENV DD *
GSK_TRACE_FILE=/tmp/zzztrace.file
GSK_TRACE=0xff
GSK_KEYRING_FILE=START1/TN3270
/*

This failed to work.

If I looked in the trace file I had

ENTRY gsk_open_keyring(): ---> Keyring 'START1/TN3270                       ' 

Where it had taken the whole length of the line – and so START1/TN3270 padded with blanks was not found.

The trace file was not /tmp/zzztrace.file, it was /tmp/zzztrace.file padded with lots of blanks!

The answer is to use an environment file in USS, not one inline in the JCL or in a data set.
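The padding behaviour above can be illustrated with a short sketch. This is a hypothetical model (the parser and record names are mine, not real system code): an in-stream JCL dataset stores each line as a fixed-length record padded with blanks, so a naive NAME=VALUE parse keeps the padding in the value.

```python
# Sketch: why blank-padded fixed-length records break environment variables.
# (Illustrative model only; not the actual Language Environment parser.)

def parse_env_records(records, lrecl=80):
    """Parse NAME=VALUE records as a naive reader would,
    keeping the blank padding in the value."""
    env = {}
    for rec in records:
        padded = rec.ljust(lrecl)        # what a fixed-length record looks like
        name, _, value = padded.partition("=")
        env[name] = value                # the value keeps its trailing blanks
    return env

records = ["GSK_TRACE_FILE=/tmp/zzztrace.file",
           "GSK_KEYRING_FILE=START1/TN3270"]

env = parse_env_records(records)
# The keyring name is now "START1/TN3270" followed by blanks, so lookups fail.
print(repr(env["GSK_KEYRING_FILE"]))

# Stripping trailing blanks recovers the intended value.
clean = {k: v.rstrip() for k, v in env.items()}
print(repr(clean["GSK_KEYRING_FILE"]))   # 'START1/TN3270'
```

A USS file avoids the problem because its lines end at the newline, with no padding to strip.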

Destination unreachable, Port unreachable. Which firewall rule is blocking me?

I was trying to connect an application on z/OS through a server to my laptop – so three systems involved.

On the connection from the server to my laptop, using Wireshark I could see no traffic from the application.

When I used Wireshark on the z/OS to server connection I got

   Source    Destination  Port  Protocol  Info
>1 10.1.1.2  10.1.0.2     2175  TCP       ..
<2 10.1.1.1  10.1.1.2     2175  ICMP      Destination unreachable (Port unreachable)

This means

  1. There was a TCP/IP packet from 10.1.1.2 (z/OS) to 10.1.0.2 (my laptop) port 2175.
  2. The response was: Destination unreachable (Port unreachable).

This was a surprise because I could ping from z/OS through the server to the laptop.

Looking in the firewall log ( /var/log/ufw.log) I found

[UFW BLOCK] IN=tap0 OUT=eno1 MAC=... SRC=10.1.1.2 DST=10.1.0.2 ... PROTO=TCP SPT=1050 DPT=2175 ...

This says

  • The packet was blocked. (When using the ufw firewall, all of its messages and definitions contain ufw.)
  • From 10.1.1.2
  • To 10.1.0.2
  • Source port 1050
  • Destination port 2175

With the command

sudo ufw route allow in on tap0 out on eno1

This allows traffic to be routed through this node from interface tap0 to interface eno1, and it solved my problem.

What caused the problem?

iptables allows the systems administrator to define rules (or chains of rules – think subroutines) to control the flow of packets through the Linux kernel. For example

  • control of input packets destined for this system
  • control of output packets from this system
  • control of forwarded packets flowing through this system.

ufw is an interface to iptables which makes it easier to define rules.

You can use

sudo ufw status

to display the ufw definitions, for example

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
Anywhere on eno1           ALLOW       Anywhere
Anywhere on tap0           ALLOW       Anywhere (log)   # ‘colin-ethernet’

You can use

sudo iptables -L -v

to display the iptables. The -v options show you how many times the rules have been used.

sudo iptables-save reports on all of the rules. For example (a very small subset of my rules)

-A FORWARD -j ufw-before-forward
-A ufw-before-forward -j ufw-user-forward
-A ufw-user-forward -i tap0 -o eno1 -j ACCEPT
-A ufw-user-forward -i eno1 -o tap0 -j ACCEPT

-A ufw-skip-to-policy-forward -j REJECT --reject-with icmp-port-unreachable

Where

  • -A FORWARD … says when doing forwarding, use the rule (subroutine) called ufw-before-forward. You can have many of these statements.
  • -A ufw-before-forward -j ufw-user-forward: add to the end of subroutine ufw-before-forward a call (-j, jump) to subroutine ufw-user-forward.
  • -A ufw-user-forward -i tap0 -o eno1 -j ACCEPT: in subroutine ufw-user-forward, if the input interface is tap0 and the output interface is eno1, ACCEPT the traffic and pass it on to interface eno1.
  • -A ufw-user-forward -i eno1 -o tap0 -j ACCEPT: in subroutine ufw-user-forward, if the input interface is eno1 and the output interface is tap0, ACCEPT the traffic and pass it on to interface tap0.
  • -A ufw-skip-to-policy-forward -j REJECT --reject-with icmp-port-unreachable: in this subroutine, do not allow the packet to pass through, but send back an icmp-port-unreachable response. This is the response I saw in Wireshark.

With -j REJECT you can specify

icmp-net-unreachable
icmp-host-unreachable
icmp-port-unreachable
icmp-proto-unreachable
icmp-net-prohibited
icmp-host-prohibited
icmp-admin-prohibited

The processing starts at the top of the tree and goes into each relevant “subroutine” in sequence until it finds an ACCEPT or REJECT.
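The chain traversal described above can be sketched in a few lines. This is a deliberately simplified model (the chain contents are made up, and real iptables semantics are richer): chains are lists of rules, a rule either jumps to a sub-chain or gives a verdict, and processing walks the chains in order until a verdict is found.

```python
# Simplified model of iptables chain traversal (not real iptables semantics).
# A rule with a "match" dict must match the packet; its target is either
# another chain (jump) or a final verdict (ACCEPT / REJECT).

def evaluate(chains, chain_name, packet):
    for rule in chains[chain_name]:
        match = all(packet.get(k) == v for k, v in rule.get("match", {}).items())
        if not match:
            continue
        target = rule["target"]
        if target in chains:                 # jump into a sub-chain
            verdict = evaluate(chains, target, packet)
            if verdict is not None:
                return verdict               # verdict found in the sub-chain
        else:
            return target                    # ACCEPT / REJECT
    return None                              # fell off the chain: keep going

chains = {
    "FORWARD": [{"target": "ufw-user-forward"},
                {"target": "REJECT"}],       # stand-in for the reject chain
    "ufw-user-forward": [
        {"match": {"in": "tap0", "out": "eno1"}, "target": "ACCEPT"},
        {"match": {"in": "eno1", "out": "tap0"}, "target": "ACCEPT"},
    ],
}

print(evaluate(chains, "FORWARD", {"in": "tap0", "out": "eno1"}))   # ACCEPT
print(evaluate(chains, "FORWARD", {"in": "tap0", "out": "wlan0"}))  # REJECT
```

The second call shows the failure mode from the blocked connection: no forwarding rule matched, so the packet fell through to the reject rule.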

If you use sudo iptables -L -v it lists all the rules and the use count. For example

Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
...
259 16364 ufw-before-forward all -- any any anywhere anywhere

Chain ufw-before-forward (1 references)
pkts bytes target prot opt in out source destination
...
77 4620 ufw-user-forward all -- any any anywhere anywhere

Chain ufw-user-forward (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- eno1 tap2 anywhere anywhere
0 0 ACCEPT all -- tap2 eno1 anywhere anywhere
9 540 ACCEPT all -- tap0 eno1 anywhere anywhere
0 0 ACCEPT all -- eno1 tap0 anywhere anywhere

Chain ufw-reject-forward (1 references)
pkts bytes target ...
45 2700 REJECT ... reject-with icmp-port-unreachable
  • For the packet forwarding it processed a number of “rules”
    • 259 packets were processed by subroutine ufw-before-forward
  • Within ufw-before-forward, there were several calls to subroutines
    • 77 packets were processed by subroutine ufw-user-forward
  • Within ufw-user-forward, the tap0/eno1 line showed 9 packets, which were forwarded when the input interface was tap0 and the output was eno1.
  • Within the subroutine ufw-reject-forward, 45 packets were rejected with icmp-port-unreachable.

The ufw-reject-forward was the only instance of icmp-port-unreachable with packet count > 0. This was the rule which blocked me.

Log file

In /var/log/ufw.log there was a [UFW BLOCK] entry for the address and port.

Non functional requirements: backups

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

Why backup?

You need to take backups (and, more importantly, be able to restore them) for various reasons

  • To recover from media failures.
  • To recover from human failure. You may have mirrored disks, but if an operator deletes a file or table, it will be reliably deleted on both mirrored disks.
  • You may be asked for historical information: 10 years ago, did this person have an account with you, and can you show the transactions on the account?

How to backup

A simple file is easy to back up.

For a database, or a file which is continually being updated, you need a more sophisticated approach. If a transaction is taking funds from one account and adding them to another, you need to ensure that the backup has consistent data.

With databases you can back up an “in-flight” database. If you need to restore it, the restore replays the transaction log and reapplies any transactions.

Another solution is to have the main database read-only, and do updates in a small database in front of the main database.

You could also partition the database, for example an A partition for surnames beginning with A, and so on. Each partition is smaller than one large database, and so quicker to back up.

What do you backup?

You need to think about what you backup. For example people’s names and addresses do not change very much, but their current balance may change every day.

How long to keep the backup for?

You may have to keep backups for 10 years, depending on your industry regulator.

How much does it cost ?

When you are specifying the project there will be many unknowns, so you need to make assumptions.

For example in the brief it says there will be 10 million users and 1 million trades a day.

Non functional requirements: do not create a straitjacket.

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

The functional straitjacket

You may decide to use a facility, like a particular database, because you can use a function that only that particular database provides. If you want to move from this database supplier, you are tied because of this function. It may be expensive to write this function yourself, to allow you to move without disrupting ongoing usage.
You may decide to use a standard level of function, such as SQL, but there are different standards. For example, one standard of SQL can support JSON.

To avoid the straitjacket, you might decide on a subset of functions, and require that applications cannot use functions outside this subset.

Testing for this

You might want to do most of your testing on one platform/environment, but include tests on different environments. For example, run some of your Java web servers on Windows and some on Linux, and have two different back-end databases.

This may increase the development costs – but this is cheaper than trying to escape from the straitjacket when running in production.

Designing for this

Rather than scattering database calls throughout your code, consider a component or domain which does all of the database calls. If you need to change your database, you have only to change the component, not the whole product. It also allows mapping of return codes and parameters to be done in one place.
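A minimal sketch of such a data-access component, under stated assumptions: the class, table and error mapping here are hypothetical, and sqlite3 stands in for whichever database you actually use. Application code calls the store; only the store knows the SQL and the database's error types.

```python
# Sketch of isolating database access in one component.
# sqlite3 is a stand-in database: swapping databases means changing
# only this class, not every caller.
import sqlite3

class AccountStore:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance INTEGER)")

    def add(self, name, balance):
        try:
            self.conn.execute("INSERT INTO accounts VALUES (?, ?)", (name, balance))
        except sqlite3.IntegrityError:
            # Map the database-specific error to one application error,
            # so callers never see database return codes.
            raise ValueError(f"account {name!r} already exists")

    def balance(self, name):
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None

store = AccountStore(sqlite3.connect(":memory:"))
store.add("colin", 100)
print(store.balance("colin"))   # 100
```

Because the error mapping lives in one place, a change of database means rewriting one class and its error handling, not auditing every caller.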

With some databases (such as DB2) you run a program, “runstats”, which tells DB2 to update its meta-knowledge of the tables. This helps the database manager make the best decision for accessing the data when there are multiple paths to it. For example, should it use a sequential search to access rows, or use an index? If this metadata changes, you need to rebind your applications to pick up the changes. If your SQL is isolated in one component, you just have to rebind that component, and not the whole product.

Other straitjackets

  • Use of certificate authority certificates. Can you change CA chain?
  • Use of a computer language, and deprecated functions within that language.
  • Use of a particular compiler. Your code does not compile on a different compiler or different operating system
  • Use of packages for TLS – and using unsupported cipher specs.
  • Use of virtualisation system
  • Use of cloud provider.

Non functional requirements: do you want immediate database consistency or eventual consistency?

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

What is consistency?

After an end user sells some widgets, if you display the status you should see that the number of widgets owned has gone down, and the money in the user’s account has increased. The number of system-wide available widgets has gone up, and the amount of money in the central account has gone down. All the numbers should be consistent. If there is a problem with the system, such as a database outage or a power cut, the data should still be consistent when the system recovers.

Wikipedia says

In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.[1] In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

If there is one big database this is pretty “obvious”.

There are databases with “eventual consistency”. These databases are distributed and highly available, and it takes a short time (seconds) for updates to be propagated to other instances. Eventually all instances become consistent.

You may make an update on your phone, but when you look with your laptop’s browser, it takes a few seconds to reflect the update – because a different server and database instance were used.

Distributed databases

A single database is a single point of failure. You can have databases which are distributed; for example, you have many sites and a database instance at each site. Whenever a user makes a trade, the local database is updated, and at commit time the updates are sent to the remote sites and applied to the other databases. Immediately after the trade, the databases are inconsistent. A short time later (seconds), all databases are consistent.

This looks a pretty simple design. However, it gets more complex when there are updates occurring on all instances at the same time.

For this to work, the updates sent to the other systems must reflect the changes, such as “10 widgets sold” or “credit account with $100”, rather than absolute values such as “current balance $400”.
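The point about sending changes rather than absolute values can be shown with a toy example (the figures are the illustrative ones above, not real data). If both sites sent "current balance = X", the last writer would win and one update would be lost; deltas from both sites can be applied in either order and give the same result.

```python
# Sketch: replicating changes as deltas, not absolute values.
# Deltas commute, so the order the sites' updates arrive in does not matter.

def apply(balance, deltas):
    for d in deltas:
        balance += d
    return balance

start = 400
site_a = [+100]        # "credit account with $100"
site_b = [-30]         # "debit account with $30"

# Both arrival orders give the same final balance.
print(apply(start, site_a + site_b))   # 470
print(apply(start, site_b + site_a))   # 470
```

Had each site sent an absolute balance instead, the final value would depend on which update arrived last, and one trade would silently disappear.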

What to think about?

You need to consider your availability targets. 100% may be achievable, but you need to allow for shutting down machines and rebooting them to apply fixes, or for moving a machine instance.

Can you tolerate an eventually consistent environment, or do you need a totally consistent image?

How will it scale?

Will you partition your data so groups of users such as with names in the range A-C go to one server, D-F go to another server etc?

Consider having name and address information in one database – as the information does not frequently change, and dynamic information such as number of widgets, and account balance in another database.

If you have an eventually consistent database, how do you stop people from having multiple sessions all simultaneously trying to transfer money to an external bank account, exploiting the delay in eventual consistency?

Non functional requirements: the additional expense of cheap solutions

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

The cost of change.

In the press you can find references to organisations who want to move off an existing platform or solution, but find the cost of doing so is too expensive. The moving cost is many times the original cost of implementation, and the cost of moving is more than the “cost savings”. But they decide to move anyway. I’ve been reading about a council whose cost of moving went from an initial figure of £3 million to over £25 million – and they haven’t completed it yet.

An analogy

Someone described migrating an application from one platform to another as a bit like changing the engines of an aeroplane while the plane is in the air. The new engine will not be a drop-in replacement, and you have to keep flying to your destination.

Upgrading software from one version to a different version should work, but there may be small differences.

Replacing a core component such as using a different database will need a lot of work. This may be due to

  • a different performance profile,
  • the SQL may not be entirely consistent,
  • you use a facility which is not in the newer database and so you need to change your application.
  • Error messages and codes may be dissimilar.

Looking at the costs

What is the cost?

As part of the discussions with a supplier of a service, such as cloud, CPU, disk space, network capacity, you may have got a good initial deal. After the honeymoon period the cost of these services increases, and you may not be prepared for this. For example the cost of disk space could be an amount per GB per day.

  • The cost is 10 cents per GB per day.
  • Your database is 10 GB, so it costs $1 a day.
  • You also back up your database daily and keep each copy for 10 years.
    • After 1 day you have one 10 GB backup, costing $1 a day.
    • On the 10th day you have 10 backup copies, costing $10 a day.
    • On the 1000th day you have 1000 backups, costing $1,000 a day.
    • After n days the accumulated cost of the backups is $n*(n+1)/2, so 1000 days costs about half a million dollars.
    • You also have multiple copies of the backups for Disaster Recovery processes etc. After 1000 days, the accumulated cost is $1 million! This is despite “the cost is only 10 cents per GB per day”.
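The arithmetic above can be checked with a few lines of Python: 10 GB at 10 cents/GB/day is $1 per backup per day, and keeping every daily backup means the accumulated cost after n days is 1 + 2 + ... + n = n*(n+1)/2 dollars.

```python
# Sketch of the backup-cost arithmetic: each day adds one more $1/day backup,
# so the accumulated spend grows quadratically, not linearly.

def accumulated_cost(days, cost_per_backup_per_day=1):
    # Day n holds n backups, costing n * cost_per_backup_per_day that day.
    return sum(n * cost_per_backup_per_day for n in range(1, days + 1))

print(accumulated_cost(10))     # 55
print(accumulated_cost(1000))   # 500500 dollars: about half a million

# The closed form n*(n+1)/2 agrees with the running sum.
n = 1000
print(n * (n + 1) // 2)         # 500500
```

The lesson is that a per-day rate hides quadratic growth once you retain every backup.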

The charging changes

The terms, conditions and costs may change; basically, the price goes up more than expected. As I write this, Broadcom is increasing the cost of VMware licenses by a factor of 10, and people are now moving to a different virtualisation technology.

Do not lock yourself in

You may find that you are using a facility in the environment, but this facility is not available in other environments, for example cloud providers. It may be worth using common facilities rather than platform unique facilities, as this makes it easier to change, but at a higher initial cost.

Non functional requirements: security

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

What security?

Security covers

  • Application users – for example using their mobile phone to authenticate
  • What userid will be used on the web server to run the transactions? Is this related to the end user’s id?
  • What fields are visible to the application user?
  • What fields are available to the help desk staff? For example, can they see the full date of birth, or do they type the DOB into a field and have it validated?
  • Are you going to provide audit information for any changes to the database: for all fields, or only for some fields?
  • Are you going to report on read-only access to some fields?
  • How are you going to report violations?
  • Are you going to use encryption on fields? How do you protect the keys?
  • Is your database going to be encrypted, so that if someone copies the database file they are unable to read it, or are you going to rely on the fields being encrypted?
  • What encryption are you going to use? Some encryption is weak (quantum computers will be able to decrypt some ciphers in an instant).
  • Are your backups encrypted?
  • Are your backup and disaster recovery sites able to restore from backups? Do they have the correct certificates?
  • If someone phones in and says they have forgotten their password, how do you validate the request, bearing in mind the phone may have been stolen?

Non functional requirements: metrics and checking your product is performing within spec

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

  • We expect this application package to be used by all the major banks in the world.
  • For the UK we expect the number of people who have an account to be about 10 million people
  • We expect about 1 million trades a day.

See start here for additional topics.

Why measure?

Your management team want this product to be a success. How do you know if you are achieving required performance, and how do you avoid a twitter storm of people complaining that your product is slow? Some businesses, like those buying and selling on the stock exchange, get fined if they do not meet externally specified performance targets.

Three areas you may be interested in:

  1. Will there be a problem next week, next month? Can we see trends in the existing data, such as CPU usage is going to max out, or the end user response time at peak time will be out of spec.
  2. Is there a performance problem now? If so, can you identify the problem area?
  3. Last month when you had a performance problem – have you kept enough data to identify the problem. The problem may not have shown up in your real time monitoring.

What do you want to measure?

The people specifying the product have said the average end-user response time needs to be under 50 milliseconds. This needs some clarification.

  • You cannot control or influence the phone network, so you do not want to be blamed for a slow network. A better metric might be “time in your system” must be less than 40 milliseconds
    • This means you need to capture the entry and exit time, and report the transaction duration on exit.
    • You need a dash board to display this information.
  • The “average response time”: if you get excellent response time at night, when no one is using the system, and poor response time during the lunch break – the average of these response times may be under 40 milliseconds. This way of reporting the data is useless. You might want to report the information on a granularity of 1 minute.
  • You might want to display a chart of the count of any transactions taking over 40 milliseconds, and display the maximum count value in the interval. This should always be zero.
  • You might want to report the average response time per hour, and report this over the last 3 months (or the last 13 months). This should allow you to see if there are any trends in response time, and gives you time to take action before it is a problem.
  • Your application could record the time spent doing database work, and if this time is over 20 milliseconds, report and plot this.
  • If you have any exceptions, such as a long response time, then you could send an event to a thread which captures these and writes a database record. You do not want to have a database write after your transaction has completed, just before it returns to the end user.
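The per-interval reporting described above can be sketched as follows. The sample data and the 40 ms threshold are the illustrative figures from this section, not real measurements.

```python
# Sketch: reporting response times per interval instead of one global average.
from collections import defaultdict

THRESHOLD_MS = 40

def summarise(samples):
    """samples: list of (minute_of_day, response_ms).
    Returns, per minute, the average and the count over the threshold."""
    buckets = defaultdict(list)
    for minute, ms in samples:
        buckets[minute].append(ms)
    return {m: {"avg": sum(v) / len(v),
                "over": sum(1 for ms in v if ms > THRESHOLD_MS)}
            for m, v in buckets.items()}

samples = [(0, 5), (0, 7),          # overnight: fast
           (720, 45), (720, 55)]    # lunchtime: slow
report = summarise(samples)
print(report[0])     # {'avg': 6.0, 'over': 0}
print(report[720])   # {'avg': 50.0, 'over': 2}

# A single global average hides the lunchtime problem entirely:
print(sum(ms for _, ms in samples) / len(samples))   # 28.0
```

The global average of 28 ms looks comfortably within spec, while the per-minute view shows every lunchtime transaction missing the target.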

How to capture data

You can have the transaction collect the per-transaction response time within your application server. You could also have a mobile phone that does a transaction every minute, so you get an early notification if there is a network problem.

How to look at the data

You need to have a profile of expected/historical data. For example, during the day the response time is 40 ms; overnight it is 10 ms. If you start getting a response time of 20 ms overnight, this should be investigated: it is still below 40 ms, but it does not match the profile, and that is a good indicator of a problem.
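A check against such a profile might look like the following sketch. The profile numbers come from the example above; the tolerance factor is an assumption of mine.

```python
# Sketch: comparing an observation against a time-of-day profile rather
# than a single fixed threshold. Profile values are illustrative.

profile_ms = {"day": 40, "night": 10}   # expected response time by period

def check(period, observed_ms, tolerance=1.5):
    """Flag an observation well above the profile for its period,
    even if it is below the overall 40 ms target."""
    return observed_ms > profile_ms[period] * tolerance

print(check("night", 20))   # True: 20 ms overnight does not match the profile
print(check("day", 20))     # False: 20 ms during the day is fine
```

The same 20 ms observation is flagged at night and passed during the day, which is exactly the behaviour a fixed 40 ms threshold cannot give you.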

Different days have different profiles, perhaps Monday is always a peak day, and Christmas day is a quiet day.