Should stress testing be stressful? And a tale of two managers

Two things about stress testing came up in the last week – one of life’s little coincidences. The first was a paper I rediscovered about CICS stress testing, written for a prestigious IBM journal about 30 years ago. It was not accepted for publication because it was “obvious” and didn’t have graphs and complex equations. The second was a conversation with someone who had been involved in testing military hardware.

Testing military hardware

John’s department in the army took new kit from the manufacturers, used it, and gave feedback. A couple of weeks later, they retested it once “development” had “fixed” the problems. John found that the easy problems were usually fixed, but the hard problems were not.

John hatched a cunning plan. He got managers from the manufacturer to come down, have a good lunch, and see their vehicles in action. After the good lunch, he “invited” the VIPs to get into army fatigues and go for a ride, so they could experience their product first hand. After a ride at speed over the army vehicle assault course, seeing all the capabilities of their product first hand (e.g. firing guns), they eventually got back to the lunch venue. The visitors got out of the vehicle looking very ill; some had been sick, some had bruises, some were temporarily deaf.

After they had had time to recover, John went through the list of outstanding problems, with words like “As you may have experienced…”. Afterwards, John’s commanding officer called him into his office and gave him a telling off for putting civilians’ lives at risk etc… then said “well done – good work – don’t do it again – dismissed”. Overall this was a success, as the next set of vehicles to test had some of the major problems fixed.

The moral of this story is you need to test in a realistic situation with all of the problems that your end user may encounter.

The IBM non-article

The IBM article I recently discovered in a box was written in the early days of stress testing. I remember doing the CICS stress test 40 years ago. Ten of us sat in a room, each with a terminal, and typed in random CICS commands for 1 hour. If CICS didn’t crash, this was a successful test.

A few years later testing had moved on. The testing used simulated users running scripts. There were 1000 “end users” running complex applications. From this baseline it was significantly enhanced. This enhanced testing was so successful that the team were asked to write it up for the prestigious IBM Journal of Research and Development.

I reviewed the document, and thought it was great. Rather than talk about CICS, terminal control, SIP etc, they approached system testing as testing a hypothetical “IBM car”, which everyone could understand. While some testers took the car on the road to visit their parents, the stress testers had a plan: go off road and see what happens.

  • “Bang”: a tyre caught on a sharp rock and got a puncture. Ahh, there is no spare tyre. Defect.
  • Once that was fixed, do it again. “Bang”. This time there is a spare wheel, but no jack to allow the wheel to be changed. Defect.
  • Whoops, they bumped into a gate. Take the car back to the garage, and get them to repair it. No matching paint? Defect.
  • Now that all works, try driving it in reverse across a field – hmm, poor visibility. Defect.
  • No problems found? – do it even faster.
  • Now fill the car with (heavy) bags of compost (hmmm, there is a nasty lip on the edge of the boot – defect) and the car is hard to drive round corners at speed – defect.
  • Get your partner to drive to your parents – what do you mean, she cannot reach the pedals?

As I said – it was a great article, but not what the journal wanted.

Thoughts about stress testing

CICS has various pools of resources, for example the number of threads it can support. If this is tunable, you may set the value at 1000, and try running with exactly 1000 threads and then with more than 1000, to make sure the code works at the limit. You may find you run out of CPU before you can drive the workload at these limits.

Testing with 1000 concurrent threads is hard. It is better to make the pool smaller, say 10 threads, and test with 10 or more threads. To simulate peak workloads you can either increase the number of threads, or make the pool smaller. For example, while CICS is running, decrease the size of the pool from 1000 to 10, let CICS sort itself out, then put the limit back to 1000 – and repeat.
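The small-pool idea can be sketched in a few lines of Python. This is only an illustration, not CICS code: `ResizablePool` and `run_workload` are made-up names, the pool is a toy counter guarded by a condition variable, and the "work" is a short sleep. It shows two things: driving more workers than the pool has slots, and squeezing the limit down and back up while the workload is running.

```python
import threading
import time


class ResizablePool:
    """Toy resource pool with a tunable limit (hypothetical, not a CICS API)."""

    def __init__(self, limit):
        self._cond = threading.Condition()
        self._limit = limit
        self._in_use = 0
        self.peak_in_use = 0  # highest number of slots held at once
        self.waits = 0        # times a caller had to queue for a slot

    def resize(self, limit):
        # Change the limit while work is running. Slots already held are not
        # revoked; the pool drains down to the new limit as holders release.
        with self._cond:
            self._limit = limit
            self._cond.notify_all()

    def acquire(self):
        with self._cond:
            if self._in_use >= self._limit:
                self.waits += 1
            while self._in_use >= self._limit:
                self._cond.wait()
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)

    def release(self):
        with self._cond:
            self._in_use -= 1
            self._cond.notify_all()


def run_workload(pool, workers, iterations=1, hold=0.02, squeeze=None):
    """Drive `workers` threads through the pool; return completed operations.

    If `squeeze` is a (low, high) pair, the limit is dropped to `low` while
    the workload is running and then restored to `high`.
    """
    completed = 0
    count_lock = threading.Lock()
    start = threading.Barrier(workers + 1)  # release all workers together

    def worker():
        nonlocal completed
        start.wait()
        for _ in range(iterations):
            pool.acquire()
            try:
                time.sleep(hold)  # simulated work while holding a slot
            finally:
                pool.release()
            with count_lock:
                completed += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    start.wait()
    if squeeze is not None:
        low, high = squeeze
        time.sleep(hold)
        pool.resize(low)      # squeeze the pool under load
        time.sleep(hold * 3)
        pool.resize(high)     # put the limit back
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    small = ResizablePool(limit=10)
    print(run_workload(small, workers=25), small.peak_in_use, small.waits)
    big = ResizablePool(limit=30)
    print(run_workload(big, workers=30, iterations=5, squeeze=(3, 30)))
```

With a limit of 10 and 25 workers, the queueing path is exercised without needing anything like 1000 threads. The squeeze run checks that all the work still completes when the limit is changed under load. A real test would also check what happens to the work that was waiting when the limit changed.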

In the car analogy, drive everywhere in first gear – this will test the high revs without going very fast! It depends on how you look at the problem.

A tale of two managers

There is the story of a large development team. There was a senior development manager. Under him were two managers, and each of these had four managers who managed the day to day work.

One second line manager said he liked graphs with lots of green “Test cases successful”. “Problem” test cases were reviewed, and if they were considered unrealistic, or not likely to happen, they were quietly removed from the test plan. He worked hard to get more green on his charts. He was very proud of the green charts outside of his office.

The other manager said “Green charts mean you are not testing hard enough. I want to know where it breaks, why it breaks, and the impact on the overall system”. When his charts of successful test cases showed 75% green and 20% red (and 5% investigate), he called the test team in and said “here is a bigger box – get testing”. The proportion of successful tests went down to 50% – and he was very pleased. Eventually they got the successful tests up to 95% green.

The product was shipped to customers, and the “red” manager became the change team manager, in charge of customer support. It was interesting that although both areas had customer problems, the “green” code had more problems. Some of the “interesting” problems found in the green code had tests that had quietly been dropped.

Because the “red” code had experienced tougher problems during testing, development had added better diagnostics, so difficult customer problems were “easier” to debug (still hard – just not so hard).

The moral of the story is that if you are not breaking the product, you are not pushing it hard enough. You can have as many green lines on charts as you like – they may not reflect reality.
