Non functional requirements: what does your support team need?

This blog post is part of a series on non functional requirements, and how they take most of the effort.

The scenario

You want a third party to implement an application package to allow people to buy and sell widgets from their phone. Once the package has been developed, they will hand it over to you to sell, support, maintain and upgrade, and you will be responsible for it.

At the back-end is a web server.

Requirements you have been given.

We expect this application package to be used by all the major banks in the world.
For the UK we expect the number of people who have an account to be about 10 million.
We expect about 1 million trades a day.

See start here for additional topics.

Support structure

You will need to set up a support structure to handle questions and problems from your customers. The levels of support cover:

Answering general use questions – such as database usage, or web server configuration. This team will usually be people who speak the customer’s language. This is known as Level one, or first level.
Answering harder questions and problems. These guys may have access to the source. They know the diagnostic tools available with the product. This is known as level 2, or second level.
Debugging code problems. This may be just one team – but with people spread around the world who can work on the problem 24*7. They may have customer-like test systems so they can try to reproduce the customer’s problems. This is known as level 3, or third level.
Writing fixes. Once a problem has been identified, you need people to write fixes. A full fix can have side effects, so you may need a fix which stops the customer’s problem from happening without being the correct solution (which waits until a new version of your product). This team is known as the change team, or fix team.
Testing fixes. This team works with the team that writes fixes.
- the team reproduces the problem
- applies the fix and checks the problem is resolved
- packages the fix for customers
- looks at the existing tests to see why none of them caught the problem, then enhances the test cases to cover the holes found.

What do the support team need?

Level 2 and level 3 need access to, and knowledge of, the tools available to debug problems. These may include:

Messages on a log
Probe-ids or First Failure Data Capture
Traces
Dumps of storage. If there are hexadecimal dumps, you may need tools to format the data into control blocks, and fields within the control blocks.
- Your tools may also have smarts to report unusual situations discovered, such as “The value of field ErrorsDDetected in the control block CB1 for user MYUSERS is non zero”

The teams need guidance as to which traces to turn on, and when. For example turning on a system wide Java trace can double the CPU cost, halve the throughput, and produce gigabytes of trace data.

You need commands to turn trace on and off without restarting your product.

If just one user gets a problem, can you turn on trace just for that user?

Can you turn on trace for a component, such as the database component?
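
The questions above about turning trace on and off, for one user or one component, without a restart, could be met with something like the sketch below. It is only an illustration under assumptions: the names trace_set, trace_wanted and the TRC_ component bits are made up for this example, and a multi-threaded server would need to protect the settings (for example with atomics or a lock).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical component identifiers, one bit each. */
enum { TRC_DATABASE = 1u << 0, TRC_TLS = 1u << 1, TRC_AUTH = 1u << 2 };

/* Trace settings an operator command could change while the product runs. */
struct trace_control {
    unsigned components;      /* bitmask of components being traced */
    char     user[32];        /* trace only this user; "" means no user filter */
};

static struct trace_control trace_ctl = { 0, "" };

/* Called by a (hypothetical) operator command handler; no restart needed. */
void trace_set(unsigned components, const char *user)
{
    trace_ctl.components = components;
    strncpy(trace_ctl.user, user ? user : "", sizeof trace_ctl.user - 1);
    trace_ctl.user[sizeof trace_ctl.user - 1] = '\0';
}

/* Cheap check made at each trace point. */
bool trace_wanted(unsigned component, const char *user)
{
    if ((trace_ctl.components & component) == 0)
        return false;                         /* component not being traced */
    if (trace_ctl.user[0] == '\0')
        return true;                          /* no user filter */
    return user != NULL && strcmp(trace_ctl.user, user) == 0;
}
```

Each trace point calls trace_wanted() before doing any work, so when trace is off the cost is one test of a bitmask.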

Some customers will not allow trace to be enabled in production, because of the impact on CPU and throughput, and for data privacy issues. This means you need to provide debug facilities which are low cost, cannot be disabled and do not display any personal data.

What are the non functional requirements for the support teams?

Source for the product
- in a source control system
- with a build tool for all environments.
“Internal documentation” describing the components, and what each source module does.
Comments in the code for any non obvious code. For example describing what a function does.
Test system so any fixes can be validated in all environments.
- Supporting different levels (or just the lowest) of the operating system.
- Support for national languages – so you have, for example, a web server configured for Japanese.
External facilities used by product (such as Java or database) are supported, current, and will be current for the next 12 months. You may have to pay for the support.
Internal defect database, so the support teams can search for previously seen, or similar problems.
All external components or dependencies are documented, and support plans agreed with the provider.

Non functional requirements: providing debug information.

This blog post is part of a series on non functional requirements, and how they take most of the effort.

Why plan for debug information?

The web server will be installed in customer environments, to which you do not have access. If there are problems the customer will expect you to diagnose them with the information the product provides.

If you turn the trace on for some products it uses up much more CPU, and this may be unacceptable to a customer. (Quote from an upset customer: “Are you saying I have to buy a bigger server just to collect the trace!?!”)

Even with a low transaction rate of 100 a second, running with a verbose trace will be very expensive, and it will produce a lot of trace data.

Some products write an entry and exit trace. You could be smart: do not create an entry trace, and only provide an exit trace if there is an unexpected or interesting condition. In this case you need to write the input parameters, the error codes, and enough information to be able to identify the problem. This might include an account number, a database table name etc.

You might want to write trace in a binary format and format it when needed – this saves the CPU used to format the data, but it is more work to write code to format the data.
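
As a rough sketch of the binary trace idea above: the running product copies a small fixed-size record into a buffer, and a separate formatter turns it into text only when someone needs to read it. The record layout and the names trace_write and trace_format are assumptions for this example. The raw record is only portable between machines with the same byte order and structure layout.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size binary trace record. */
struct trace_record {
    uint32_t module_id;    /* number assigned to the source module */
    uint32_t point;        /* trace point within the module */
    int32_t  return_code;  /* value being traced */
};

/* Hot path: copy the raw record into a buffer, no formatting.
   Returns the number of bytes used, or 0 if the buffer is too small. */
size_t trace_write(unsigned char *buf, size_t buflen,
                   uint32_t module_id, uint32_t point, int32_t return_code)
{
    struct trace_record rec = { module_id, point, return_code };
    if (buflen < sizeof rec)
        return 0;
    memcpy(buf, &rec, sizeof rec);
    return sizeof rec;
}

/* Offline: turn one raw record into readable text.
   Returns the number of characters written, as snprintf does. */
int trace_format(const unsigned char *buf, char *out, size_t outlen)
{
    struct trace_record rec;
    memcpy(&rec, buf, sizeof rec);
    return snprintf(out, outlen, "module=%u point=%u rc=%d",
                    (unsigned)rec.module_id, (unsigned)rec.point,
                    (int)rec.return_code);
}
```

The cost of snprintf is paid by the support team when they format the trace, not by the customer’s production system.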

Every unexpected return code from a function should be traced. If you are doing a database call, and it returns no record found – this may be expected, and so can be ignored. If you were doing a database update, and you get record not found – this is a problem.
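
The record not found example shows that “unexpected” depends on what the code was trying to do. A minimal sketch of that decision, using made-up operation types and return codes (these are not from any real database API):

```c
#include <stdbool.h>

/* Hypothetical operation types and return codes for a database call. */
enum db_op { DB_READ, DB_UPDATE };
enum db_rc { DB_OK = 0, DB_NOT_FOUND = 1, DB_IO_ERROR = 2 };

/* Decide whether a return code is unexpected for this operation,
   and so should be traced. */
bool rc_is_unexpected(enum db_op op, enum db_rc rc)
{
    if (rc == DB_OK)
        return false;
    if (rc == DB_NOT_FOUND && op == DB_READ)
        return false;      /* looking for a record that is not there can be normal */
    return true;           /* anything else, including not found on update */
}
```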

To help with the flow through some code, you could consider footprints in storage. You have a macro which increments a counter, and sets a bit in storage. For example

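#include <stdio.h>

static unsigned char debugInfo[60];   /* one footprint flag per point in the code */

/* condition stands for whatever test the real code makes */
void process(int condition, int writing_trace_entry)
{
  debugInfo[1] = 1;                   /* reached the start */
  if (condition)
  {
    debugInfo[2] = 1;
  }
  else
  {
    debugInfo[3] = 1;
  }

  if (writing_trace_entry)
  {
    /* output debugInfo as a string of 0s and 1s */
    for (size_t i = 0; i < sizeof debugInfo; i++)
      putchar(debugInfo[i] ? '1' : '0');
    putchar('\n');
  }
}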

and you have a clue as to the path taken through the code.

With every trace entry provide the information needed to identify which source module, and which point within the source module. This could be the module name and line number, or a combination of a number for the source (myprog.c has number 73) and the n’th trace macro instance.
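
One way to get the module number and the position into every trace entry is to let a macro capture them. The sketch below uses the line number option; TRACE, TRACE_MODULE_ID and trace_emit are names invented for this example, and the entry is formatted into a static buffer only to keep the sketch short.

```c
#include <stdio.h>

/* Each source file defines its own number; 73 for myprog.c as in the text. */
#define TRACE_MODULE_ID 73

/* Holds the most recent trace entry, for this sketch only. */
static char last_trace[128];

/* Hypothetical trace writer. */
static void trace_emit(int module_id, int line, const char *text, int rc)
{
    snprintf(last_trace, sizeof last_trace,
             "mod=%d line=%d rc=%d %s", module_id, line, rc, text);
}

/* The macro captures the module number and source line at the point of use. */
#define TRACE(text, rc) trace_emit(TRACE_MODULE_ID, __LINE__, (text), (rc))
```

A call such as TRACE("update failed", 8) then records where it was issued without the programmer typing the location by hand.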

You might want to be able to trace an individual user, for example by account number. This has a much smaller impact than tracing all users.

You might want to trace a component, such as database, TLS, authentication etc.

Non functional requirements: standards

This blog post is part of a series on non functional requirements, and how they take most of the effort.

What standards?

You may have industry standards you have to follow. You may have corporate standards you have to follow.

For a hardware device the standards could include the amount of radiation it generates (high frequency radio or microwaves). You need to specify how this will be tested.

You may have to worry about the amount of heat your solution will generate, and how much power it will consume.

Industry software standards may include

Keeping an audit trail of transactions for 10 years
Provide a right to forget, so you can permanently remove someone’s records from the system. This may be incompatible with the audit trail requirement above. I didn’t say the standards are consistent.
- How do you implement the right to forget when you have backups taken 9 years ago? Are you going to restore all these backups, delete the records, create another backup and delete the original? This will not work.
Report breaches of security.
- A hacker has accessed your system
- Someone in your organisation has looked at data without a need to know. For example someone looks up their child’s partner, to see if they are on the government/police computer. This means you need to provide the mechanism to protect against and report violations, and the ability to administer this authority.
Maintain a list of people who had authority to a resource – but did not use it.

How will you test these standards are being met?

Non functional requirements: start here

I’ve been involved with a university, providing guest lectures on various computer topics. In conversation with the students, they thought that implementing the functional requirements took most of the effort. I disagreed, saying that the functional requirements took a small percentage of the effort, and that most of the effort was spent on the Non Functional Requirements.

The scenario

At the back-end is a web server.

Requirements you have been given.

We expect this application package to be used by all the major banks in the world.
For the UK we expect the number of people who have an account to be about 10 million people
We expect about 1 million trades a day.

This series of posts covers my thoughts on some of the topics. I’ve tried to cover aspects which are not covered by generally available information.

First thoughts

I spent just a few minutes coming up with the list of Non Functional Requirements below.

How do you provide 24*7 availability – you are allowed 1 minute of outage in a year! How do you do this, bearing in mind you need to reboot your machines once a month?
Backups… how often do you back up your database? How long do you keep your backups for? It might take 12 hours to back up your database (50 million records each of 10,000 bytes). How do you do this and provide 24*7 availability, and database consistency?
More important than backups – are you able to restore from a backup and recover your data?
What monitoring do you provide – so when you get a Twitter storm saying this product is slow – what does the product provide? Is average response time good enough? (No)
Our customers often want messages in English, Japanese etc. How do you write your code to support this?
Your product has a database problem – you are doing 10,000 transactions a second – so in 100 seconds you get 1 million messages in your log! How do you avoid the flood of messages?
What protection do you want for your database – for example encryption of fields, and who has access to fields?
- Can people who provide the disks where you run your database read your databases and other files?
- If the disks/files are encrypted – who has access to the decryption keys?
- Can people who are responsible for backing up your data, see the data in the database?
What audit information do you need to provide – for example who changed what, when? How long do you keep this data for? Do you keep access records of who looked at a record? Think of the Police computers: should you, as a policeman, be able to access information about a high profile person “just out of interest”?
What test suites do you provide – for example you create a fix, and you need to regression test the whole of your product. (In one product I worked on the test suite was about 5 times larger than the base code! It handled normal, error and edge cases).
What debug coding standards do you have? Think of the Post Office Horizon scandal. At the end of the day there is a difference of £10,100 between the amount of money in and the amount out. How do you debug this and find the problem?
You need sensible error messages (so people can google them) with appropriate, helpful return codes. What standards do you need to provide?
What encryption are you going to use on connections?
What headers are you going to provide in the HTML?
How do you stress test your product? One customer (a bank) ran their test workload at double the expected production workload. The customer said that if the system was down it cost them $10,000,000 a minute in fines and compensation.
What levels of code, such as Java, will be used?
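
For the flood of log messages in the list above, one possible approach (an assumption on my part, there are others) is to limit how many copies of a message are written in a time window and count the rest. The structure and function names are made up for this sketch.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-message-id limiter: allow at most `limit` copies of a
   message in each window of `window_secs` seconds, and count the rest. */
struct msg_limiter {
    time_t   window_start;
    unsigned window_secs;
    unsigned limit;
    unsigned issued;       /* messages written in the current window */
    unsigned suppressed;   /* messages dropped in the current window */
};

/* Returns true if the message should be written to the log at time `now`. */
bool msg_allowed(struct msg_limiter *m, time_t now)
{
    if (now - m->window_start >= (time_t)m->window_secs) {
        /* New window; a real product might first log "N messages suppressed". */
        m->window_start = now;
        m->issued = 0;
        m->suppressed = 0;
    }
    if (m->issued < m->limit) {
        m->issued++;
        return true;
    }
    m->suppressed++;
    return false;
}
```

With a limit of 5 messages a minute, the first few messages still reach the log so the problem is visible, and the suppressed count tells support how bad the flood was.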
