There must be some mistake

Whether in spreadsheets, databases, or elsewhere, data and derived calculations can readily let you down. What can you do to detect and correct errors?

It is all too easy to put blind faith in numbers. The more precise they appear, perhaps with several digits after the decimal point, the more accurate they must be. But if the data have been entered incorrectly, or the formulas used to calculate results from them are flawed, that apparent accuracy only blinds you from spotting their error.

Take a careful look at a recent spreadsheet or database that you have been using. See whether it performs any checks to ensure that, if you do inadvertently put garbage in, perhaps a body weight in pounds instead of kilograms, or missing the decimal point out of a monetary value, your error is highlighted. If your sole check is the operator’s eyeball, then your efforts will all too often be rewarded with garbage results.

Even if you are one of the few who meticulously checks the range of input data, have you ascertained how your calculations respond to values at the limits of the possible? If an entry is allowed to be zero or negative, are all your formulas capable of handling such values? How do you test whether your work delivers the results that you think it should? When you last upgraded its host spreadsheet or database application, did you check that those calculations were unaffected?

Data entry

If you want to be able to put any trust into the numbers that come out at the other end, ensure that every single value entered is accurate. A powerful tool for doing this is to check that each lies within its expected range: if you are entering adult human body weights in kilograms, then you should not accept values below, say, 20, nor above 560, the current world record.

However you should also warn if values are within that range but appear improbable, perhaps less than 50 or more than 200 kg. Databases are generally better-equipped to handle range checking, as they can be set to refuse entries outside a given range, and you can also set up warnings for extreme values within those. Controls are not as flexible in spreadsheets, but you must still warn the user.

If you only enter a small number of values at any occasion, then it should not be hard to implement practices for accurate entry, such as ensuring that you do not transpose digits, turning 153 into 135, perhaps.

Where substantial amounts of data have to be entered, you need to employ more rigour. One proven technique is to get two different operators to key the same data in separately, then to compare their entries: where they concur, you accept the entered data as being correct; where they differ, you enter the value a third time. This can be invaluable when entering large amounts of handwritten data, for example from paper questionnaires.

Entered values and formulas should be formally separated too, so that there is no risk of errors in calculations changing the raw data. If you are working with a large dataset, you would do well to save or export the values themselves once they have been thoroughly checked, so that if anything goes wrong you do not have to re-enter your data or check them again.

Calculations

Every formula, calculation, or manipulation applied to those data then needs to be checked assiduously. Within each formula involving units of any kind, ensure that it uses consistent units, and those in which your data are delivered. Conversions between Imperial and metric measure are a common source of error, having caught NASA out on some spectacular occasions.

Even staying metric throughout, you must get the units correct, as it is all too easy to be out by a power of ten. Examine all your formulas, rewrite them with the values in a different order, for example turning (X*Y)/Z into (X/Z)*Y. Another important crosschecking technique is to arrive at totals (or similar) in a different way: as well as totalling each column and adding those column totals, total each row and check that the sum of the row totals is the same as that of the column totals.

Formulas and calculations are inherently vulnerable to many different forms of error, so they must never be used carelessly. Copying and pasting formulas from one spreadsheet cell to another is quick and convenient, but if you do not check that the pasted formula is exactly correct, it becomes dangerous. If you can, put all formulas into protected cells in a spreadsheet, so that it is harder to blunder in and mess them up.

If available, use reputable and well-documented software instead of a general purpose spreadsheet. This is particularly true for statistics: whilst spreadsheets can be a quick and easy way of generating averages and simple statistics, it is extremely dangerous to write sophisticated statistical formulas into a spreadsheet, and sooner or later that will catch you out. Some formulas work well so long as the values remain within certain limits, but become grossly inaccurate if those limits are exceeded.

Sometimes spreadsheet implementations of certain mathematical functions can become error-ridden if used on very low or high values, and can introduce numeric errors. Reputable statistical and mathematical systems usually document such issues in their reference manuals, and may use several different methods of calculating functions to improve reliability.

Cumulative errors

Whilst maths using only whole numbers should always remain accurate, wherever you use floating point values there is inherent approximation and thus the capacity to accumulate error. It is not unusual in spreadsheeting to come across slight error, such as an answer being given as 5.99998 instead of 6.0. This normally results from the difficulties in performing precise maths in binary, and having to convert from binary representations of floating point numbers back to decimal ones.

However these small errors can accumulate until they become substantial, if the end result relies on many formula steps, each of which suffers such small inaccuracies. You may be able to reduce this by assembling the stages into a smaller number of formulas, or by avoiding repeated methods of estimation such as iteration.

Professional quality maths and statistics software is usually very careful to minimise this problem, so what you are struggling with in Excel may work much better in Mathematica, Maple, or R.

When numerical problems occur in calculation, you are likely to be returned an error value, such as a ‘NaN’ (not a number), which might represent the outcome of trying to divide by zero, for instance. Rather than leaving such errors to propagate through your calculations to the output, consider trapping those problems and pointing out where the error has been generated, so that it can be fixed. Test this by deliberately putting in erroneous values and seeing what happens, and how effective your traps prove to be.

Testing and documentation

Once you have constructed your formulas and built confidence in their performance, assemble a test suite of examples to validate the formulas now and in the future. This should include data representing common values, and those testing the limits of use, and should be compared against longhand calculations. Document your methods, the validation suite, and ensure that testing is repeated every time that the host software is updated.

Techniques: Range Checking in Spreadsheets

Good databases allow you to limit entered values to within certain ranges, but spreadsheets can normally only warn you visually if a cell does not meet predetermined criteria. Although this makes it easier for erroneous data to slip into a spreadsheet, it is still much better than not having range checking at all.

In Microsoft Excel, this is known as Conditional Formatting, invoked from within the Format menu, or from the toolbar. For example, if you wanted to ensure that body weight was above 50 and below 200 kg, select the cell(s) to be checked, bring up the Manage Rules dialog using that menu command, and click the + button to add a new rule. Using the Classic style, set it to Format only cells that contain a Cell value not between 50 and 200, then click OK.

In Excel, select the range of cells and configure their Conditional Formatting.

Excel offers other valuable checks that can be performed in its bewilderingly extensive rules, such as for duplicates among unique identity numbers, and the use of limits set in other spreadsheet cells.

Excel's Conditional Formatting rules are simple to set up. — Excel’s Conditional Formatting rules are simple to set up.

Apple’s more recent Numbers has a similar feature of the same name, but rather more easily accessed. Select one or more cells or cell ranges, and click on the Cells Inspector tool in the Inspector. Click on the Show Rules… button in Conditional Format to bring up the Conditional Format rules view. In the ‘Choose a rule’ pop-up menu select the ‘Not between’ item, and enter your limits of 50 and 200 respectively. Click the Edit button to set the highlight style for cells that fall outside that range, then the Done button to apply it.

Numbers also offers Conditional Formatting as a means of range checking.

Numbers provides strong support for simple date checking, which is valuable given the multiple errors that can occur when entering dates. Although Excel has good date support it may take some learning before you can readily trap date entry errors.

Techniques: Spreadsheet Auditing

Microsoft Excel has long had a suite of tools to audit formulas and check for other related errors. Although not as extensive or sophisticated as third-party tools currently available only for Windows versions of Excel, they are still a great help, and should not be ignored. Apple’s Numbers still lacks equivalent tools, although a useful way of checking formulas is to reveal them using the Formula List tool.

Before trying to use Excel’s auditing tools, if necessary unprotect your spreadsheet via the Protection command in the Tools menu, and select the Formulas item in the Ribbon, which puts the Audit Formulas tools in the toolbar. Select the cell that you wish to audit, and you can show all those cells on which that is dependent, and all those that depend on it.

The Trace Dependents tool in Audit Features shows you which cells depend on which.

If you are auditing a cell containing a formula, that will clearly depend on the cells used to compose the formula, but will have no meaning if the cell contains a value in its own right.

Cells containing errors or NaN values are normally accompanied by an icon of a black exclamation mark on a yellow background. Hover the pointer over this and the tooltip will explain the nature of the error, whilst a popup menu appears to the right of the icon. This provides additional help in dealing with the error, including a command to Trace Error, which functions similarly to the auditing tool in revealing cells on which it is dependent. It also gives access to the Error Checking options in Excel’s Preferences.

Click on the Warning icon to see a fuller explanation of the error being shown.

There are many websites that offer help with Excel formulas and use. Among the most comprehensive is Contextures, where there are hundreds of valuable tips. For those involved in financial modelling using spreadsheets, papers detailing best practice are here.

Note that the screenshots are taken using Excel 2011. Current test versions of Excel 2016 appear identical in these respects.

Updated from the original, which was first published in MacUser volume 29 issue 07, 2013.

Share this:

Related