Start the investigation
Let’s imagine you are a detective in the software police force. Your captain (= your boss) comes to your desk and asks you to take over the investigation of a crime that has just been reported. In our story, the crime is a misbehaviour in the software that you develop. As an experienced detective, you know what you have to do.
You start the investigation by collecting information. You go to the person who reported the crime. Maybe it was a tester, a user, or a colleague. You ask them what happened, and where, when and how they observed it. Even if you received a written crime report (= a bug ticket), it is helpful to talk to the reporter first-hand. You note everything that you hear, but you refuse to jump to conclusions yet. You know that people are often unreliable sources of information.
This means that you need to find more witnesses. Maybe someone else in your company experienced the same behaviour. Or is there another user out there who reported something similar?
Then you start to collect hard evidence – log files and traces of the incident. Maybe there are also screenshots or program outputs that you can examine.
And you have to examine the crime scene itself – the source code.
Build your hypothesis
After you have collected all the information, you can start building a hypothesis. Based on the data and the code, you try to establish how the crime might have happened. Sometimes it is crystal clear how the crime unfolded, and you can go to the next step.
But sometimes, it is more difficult. You might have to reconstruct the crime in the lab. You try to reproduce it on your computer. You take your software and feed it with the input from your logs or the information provided by the witnesses.
When you build your hypothesis, make sure that you find the means, the motive and the opportunity.
The means in our analogy is a path in your code that, if executed, produced the observed output.
Then you need to find the motive. This is the state the system must have been in for this behaviour to occur. Finally, there is the opportunity: the input that leads the system, in the given state, to produce the observed output.
Make sure that you have all three of them! More than once, programmers have found a piece of code that could have produced the observed output. But in the given state or with the given input, it could not have happened this way. And they miss the real culprit.
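To make the three ingredients concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the Cart class, the apply_voucher function and the reported negative total are hypothetical and only stand in for whatever your own suspect looks like. The point is that a hypothesis is checked against the state and the input together, not against the code path alone.

```python
from dataclasses import dataclass


@dataclass
class Cart:
    """Hypothetical system state: this is where the 'motive' lives."""
    discount_percent: int = 0
    total_cents: int = 0


def apply_voucher(cart: Cart, voucher_percent: int) -> int:
    """Hypothetical suspect. Returns the new total in cents."""
    combined = cart.discount_percent + voucher_percent
    if combined > 100:
        # The 'means': the only code path that can produce a negative total.
        return cart.total_cents - cart.total_cents * combined // 100
    return cart.total_cents - cart.total_cents * voucher_percent // 100


def explains_observation(state: Cart, user_input: int, observed: int) -> bool:
    """The hypothesis holds only if this state plus this input
    (the 'opportunity') really produce the observed output."""
    return apply_voucher(state, user_input) == observed
```

With a cart that already carries a 60 percent discount, a 50 percent voucher reproduces a reported total of -100 cents. With an undiscounted cart, the same voucher does not, so the suspicious branch alone would not be enough to close the case.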
If your hypothesis withstands all these tests, you might want to present it to the jury.
In the trial you will have to convince the jury that you have found the culprit. The jury might consist of your colleagues, your boss or other experts on the topic. In easy cases, it might be enough to just convince your co-worker sitting next to you. For more complicated cases, you might even want to appoint a defence lawyer for the suspected code who will try to pick your arguments apart.
You present your case to this jury. You show them the suspect. And you show your hypothesis – consisting of the means, the motive and the opportunity. The jury asks questions and tries to understand whether the crime could have happened this way.
If you can convince the jury that the suspected code is the culprit, you sentence it to a rework to fix the bug. But if you cannot convince the jury, and they find a hole in your argument, then you will have to start your investigation all over again. This means work, but it will save innocent code from being reworked while the real culprit is still out there somewhere!
More lessons from detectives
As you see, the work of finding and fixing a bug is much like the work of a police detective. This means that we can take other valuable lessons from police work, too!
One of the most famous fictional detectives, Sherlock Holmes, is often quoted as saying:
When you have eliminated the impossible, whatever remains, however improbable, must be the truth
In our software setting this means that if all the other explanations that we can think of have been ruled out, then an improbable explanation must be true. It could be strange system behaviour like an error in the compiler or a problem with the garbage collector. Those problems are very unlikely, as this software is normally well tested, but these things do happen! Or maybe the person who reported the problem misunderstood the behaviour and there was no bug at all.
Of course, this does not mean that you should accept that explanation without further investigation. But at least you know where to start looking.
Set up a watch
Often a detective has a suspect, but he cannot prove his guilt as there is not enough evidence. In this situation, he might put the suspect under observation until he is seen committing another crime. Then there will be evidence.
You can do the same for your software. If you cannot prove that a certain piece of code causes the failure, you can put it under observation. For instance, you can add additional log outputs. Or you can run the system with your debugger attached and put breakpoints in the suspected code. When the breakpoint is reached, you will have a snapshot of the state of the software at the moment of the crime.
This way, the next time the bug happens, you will be able to send the culprit to jail.
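As a rough sketch of such a watch, the following Python snippet puts a suspect under observation with the standard logging module. The function parse_quantity and the logger name "suspect" are hypothetical and only serve as an example.

```python
import logging

# Hypothetical logger name; use whatever fits your project.
logger = logging.getLogger("suspect")


def parse_quantity(raw: str) -> int:
    """Hypothetical suspect function that is now under observation."""
    # Record what goes in ...
    logger.debug("parse_quantity called with raw=%r", raw)
    value = int(raw.strip() or "0")
    # ... and what comes out.
    logger.debug("parse_quantity returning %d", value)
    if value < 0:
        # Flag the suspicious case so it stands out in the log file.
        logger.warning("negative quantity from raw=%r", raw)
    return value
```

The debug lines only show up once the logger level is set to DEBUG, for example with logging.basicConfig(level=logging.DEBUG). If you prefer to catch the suspect in the act, you could instead place a call to Python's built-in breakpoint() inside the branch that handles negative values and inspect the state in the debugger.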
Set up a trap
You can also try to trick the suspect into revealing himself. You could do this by writing a test that exercises this piece of code under the conditions of the crime. If the code is the culprit, it should show the same behaviour again.
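A trap can be very small. The sketch below assumes a hypothetical suspect called average and a hypothetical bug report saying that the program crashed when it was given an empty list. The trap recreates exactly that condition and reports whether the suspect misbehaves again.

```python
def average(values):
    """Hypothetical suspect: divides by the number of values."""
    return sum(values) / len(values)


def trap_is_sprung() -> bool:
    """Return True if the suspect misbehaves under the reported conditions."""
    try:
        average([])  # the input taken from the hypothetical bug report
    except ZeroDivisionError:
        return True  # same crash as reported: the suspect walked into the trap
    return False
```

If trap_is_sprung() returns True, you have reproduced the crime. Once the code is fixed, the same check can be turned around and kept as a regression test.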
Force a confession
If the guilt still cannot be proven, you might have to put the suspect under pressure until he confesses. For code, this means bombarding it with a lot of random data or with many parallel requests. Eventually the condition for the original bug is reached, and the crime happens again.
In the movies you often see policemen cross-examining a suspect. Our software equivalent of this technique is the peer review. If you are stuck in your investigation, ask a colleague to have a look at a suspicious piece of code. Or review it together, moving through the code step by step. Often your colleague will find a problem that you have overlooked.
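The random-data variant of this interrogation could look roughly like the following sketch. The suspect normalize and the helper interrogate are hypothetical names, and the tiny alphabet and short lengths are arbitrary choices for the example. Parallel requests are not covered here.

```python
import random


def normalize(text: str) -> str:
    """Hypothetical suspect: capitalises the first character."""
    return text[0].upper() + text[1:]  # raises IndexError on an empty string


def interrogate(suspect, rounds: int = 1000, seed: int = 0):
    """Feed random strings to the suspect.

    Returns the first input that makes the suspect raise an exception,
    or None if it survives all rounds.
    """
    rng = random.Random(seed)  # fixed seed, so a confession can be replayed
    for _ in range(rounds):
        length = rng.randint(0, 5)
        candidate = "".join(rng.choice("ab ") for _ in range(length))
        try:
            suspect(candidate)
        except Exception:
            return candidate
    return None
```

Calling interrogate(normalize) hands back the input that broke the suspect, which you can then feed into a targeted test. Using a fixed seed is a deliberate choice: the same sequence of inputs is generated on every run, so the failure can be reproduced at will.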
The job of a detective hunting a criminal and the job of a software developer are often very similar. This means both professions can learn a lot from each other. The same is also true for other professions that need the same kind of precision and expert knowledge, like doctors or auditors. It pays off to have a sneak peek into other professions once in a while. You will always learn something valuable for your own job.