When you debug a program, you observe its behavior and make assumptions about the nature of the problem. But the behavior you observe can lead you to the wrong conclusions. Based on those conclusions, you might design a fix that looks good at first glance but fixes nothing, or even makes things worse.
In this article, I will give an example of a bug hunt that went down the wrong path. Read on to see how even seemingly simple systems can be surprisingly difficult to get right.
The Radiator That Would Not Cool Down
Some time ago we moved into a new apartment. It was winter, so we were happy that all the radiators were working and keeping our home warm and cozy. Then spring came and it was time to turn off the heat. But one of the radiators would not cool down. It would stay warm and heat the room to a point where it was uncomfortable.
It was at this point that I began my hunt for the fault in the radiator.
The temperature in our radiator is controlled by a simple mechanical thermostat. The thermostat senses the air temperature and compares it to the temperature setting. It has a small pin that protrudes from its axis, and with that pin it can push a corresponding bolt on the valve inside the radiator. And by putting more or less pressure on that bolt, it regulates the flow of hot water into the radiator, which in turn controls how hot the radiator gets.
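For the programmers among us, this control loop can be sketched in a few lines of Python. It is a toy model, not a description of any real thermostat: the linear response, the `gain_per_c` value, and the function names are all assumptions made for illustration.

```python
def pin_extension(air_temp_c: float, setpoint_c: float, gain_per_c: float = 0.5) -> float:
    """Fraction of full pin travel: 0.0 = fully retracted, 1.0 = fully extended.

    Assumption: the pin extends linearly as the air gets warmer than the
    setpoint, and sits at half travel when air temperature equals the setpoint.
    """
    raw = 0.5 + gain_per_c * (air_temp_c - setpoint_c)
    return min(1.0, max(0.0, raw))  # the pin cannot leave its mechanical range


def hot_water_flow(air_temp_c: float, setpoint_c: float) -> float:
    """Fraction of maximum flow: the further the pin pushes the bolt, the less flows."""
    return 1.0 - pin_extension(air_temp_c, setpoint_c)


if __name__ == "__main__":
    for air in (18.0, 21.0, 24.0):
        print(f"air {air:.0f} °C, setpoint 21 °C -> flow {hot_water_flow(air, 21.0):.1f}")
```

In this model a cold room opens the valve fully, a warm room closes it, and everything depends on the pin actually reaching the bolt.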
My First Fix
The temperature control was not working properly, so my conclusion was quick and easy: Change the thermostat, fix the problem.
I purchased a new thermostat, this time an electronic thermostat. With this thermostat, I was hoping to not only fix the problem, but also get some extra features like programmable times to heat the room. Just extra convenience.
I had already used this new type of thermostat in other rooms, so I thought I knew how to handle and install it.
Thus prepared, with an analysis of the problem and a fix in mind, I went ahead and replaced the thermostat. I was ready to enjoy a cool room in the spring and summer and the convenience of timed heating in the winter.
Guess what happened?
The heater got even hotter than before!
Wow! So somehow my bug fix, which I was pretty sure would fix the problem, actually made it worse.
What could have gone wrong? I quickly did another analysis. The system is not that big, just the valve and the thermostat. I assumed that my new thermostat would work. After all, I had installed the same model a few times in other rooms and had no problems. To be sure, I also swapped the new thermostat for one of the thermostats I knew to work. Still the same problem.
The Wrong Path
This led me to the only other possible conclusion I could think of at the moment. If the system consists of a valve and a thermostat, and the thermostat is working, then the valve is the problem. I tried to fix it myself. People on the internet write about problems with valves and how they can sometimes get stuck. I tried moving the bolt with my pliers to loosen it. But nothing changed. What else could I do? At this point I was convinced that the valve would have to be completely replaced. This is not an easy operation and requires some experience. So I called a plumber to replace the valve.
The plumber came, looked at my installation, checked the valve and after ten minutes showed me the problem. It was not the valve. It was the thermostat. But how?
The explanation was that the new electronic thermostat had an additional part compared to the mechanical thermostat. The electronic thermostat is not mounted directly on the valve. Instead, there is an adapter between the valve and the thermostat. This adapter contains a rotating ring that, when rotated, pushes the bolt in the valve. The electronic thermostat can turn this ring with a small gear driven by a stepper motor. The ring also serves as an emergency handle: if the thermostat is defective or has a dead battery, you can still turn the ring by hand to turn off the radiator.
This little wheel, the adapter ring, is designed to turn about 360° between the minimum and maximum extension of the pin. For whatever reason, the wheel on this radiator had been turned beyond its limits. It was working out of bounds, so when the thermostat turned the wheel, the pin was never extended far enough to close the valve completely and stop the flow of water into the radiator.
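The failure can be reproduced in a small simulation. This is only a sketch under my own assumptions: I assume the thermostat has no way to sense the wheel's true position, treats the starting position as "fully open", and simply turns the wheel by the full travel to close the valve. The names and the exact 360° figure are illustrative.

```python
FULL_TRAVEL_DEG = 360.0  # approximate travel between minimum and maximum pin extension


def pin_extension(wheel_angle_deg: float) -> float:
    """Fraction of full pin travel for a given true wheel angle.

    Outside the 0..360 degree range the pin does not move any further.
    """
    return min(1.0, max(0.0, wheel_angle_deg / FULL_TRAVEL_DEG))


def flow_after_close_command(start_angle_deg: float) -> float:
    """Remaining flow after the thermostat tries to close the valve.

    Assumption: the thermostat believes the wheel starts at 0 degrees and
    turns it by the full travel, without knowing the true start angle.
    """
    final_angle_deg = start_angle_deg + FULL_TRAVEL_DEG
    return 1.0 - pin_extension(final_angle_deg)


if __name__ == "__main__":
    # Wheel inside its proper range: the valve closes completely.
    print(flow_after_close_command(0.0))
    # Wheel turned half a revolution past its lower limit: half the flow remains.
    print(flow_after_close_command(-180.0))
```

The thermostat's logic is identical in both calls. Only the hidden starting state differs, which is why swapping in a known-good thermostat changed nothing.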
Knowing this, the fix was simple: just turn the wheel back into the proper range and everything worked fine. The radiator stayed cool when it was supposed to, and got warm when it needed to.
The whole affair cost me a few bucks to pay the plumber, and it left me with wounded pride. But I also learned a lesson: When fixing a bug, the actual bug fix can change the behavior of the system in unexpected ways. In this case, I should have checked my fix (the new thermostat) first, instead of assuming that the underlying system (the valve) was broken.
The conclusion is that when we make a fix, we can never assume that the fix is correct. We must always examine all the changes we have made to the system and understand their side effects. This is the only way to build safe and reliable systems. In this case, the only thing that got hurt was my pride. But I have to keep this lesson in mind in my daily work as an engineer so that no user of our products gets hurt.