
Test Driven Development – A practical Example

Step Four

The addition of two integers is now finished. Let’s look at the rest of the list:

  • Addition of decimals
  • Addition of negative numbers
  • Addition of multiple numbers
  • Subtraction
  • Multiplication
  • Division
  • Bracketed Expressions

Now that our little program supports addition, we are going to implement subtraction.

Integration Test

In the last step, we created integration tests for CalculatorApplication. From now on, we use these tests as our starting point for developing the next feature. Whenever we add new functionality, we write some integration tests first. These tests validate that the classes we develop test-driven collaborate well together.

For our subtraction feature, we design a new integration test. It covers multiple scenarios for the subtraction of two integers.

def test_subtraction_of_two_numbers(self):
    output = self.app.calculate("3 - 5")
    self.assertEqual(output, "-2")

    output = self.app.calculate("120- 80")
    self.assertEqual(output, "40")

    output = self.app.calculate("0 -0")
    self.assertEqual(output, "0")

Validating subtraction equations

When we run the test, the error message tells us that the output is “Invalid expression” instead of the expected number. This output is produced when ExpressionValidator considers the expression invalid. So our first step towards a green integration test is to extend ExpressionValidator so that it accepts subtraction.

Let’s add a valid example for subtraction to the test case test_valid_expression.

def test_valid_expression(self):
    isValid = self.validator.validate(['3', '+', '5'])
    self.assertTrue(isValid)

    isValid = self.validator.validate(['120', '+', '8'])
    self.assertTrue(isValid)

    isValid = self.validator.validate(['0', '+', '0'])
    self.assertTrue(isValid)

    isValid = self.validator.validate(['3', '-', '5'])
    self.assertTrue(isValid)

The test fails, which proves that the error is caused by ExpressionValidator. Of course it does: the method validate currently only accepts a ‘+’ as operator. It has to allow a ‘-’, too.

def validate(self, tokens):
    if(len(tokens) == 0):
        return True

    if(len(tokens) == 3):
        if(tokens[0].isdigit() and tokens[2].isdigit() and (tokens[1] == '+' or tokens[1] == '-')):
            return True

    return False
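To see the effect of the extended check, here is a short, self-contained sketch restating the validator above. Note the parentheses around the operator comparison: without them, `and`/`or` precedence would accept expressions with non-numeric operands. The sample inputs are only illustrative.

```python
class ExpressionValidator:
    def validate(self, tokens):
        if len(tokens) == 0:
            return True
        if len(tokens) == 3:
            # The parentheses group the operator check so that '+' and '-'
            # are both accepted, but only together with two numeric operands.
            if tokens[0].isdigit() and tokens[2].isdigit() and (tokens[1] == '+' or tokens[1] == '-'):
                return True
        return False

validator = ExpressionValidator()
print(validator.validate(['3', '-', '5']))   # True: subtraction is accepted
print(validator.validate(['3', 'x', '5']))   # False: unknown operator
print(validator.validate(['a', '-', '5']))   # False: operand is not a number
```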

Implementing Subtraction

We run the test again. The error in test_valid_expression is fixed now, but the integration test test_subtraction_of_two_numbers still fails. The error message looks different now:

FAIL: test_subtraction_of_two_numbers (testIntegrationCalculatorApplication.TestIntegrationCalculatorApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../testIntegrationCalculatorApplication.py", line 43, in test_subtraction_of_two_numbers
    self.assertEqual(output, "-2")
AssertionError: '0' != '-2'
- 0
+ -2

The output of the operation is “0”. Our test, however, expects “-2”. Reviewing the code of Calculator, we notice that it returns 0 whenever it does not support an operation. We need to add subtraction as a supported operation. We start with this unit test:

def test_subtraction_of_two_numbers(self):
    result = self.calculator.calculate(['3', '-', '5'])
    self.assertEqual(result, -2)

    result = self.calculator.calculate(['120', '-', '80'])
    self.assertEqual(result, 40)

The following implementation fixes the unit test.

class Calculator:
    def calculate(self, tokens):
        if(len(tokens) == 3):
            operator = tokens[1]
            if(tokens[0].isdigit() and tokens[2].isdigit()):
                if operator == '+':
                    sum = int(tokens[0]) + int(tokens[2])
                    return sum
                elif operator == '-':
                    sub = int(tokens[0]) - int(tokens[2])
                    return sub
        return 0

Now not only the unit test but also the integration test is green. This means that we have successfully implemented subtraction.
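As a quick sanity check outside the test suite, the new Calculator can be exercised directly. This is a self-contained sketch restating the class above; the '%' input is just an illustrative unsupported token.

```python
class Calculator:
    def calculate(self, tokens):
        if len(tokens) == 3:
            operator = tokens[1]
            if tokens[0].isdigit() and tokens[2].isdigit():
                if operator == '+':
                    return int(tokens[0]) + int(tokens[2])
                elif operator == '-':
                    return int(tokens[0]) - int(tokens[2])
        # Unsupported operations fall through to 0.
        return 0

calculator = Calculator()
print(calculator.calculate(['3', '-', '5']))     # -2
print(calculator.calculate(['120', '-', '80']))  # 40
print(calculator.calculate(['3', '%', '5']))     # 0: '%' is not supported yet
```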

Refactoring

All tests are green now. But we are not finished yet. Do you spot the duplicated code?

Both ExpressionValidator and Calculator perform the same operations on the tokens: checking whether they are numbers or operators, and whether the operators are valid. If we leave the code like this, it will make our lives harder later on, because every change will have to be made in two different classes.

How can we remove the duplication? An important concept is missing from our code. Right now, all tokens are strings, so every function that handles the tokens has to perform string operations. This is a rather clumsy solution. It would be much better if we had dedicated data types for tokens. Let’s create these types.

Numbers and Operators

There are two kinds of tokens in our code – numbers and operators. Thus we need two new classes to encapsulate these concepts. We call these classes Number and Operator accordingly.

What should their interfaces look like? We can figure it out by changing the implementations of validate and calculate as if we already had these classes:

class ExpressionValidator:
    def validate(self, tokens):
        if(len(tokens) == 0):
            return True

        if(len(tokens) == 3):
            if(type(tokens[0]) is Number and type(tokens[2]) is Number and type(tokens[1]) is Operator):
                return True

        return False

class Calculator:
    def calculate(self, tokens):
        if(len(tokens) == 3):
            operator = tokens[1]
            if operator == Operator.Addition:
                sum = tokens[0].get_value() + tokens[2].get_value()
                return sum
            if operator == Operator.Subtraction:
                sub = tokens[0].get_value() - tokens[2].get_value()
                return sub
        return 0

In these new implementations, both validate and calculate look much cleaner: all the string operations are gone and no duplication remains between the two methods. This new code shows how the interfaces of Number and Operator could look:

  • Number needs a method get_value that returns the value stored inside the class
  • Operator should be an enumeration type, which lets us distinguish between the different kinds of operations

To make the new versions of validate and calculate work, we need to create Number and Operator. We develop them test-driven, too. We know what their interfaces need to look like, so we can write unit tests to validate their behaviour.

We start with the test for Number:

import unittest

from Number import Number

class TestNumber(unittest.TestCase):
    def test_create_number(self):
        number = Number(42)
        self.assertEqual(number.get_value(), 42)

The implementation of Number is rather short:

class Number():
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

We continue with creating Operator. It is an Enum class that does not need any additional functionality. This means we don’t need to create any test cases for it. We implement it right away:

from enum import Enum

class Operator(Enum):
    Addition = 1
    Subtraction = 2
    Multiplication = 3
    Division = 4
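With Number and Operator defined, the refactored calculate from above can already be exercised on typed tokens. The following is a self-contained sketch combining the classes shown in this section:

```python
from enum import Enum

class Number:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

class Operator(Enum):
    Addition = 1
    Subtraction = 2
    Multiplication = 3
    Division = 4

class Calculator:
    def calculate(self, tokens):
        if len(tokens) == 3:
            operator = tokens[1]
            if operator == Operator.Addition:
                return tokens[0].get_value() + tokens[2].get_value()
            if operator == Operator.Subtraction:
                return tokens[0].get_value() - tokens[2].get_value()
        return 0

calculator = Calculator()
print(calculator.calculate([Number(3), Operator.Subtraction, Number(5)]))  # -2
print(calculator.calculate([Number(120), Operator.Addition, Number(80)]))  # 200
```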

With the two new classes in place, ExpressionValidator and Calculator are complete again. However, their test cases still fail, because they pass strings into the methods instead of Tokens. Let’s fix that:

def test_valid_expression(self):
    isValid = self.validator.validate([Number(3), Operator.Addition, Number(5)])
    self.assertTrue(isValid)

    isValid = self.validator.validate([Number(120), Operator.Addition, Number(8)])
    self.assertTrue(isValid)

    ...
def test_addition_of_two_numbers(self):
    result = self.calculator.calculate([Number(3), Operator.Addition, Number(5)])
    self.assertEqual(result, 8)

    result = self.calculator.calculate([Number(120), Operator.Addition, Number(80)])
    self.assertEqual(result, 200)

Now our unit tests are green again. However, the integration tests are still red. But why?

Adapting Tokenizer

The reason for the failing integration test is that Tokenizer does not yet produce the new types. It still produces strings. So Tokenizer has to be adapted, too. We start with the unit tests:

class TestTokenizer(unittest.TestCase):
    def setUp(self):
        self.tokenizer = Tokenizer()

    def test_tokenize_empty_string(self):
        output = self.tokenizer.tokenize("")
        self.assertTrue(type(output) is list)
        self.assertEqual(len(output), 0)

    def test_tokenize_string(self):
        output = self.tokenizer.tokenize("3 + 5")
        self.assertEqual(output, [Number(3), Operator.Addition, Number(5)])

        output = self.tokenizer.tokenize("5 * -")
        self.assertEqual(output, [Number(5), Operator.Multiplication, Operator.Subtraction])

        output = self.tokenizer.tokenize("120+ 8")
        self.assertEqual(output, [Number(120), Operator.Addition, Number(8)])

This is the new implementation of Tokenizer.

import re

class Tokenizer:
    def tokenize(self, string):
        elements = re.findall(r"[0-9]+|[+*/\-]", string)
        tokens = list(map(self.create_token, elements))
        return tokens

    def create_token(self, element):
        if(element.isdigit()):
            return Number(int(element))
        elif(element == '+'):
            return Operator.Addition
        elif(element == '-'):
            return Operator.Subtraction
        elif(element == '*'):
            return Operator.Multiplication
        elif(element == '/'):
            return Operator.Division
        else:
            return None

This new implementation makes the integration test pass. But the unit test still fails. The integration test could not pass otherwise, so the functionality of the classes must be correct. The problem must therefore lie in the implementation of the unit test.

It turns out that the comparison of the result list of tokenize fails because Python by default checks for identity, not for value equality. The instances in the list returned by tokenize and the ones that are created by the test have the same value, but different identity. This is why the test fails.
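The failure is easy to reproduce in isolation. In this sketch, Point is a hypothetical stand-in for Number: a class that does not define __eq__ inherits identity comparison, so two value-identical instances are not equal, and neither are lists containing them.

```python
# Hypothetical class for illustration; like Number, it defines no __eq__.
class Point:
    def __init__(self, value):
        self.value = value

a = Point(3)
b = Point(3)
print(a == b)      # False: default comparison is by identity
print([a] == [b])  # False: list equality compares the elements
print(a == a)      # True: the same instance
```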

We have to tell Python to check for value equality in our case. To achieve this, we have to override the equality check of Number. Then the test will check that the instances of Number created during the test have the same value, not the same identity. The implementation is simple:

class Number():
    def __eq__(self, other):
        return self.value == other.value

Now the test passes.
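Combining the earlier definition of Number with the new __eq__, the comparison now works by value, which is exactly what the list comparison in the test needs. A minimal sketch:

```python
class Number:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def __eq__(self, other):
        # Compare by value instead of by identity.
        return self.value == other.value

print(Number(3) == Number(3))      # True: equal values
print([Number(3)] == [Number(3)])  # True: list comparison uses __eq__
print(Number(3) == Number(4))      # False: different values
```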

Failure during Tokenization

There is another loose end hidden in create_token. What should the method return if it cannot create a token? Right now, it returns None. This might cause a failure when the value is passed to ExpressionValidator or Calculator, because they don’t expect it. It would be better if tokenize raised an exception in this case, so we can stop the calculation at this point. We add a test for this case:

def test_tokenize_invalid_tokens(self):
    with(self.assertRaises(SyntaxError)):
        self.tokenizer.tokenize("3a + 5")

This test validates that an exception of type SyntaxError is raised. Let’s add that exception to create_token.

import re

class Tokenizer:
    def tokenize(self, string):
        elements = re.findall(r"[0-9]+|[+*/\-]", string)
        tokens = list(map(self.create_token, elements))
        return tokens

    def create_token(self, element):
        if(element.isdigit()):
            return Number(int(element))
        elif(element == '+'):
            return Operator.Addition
        elif(element == '-'):
            return Operator.Subtraction
        elif(element == '*'):
            return Operator.Multiplication
        elif(element == '/'):
            return Operator.Division
        else:
            raise SyntaxError("Invalid Token")

When we execute the test again, it is still red!

FAIL: test_tokenize_invalid_tokens (testTokenizer.TestTokenizer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../testTokenizer.py", line 27, in test_tokenize_invalid_tokens
    self.tokenizer.tokenize("3a + 5")
AssertionError: SyntaxError not raised

It appears that the regular expression we use to extract the tokens simply discards anything invalid. This means it can never find an invalid token and would happily accept an invalid string. It was good that we added this latest test, because a bug had been hiding here all along!
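The behaviour is easy to reproduce in isolation: findall returns only the substrings that match the pattern and skips everything else without an error.

```python
import re

# The invalid token 'a' in "3a + 5" is simply not matched, so findall
# drops it silently instead of reporting a problem.
elements = re.findall(r"[0-9]+|[+*/\-]", "3a + 5")
print(elements)  # ['3', '+', '5']
```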

To fix it, the code has to check whether any part of the input fails to match the regular expression. If so, the exception is raised:

import re

class Tokenizer:
    def tokenize(self, string):
        expr = re.compile(r"\d+|[+*/\-]")

        if self.check_for_unmatched_elements(expr, string):
            raise SyntaxError("Invalid Token Found")

        elements = re.findall(expr, string)
        tokens = list(map(self.create_token, elements))
        return tokens

    def check_for_unmatched_elements(self, expr, string):
        # The split leaves only the text between the matches; anything
        # other than whitespace means an invalid token was found.
        rest = re.split(expr, string)
        rest_string = ''.join(rest)
        return bool(rest_string.strip())

    ...
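The idea behind check_for_unmatched_elements can be seen in isolation: re.split returns the text between the matches, so for a valid expression only whitespace is left over, while an invalid token survives the split.

```python
import re

expr = re.compile(r"\d+|[+*/\-]")

# Join the pieces between the matches and strip the whitespace;
# whatever remains did not match the token pattern.
leftover_valid = ''.join(expr.split("3 + 5")).strip()
leftover_invalid = ''.join(expr.split("3a + 5")).strip()
print(repr(leftover_valid))    # '' -> nothing unmatched
print(repr(leftover_invalid))  # 'a' -> an invalid token was found
```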

If tokenize raises an exception, CalculatorApplication should also show the error message “Invalid expression”.

def test_exception_during_tokenization(self):
    self.tokenizer.tokenize.side_effect = SyntaxError("Invalid Token")
    result = self.app.calculate("3 + 8")
    self.assertEqual(result, "Invalid expression")

To make this test pass, CalculatorApplication has to catch the exception from Tokenizer and return the correct error message.

def calculate(self, expression):
    try:
        tokens = self.tokenizer.tokenize(expression)
    except SyntaxError:
        return "Invalid expression"

    if(self.expression_validator.validate(tokens) == False):
        return "Invalid expression"

    result = self.calculator.calculate(tokens)
    return str(result)
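Putting everything from this step together, a minimal end-to-end sketch behaves as the integration tests describe. The constructor wiring below is an assumption made for the sake of a runnable example; the article's tests inject the collaborators instead.

```python
import re
from enum import Enum

class Number:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def __eq__(self, other):
        return self.value == other.value

class Operator(Enum):
    Addition = 1
    Subtraction = 2
    Multiplication = 3
    Division = 4

class Tokenizer:
    def tokenize(self, string):
        expr = re.compile(r"\d+|[+*/\-]")
        if self.check_for_unmatched_elements(expr, string):
            raise SyntaxError("Invalid Token Found")
        return list(map(self.create_token, expr.findall(string)))

    def check_for_unmatched_elements(self, expr, string):
        # Anything left between the matches besides whitespace is invalid.
        return bool(''.join(expr.split(string)).strip())

    def create_token(self, element):
        if element.isdigit():
            return Number(int(element))
        elif element == '+':
            return Operator.Addition
        elif element == '-':
            return Operator.Subtraction
        elif element == '*':
            return Operator.Multiplication
        elif element == '/':
            return Operator.Division

class ExpressionValidator:
    def validate(self, tokens):
        if len(tokens) == 0:
            return True
        if len(tokens) == 3:
            if (type(tokens[0]) is Number and type(tokens[1]) is Operator
                    and type(tokens[2]) is Number):
                return True
        return False

class Calculator:
    def calculate(self, tokens):
        if len(tokens) == 3:
            if tokens[1] == Operator.Addition:
                return tokens[0].get_value() + tokens[2].get_value()
            if tokens[1] == Operator.Subtraction:
                return tokens[0].get_value() - tokens[2].get_value()
        return 0

class CalculatorApplication:
    def __init__(self):
        # Assumed wiring; the tests in the article inject these instead.
        self.tokenizer = Tokenizer()
        self.expression_validator = ExpressionValidator()
        self.calculator = Calculator()

    def calculate(self, expression):
        try:
            tokens = self.tokenizer.tokenize(expression)
        except SyntaxError:
            return "Invalid expression"
        if self.expression_validator.validate(tokens) == False:
            return "Invalid expression"
        return str(self.calculator.calculate(tokens))

app = CalculatorApplication()
print(app.calculate("3 - 5"))   # -2
print(app.calculate("3a + 5"))  # Invalid expression
```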

Summary of Step Four

In this step we introduced subtraction of two numbers. We started by adding an integration test for this use case. ExpressionValidator and Calculator needed to be adapted, which we also did test-driven.

After all tests turned green, we got rid of the duplicated token-checking code by introducing the concepts of Numbers and Operators. We could do this safely because the integration tests confirmed that, at the end of our refactoring, everything still worked as before.

We also had to adapt Tokenizer, where we discovered a use case that was not yet covered. We wrote a new unit test that broke our first attempt at a fix. Without that unit test, the bug would still be there. Then we found the correct way to check for invalid tokens.

Now we can cross subtraction off the list and feel good about ourselves. At this point, we have implemented two basic operations and laid the foundation for adding the others easily.
