Thinking Like a Computer - Data Types

Teaching Your Name to the Computer

How can we make a computer understand the concept of a color or a name? Actually, we cannot teach the computer anything. The computer cannot understand a concept like a human being does.

Instead, it is the job of a programmer to find a representation of a real-world concept that can be stored in binary form. The programmer takes a set of bits and assigns these bits a meaning. This meaning is also called a data type.

Let’s start with an example. Say we have one byte of memory. What can we do with it? We could say it is a number. As we learned in the article about how a computer works, we can store 2⁸ = 256 different values in it. 00000000b is equal to 0, 00000001b is equal to 1, 10001000b is equal to 136 and so on. The maximum value is 11111111b = 255. This kind of whole number is known as an integer, or more precisely an unsigned integer, because it cannot be negative. A piece of memory that has a data type is also called a variable.

Once we decide that this byte contains an integer, we can do all the things that we can do with a number. We can do addition, subtraction, division, multiplication and a multitude of other operations. These are operations that are supported by the computer’s ALU.
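The bit patterns mentioned above can be checked with a few lines of code. This is only a small sketch in Python, chosen here for illustration (the article itself does not prescribe a language); the variable names are made up for this example.

```python
# One byte holds 8 bits, so it can take 2**8 distinct values.
BITS_PER_BYTE = 8
distinct_values = 2 ** BITS_PER_BYTE

# The bit patterns from the text, written as binary literals.
zero = 0b00000000
one = 0b00000001
example = 0b10001000
maximum = 0b11111111

# Once the byte is treated as an integer, ordinary arithmetic applies.
total = one + example

print(distinct_values, zero, one, example, maximum, total)
# prints: 256 0 1 136 255 137
```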

But we could also use this byte in a different way. Let’s say we want to store a color in this byte. The computer does not know about colors. So we need to think of a representation of a color. Let’s say we need to distinguish only between 50 different colors. Then we can fit this representation in one byte, because 50 is less than 256. And we decide to give each color a unique binary value. We can choose freely from all the 256 possible values, but the easiest way is to start with 0 and then count up. Let’s say black has the value 0, dark grey is 1, light grey 2, cornflower blue is 3 and so on. Doing this, we have created an enumeration, or enum for short.

Now the byte contains a color. Can we perform the same operations on it that we could for an integer? Yes, we can! The computer can add two colors for us. But does it make sense? Certainly not! What would be the result of adding dark grey and light grey? In our representation it would be 3, which is cornflower blue. It doesn’t make sense, does it?
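The color example can be sketched as follows. This is an illustration in Python, assuming an enum type that still behaves like a plain number (IntEnum from the standard library); the class name Color and the member names are chosen for this example only.

```python
from enum import IntEnum


class Color(IntEnum):
    """Each color is just a number, counted up from 0 as in the text."""
    BLACK = 0
    DARK_GREY = 1
    LIGHT_GREY = 2
    CORNFLOWER_BLUE = 3


# The computer happily adds the two underlying numbers: 1 + 2 = 3.
raw_sum = Color.DARK_GREY + Color.LIGHT_GREY

# Reading the result back as a color gives a technically valid
# but meaningless answer.
misread = Color(raw_sum)

print(raw_sum, misread.name)
# prints: 3 CORNFLOWER_BLUE
```

The addition works only because the colors are stored as integers. Nothing about the result says anything useful about grey.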

We have to make sure that our program uses the memory in a way that is consistent with its data types. A memory location that contains an integer can be added to and subtracted from. But this is not allowed for a color, because this would just produce garbage. So beware what you do with your data.

In practice, many programming languages will prevent you from performing an operation on a variable that is not allowed for the data type of this variable. This concept is called type safety. In many languages, type safety is checked by the compiler, a special program that translates your commands from a programming language into machine language. If you violate the type safety rules, it will show you an error message. Without this safety check, your computer would execute every command you give it, which might just produce garbage.
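Here is a rough sketch of what such a safety check looks like. It uses Python for illustration, which reports the error while the program runs rather than in a separate compile step, but the idea is the same: the operation is refused instead of producing garbage. The names SafeColor and try_to_add are hypothetical.

```python
from enum import Enum


class SafeColor(Enum):
    """A color type that does NOT behave like a number."""
    BLACK = 0
    DARK_GREY = 1
    LIGHT_GREY = 2
    CORNFLOWER_BLUE = 3


def try_to_add(a, b):
    """Return a + b, or a short message if the types do not allow it."""
    try:
        return a + b
    except TypeError as error:
        return f"rejected: {error}"


print(try_to_add(1, 2))  # integers may be added
print(try_to_add(SafeColor.DARK_GREY, SafeColor.LIGHT_GREY))  # colors may not
```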

Basic Data Types

There are some data types that are so basic that almost every programming language supports them out of the box. These are also the basis for creating more advanced data types.

We have already met the integer above. The integer has a sibling called the signed integer, which can also store negative numbers. But if the number of bits stays the same, the maximum positive number will be only about half as big, because we need to store the negative numbers as well. So in a signed integer of 8 bits, we can store all numbers from -128 to +127, which is 256 different numbers in total.

How can we store negative numbers in binary data? An easy solution would be to use one bit to store the sign. But it turns out this is not the best way to do it. Instead, computers use a representation called two’s complement. With it, all commands for addition and subtraction work the same way for positive and negative numbers.
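To make this concrete, the sketch below reads the same 8 bits once as an unsigned and once as a signed (two’s complement) integer. It is written in Python for illustration; the helper names as_unsigned and as_signed are made up for this example.

```python
def as_unsigned(byte_value: int) -> int:
    """Read the low 8 bits as a number from 0 to 255."""
    return byte_value & 0xFF


def as_signed(byte_value: int) -> int:
    """Read the low 8 bits as two's complement, from -128 to 127."""
    return int.from_bytes(bytes([byte_value & 0xFF]), byteorder="big", signed=True)


pattern = 0b10001000
print(as_unsigned(pattern), as_signed(pattern))  # prints: 136 -120

# The same addition of bits serves both interpretations.
minus_one = 0b11111111               # 255 unsigned, -1 signed
five = 0b00000101
result_bits = (minus_one + five) & 0xFF  # keep only 8 bits

print(as_signed(result_bits))  # prints: 4, because -1 + 5 = 4
```

The adder never needs to know whether the bits are meant as signed or unsigned. Only the interpretation of the result differs.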

Figure 1. What You Could Do With a Byte

If you want to work with numbers that have a fractional part, there are also several ways of doing it. The most common way is to use a so-called floating-point number. These numbers are stored in such a way that, with a fixed number of bits, you can have very large numbers with limited precision in the decimal places, or small numbers with many decimal places. Modern computers have special commands to work with these numbers that are executed by hardware floating-point units. This makes working with these numbers almost as fast as using integers.
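The limited precision is easy to observe. The following sketch uses Python, assuming (as is the case on common platforms) that its float type is a 64-bit floating-point number; the variable names are chosen for this example.

```python
import struct

# A double-precision float occupies 8 bytes, i.e. 64 bits.
bits = struct.calcsize("d") * 8

# Very large numbers are possible, but neighbouring values are far apart:
# adding 1 to 10^16 is lost in rounding.
large = 1e16
large_plus_one = large + 1.0
lost_in_rounding = (large_plus_one == large)

# Small numbers keep many decimals, but not infinitely many:
# 0.1 and 0.2 cannot be stored exactly, so the sum is slightly off.
sum_is_exact = (0.1 + 0.2 == 0.3)

print(bits, lost_in_rounding, sum_is_exact)
# prints: 64 True False
```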

Another very basic type is the Boolean. It has only two values, true and false. Booleans are most often used to control the flow of the program, e.g. in if-conditions. While such a variable could actually be stored in a single bit, it will often use a whole byte or even more, because most computers address memory in whole bytes.
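A small sketch of both points, again in Python for illustration: a Boolean steering an if-condition, and the storage size of a C-style Boolean as reported by the standard ctypes module. The variable names are hypothetical, and the reported size depends on the platform (one byte is typical).

```python
import ctypes

# A Boolean controls which branch of the program runs.
is_raining = True
if is_raining:
    advice = "take an umbrella"
else:
    advice = "leave the umbrella at home"

# One bit would be enough for true/false, but a C-style Boolean
# usually occupies at least one whole byte.
bool_size_in_bytes = ctypes.sizeof(ctypes.c_bool)

print(advice, bool_size_in_bytes)
```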

Strings

How can we store a character in memory? We could do the same thing we did for the colors above: we assign each character a unique binary value, creating an enum. We could do this in many different ways. Fortunately, we don’t have to create our own enum. We can use the widely used ASCII code. This is a table that contains a unique value for each letter of the Latin alphabet in lower and upper case, for each digit, some extra symbols and some control characters.

Using a standardized character table in each program ensures that text data written by one program can be opened by another program, such as a text editor.

The ASCII table defines 128 characters, which need 7 bits each; in practice, each character is stored in one 8-bit byte. But there are many more languages with many more characters, and even 8 bits are not enough to store all of them. This is why another standard was created, which is called Unicode. Unicode aims to give every character of every writing system its own number, and even emojis have their own Unicode representation.
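The character tables can be explored directly. This sketch uses Python’s built-in ord and chr functions. UTF-8, which is not mentioned in the article, is used here as one common way of storing Unicode text as bytes.

```python
# ASCII: every character has a small number.
code_of_A = ord("A")                  # 65
char_for_97 = chr(97)                 # 'a'
ascii_bytes = list("Hi!".encode("ascii"))

# Unicode: the euro sign (written as an escape, U+20AC) has a number
# far beyond what fits into one byte.
euro_code_point = ord("\u20ac")
euro_utf8_length = len("\u20ac".encode("utf-8"))

print(code_of_A, char_for_97, ascii_bytes)
# prints: 65 a [72, 105, 33]
print(euro_code_point, euro_utf8_length)
# prints: 8364 3
```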

Normally, you don’t want to work with just one character, but with many of them, to store words or sentences. Then you put them one right after another into memory, e.g. the first character in memory location 60, the next in 61 and so on. They sit in memory like a string of pearls. This is why the data type for text with more than one character is called a string.

To use a string, your program only has to know the address of the first character. Then it will treat all the following bytes as characters of this string, too. But where does it stop? When the program meets a special end marker, normally a byte with the value 0 (not the digit character ‘0’, which has its own code). This zero marks the end of the string, which is why it is also called a null-terminated string.

Let’s see how the simple string “Hello” could be stored in the memory:

Figure 2. A String in Memory
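The same layout can be imitated in code. The sketch below models memory as a block of 70 bytes and places “Hello” at location 60, as in the example above. It is written in Python for illustration; the function names are hypothetical, and Python’s own strings are not stored this way.

```python
def to_null_terminated(text: str) -> bytes:
    """Encode text as ASCII bytes followed by a terminating 0 byte."""
    return text.encode("ascii") + b"\x00"


def read_null_terminated(memory: bytearray, start: int) -> str:
    """Read characters from 'start' up to (not including) the first 0 byte."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


# A tiny model of memory: 70 bytes, all 0 at the beginning.
memory = bytearray(70)
memory[60:66] = to_null_terminated("Hello")

print(list(memory[60:66]))
# prints: [72, 101, 108, 108, 111, 0]
print(read_null_terminated(memory, 60))
# prints: Hello
```

The program only needs the start address, 60. The reader stops when it finds the 0 at location 65.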

Please Mind the Size

Every basic data type has a fixed number of bits. This limits the number of different values that you can store in a variable of this data type. You always have to keep this in mind when you perform operations on this data.

As we have learned above, an 8-bit integer can store 256 different values. Imagine you have an 8-bit integer which currently has the value 200. What if you add the value 56 to it? You might expect to have the value 256 afterwards. But instead, you will get the value 0! Why is that?

The binary representation of the value 256 is 100000000b. That is 9 bits. But your variable only has 8 bits. Your computer will throw away the highest bit, leaving you with only 0s. This situation is called an overflow. (This can also happen the other way around: when the result is a negative number that does not fit into the variable, you will have an underflow.)

As this is normally not what you want, you always have to make sure that the variables you use have enough bits to store the result. In this example, you could use a 16-bit integer, which has enough space to store 2¹⁶ = 65536 different values. By adding or subtracting two 8-bit integers, you will never reach this limit and you are on the safe side.
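The overflow from the example can be reproduced with a short sketch. Python’s own integers grow as needed, so the fixed width is simulated here by keeping only the lowest 8 or 16 bits with a bit mask; the function names are made up for this illustration.

```python
def add_8bit(a: int, b: int) -> int:
    """Add two numbers and keep only the lowest 8 bits of the result."""
    return (a + b) & 0xFF


def add_16bit(a: int, b: int) -> int:
    """Add two numbers and keep only the lowest 16 bits of the result."""
    return (a + b) & 0xFFFF


def subtract_8bit(a: int, b: int) -> int:
    """Subtract and keep only the lowest 8 bits of the result."""
    return (a - b) & 0xFF


print(bin(200 + 56))        # prints: 0b100000000 (9 bits)
print(add_8bit(200, 56))    # prints: 0, the ninth bit is thrown away
print(add_16bit(200, 56))   # prints: 256, 16 bits are enough
print(subtract_8bit(0, 1))  # prints: 255, wrapping in the other direction
```

Even the largest possible sum of two 8-bit values, 255 + 255 = 510, fits comfortably into 16 bits.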

So make sure to always use the correct size of your variables for your program.

Advanced Data Types

With the basic data types, you can create more complex data types yourself. With lists and structures, you can aggregate many values of the same or even different data types.

Lists

In many situations, you need more than one variable of the same type. For instance, you might want to store the highest temperature of each day of the year. You don’t want to create 365 different variables for this.

Instead, you can create a list that has 365 entries. Each entry is the temperature in degrees Fahrenheit of a specific day. The type of each entry could be an 8-bit signed integer. Is this safe? The maximum temperature that could be stored is 127°F or roughly 53°C. The minimum temperature would be -128°F or roughly -89°C. If we look at the temperature records on the surface of the Earth, we find values that exceed this range. So if you want to use this program worldwide, then you would need 16 bits instead of 8 bits per temperature. But if you use this program only in a mild climate, 8 bits should be enough.

A list with a known, fixed number of elements of the same type is also called an array. You can access a value inside this array with an index. The temperature of January 1st has the index 0, January 2nd has the index 1 and December 31st has the index 364. As you can see, array indices start at 0, as they do in most programming languages.
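A temperature array of this kind might look as follows. The sketch uses Python’s standard array module with the type code "b", which stores each entry as a signed integer of one byte. The temperature values are invented for the example.

```python
from array import array

DAYS_PER_YEAR = 365

# 365 entries, each a signed 8-bit integer, all starting at 0.
temperatures = array("b", [0] * DAYS_PER_YEAR)

temperatures[0] = 41     # January 1st (hypothetical value, in °F)
temperatures[1] = 39     # January 2nd
temperatures[364] = 38   # December 31st

# A value outside -128..127 does not fit into an entry and is refused.
try:
    temperatures[200] = 134
    fits = True
except OverflowError:
    fits = False

print(len(temperatures), temperatures.itemsize, fits)
# prints: 365 1 False
```

Here the library refuses the value that does not fit instead of silently wrapping around. Not every language is this careful.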

The following image shows what would happen if we took the memory where we stored our string “Hello” and interpreted it as a list of temperatures.

Figure 3. The Memory filled with the String Interpreted as a List of Temperatures

This also produces a somewhat plausible result. But the drop from 111°F to 0°F is a little strange. So it is important to always remember which kind of information your memory contains.
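The reinterpretation can be tried out in a few lines. This is a sketch in Python for illustration: the same six bytes are read once as a null-terminated string and once as an array of signed 8-bit temperatures.

```python
from array import array

# The six bytes of the null-terminated string "Hello".
memory = b"Hello\x00"

# Interpretation 1: characters up to the terminating 0.
as_text = memory[:memory.index(0)].decode("ascii")

# Interpretation 2: six signed 8-bit temperatures.
as_temperatures = array("b", memory).tolist()

print(as_text)
# prints: Hello
print(as_temperatures)
# prints: [72, 101, 108, 108, 111, 0]
```

The bytes themselves never change. Only the data type that the program assumes decides what they mean.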

Arrays can also have multiple dimensions. In a program that stores the temperature data for 10 years, the first dimension would be the year and the second the day of the year.
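A two-dimensional layout for ten years of data could be sketched like this, using a Python list of lists for illustration. The names and the stored value are hypothetical.

```python
YEARS = 10
DAYS_PER_YEAR = 365

# First dimension: the year. Second dimension: the day of that year.
temperatures = [[0] * DAYS_PER_YEAR for _ in range(YEARS)]

# January 1st (day index 0) of the third year (year index 2).
temperatures[2][0] = 45

print(len(temperatures), len(temperatures[0]), temperatures[2][0])
# prints: 10 365 45
```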

Sometimes you need more flexible lists, e.g. when the size needs to shrink or grow during the runtime of the program. Then you can use a more complex data type such as a linked list or a queue. These types are more flexible, but they are also more complex and can slow down the program execution.

In real programs, you don’t have to develop your own list type. You can use many different implementations from your programming language’s library.

But it is important that you understand the different concepts so you can choose the right list type for your application.
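As one example of a ready-made flexible list, the sketch below uses deque from Python’s standard collections module, which can grow and shrink at both ends and is often used as a queue. The temperature readings are invented for the example.

```python
from collections import deque

# Start with two readings; the size is not fixed in advance.
readings = deque([70, 72])

# The list grows while the program runs ...
readings.append(75)
size_after_growing = len(readings)

# ... and shrinks again when the oldest entry is taken out.
oldest = readings.popleft()

print(size_after_growing, oldest, list(readings))
# prints: 3 70 [72, 75]
```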

Structures

If you need to keep variables of different types together, you can create a structure. This structure can contain multiple variables that have the same or different data types. They can even be other structures or lists. Each variable has its own name.

By declaring a structure, you create a new data type. When you create a variable with this data type, you actually create all the variables inside this structure, too.

Let’s say you need to store an address of a customer. There you put the name of the street (a string), the name of the city (another string) and the postal code (an integer).

Then you can create another structure for all customer data. It could contain the name of the customer (a string), the customer ID (an integer) and the address (the structure that we created above).
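The two structures described above could be declared like this. The sketch uses Python dataclasses for illustration; the field names follow the text, while the sample customer and address are made up.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    postal_code: int


@dataclass
class Customer:
    name: str
    customer_id: int
    address: Address  # a structure inside a structure


# Creating one Customer variable creates all the variables inside it, too.
customer = Customer(
    name="Ada Example",
    customer_id=1001,
    address=Address(street="Main Street 1", city="Springfield", postal_code=12345),
)

print(customer.name, customer.address.city, customer.address.postal_code)
# prints: Ada Example Springfield 12345
```

In practice, postal codes are often stored as strings, because some start with a zero or contain letters. The integer is kept here to match the example in the text.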

With structures, you can create complex data types that contain all data that you need for a given situation. Most concepts that are needed in a program can be represented by a well-designed aggregation of information in a structure.

Becoming a Data Expert

Every program stands on two pillars: Data and the sequences that transform this data into the output that you need. This article explained how you can represent real world data in binary form so that your program can work with it.

When you create a program, ask yourself: Which kind of data do I need? How do I store it? Can I use a basic data type, or do I need to create a list or a structure?

And always make sure that you give your variables enough bits so you are not bitten by an overflow.