We know that a computer executes a sequence of commands or operations. Each operation has a unique number, which is called an opcode. An operation can also have one or more parameters, like a register to read data from.
Each computer program is a sequence of these operations that is executed by the computer’s control unit. While it is perfectly possible to write a program directly in opcodes, it is not very practical, because you have to do everything on your own. You have to remember the address of each variable that you create. Every time you call a function, you have to know the address where the function starts, and you have to push and pop all the required information on the stack yourself. And if you want to use the program on a different computer, you have to rewrite it, because every computer architecture has its own set of commands.
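To make the idea of opcodes and parameters more concrete, here is a minimal sketch of a made-up machine in Python. The opcode numbers, the three registers and the run function are all invented for illustration; no real processor uses this encoding.

```python
# A toy machine: the opcode numbers and the register layout are invented
# for illustration and do not match any real processor.
HALT = 0   # stop execution
LOAD = 1   # LOAD reg, value -> put a constant into a register
ADD = 2    # ADD dst, src    -> dst = dst + src


def run(program):
    """Step through a flat list of numbers, one instruction at a time."""
    registers = [0, 0, 0]
    pc = 0  # program counter: index of the next opcode
    while True:
        opcode = program[pc]
        if opcode == HALT:
            return registers
        if opcode == LOAD:
            reg, value = program[pc + 1], program[pc + 2]
            registers[reg] = value
        elif opcode == ADD:
            dst, src = program[pc + 1], program[pc + 2]
            registers[dst] += registers[src]
        else:
            raise ValueError(f"unknown opcode {opcode}")
        pc += 3  # every non-HALT instruction is one opcode plus two parameters


# The "program" is nothing but numbers:
# load 2 into register 0, load 3 into register 1, add register 1 to register 0.
program = [1, 0, 2,  1, 1, 3,  2, 0, 1,  0]
result = run(program)
print(result)  # [5, 3, 0]
```

Even in this tiny example you have to keep track of which register holds which value, which hints at why nobody wants to write large programs this way.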
Because it is so hard to write a computer program this way, generations of programmers have created abstractions on top of these basic commands. These abstractions allow you to write a program with commands that humans can remember easily and that perform many tasks automatically. These more abstract languages are what we call programming languages.
Each programming language comes with its own set of allowed commands and a set of rules for how they can be arranged. These rules are called the syntax of the programming language. A program that is written according to these rules can be translated into opcodes. This is done by a special program that comes with the programming language.
Let’s have a look at some kinds of programming languages.
The most basic programming language is the Assembly Language. It adds only a little abstraction to writing a program with opcodes.
But it allows you to write a program with human-readable commands instead of binary numbers. A special program called an assembler translates these commands into the machine code that the computer can execute.
Like opcodes themselves, the assembly language depends heavily on the computer architecture. Thus programs written in the assembly language of an x86 computer cannot be executed on an ARM computer or vice versa.
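The following Python sketch shows roughly what an assembler does. The mnemonics (LOAD, ADD, HALT), their opcode numbers and the register names are hypothetical and belong to no real architecture; a real assembler also handles labels, addressing modes and much more.

```python
# Hypothetical mnemonics and opcode numbers, invented for this sketch.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2}


def assemble(source):
    """Translate lines such as 'LOAD r0, 2' into a flat list of numbers."""
    machine_code = []
    for line in source.strip().splitlines():
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, _, rest = line.partition(" ")
        machine_code.append(OPCODES[mnemonic.upper()])
        for operand in rest.split(","):
            operand = operand.strip()
            if not operand:
                continue
            # 'r1' names register 1; anything else is a plain number
            if operand.lower().startswith("r"):
                machine_code.append(int(operand[1:]))
            else:
                machine_code.append(int(operand))
    return machine_code


source = """
LOAD r0, 2   ; put 2 into register 0
LOAD r1, 3   ; put 3 into register 1
ADD  r0, r1  ; r0 = r0 + r1
HALT
"""
print(assemble(source))  # [1, 0, 2, 1, 1, 3, 2, 0, 1, 0]
```

The table of opcodes is the part that ties the assembler to one architecture: a different machine would need a different table, and usually different mnemonics as well.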
High Level Programming Languages
Using only assembly language, writing large programs would be extremely laborious and error-prone. So there are many high level programming languages that allow programmers to think in terms of concepts instead of machine code.
To write code in machine code or assembly language, you have to remember where you stored a variable or how to execute a function. High level programming languages take over many of these tasks by introducing concepts that you can use in your program.
One of the most important concepts of high level programming languages is the function. In a programming language that supports functions (which most languages do), you can simply write a function and call it from some place inside your code. The compiler or the interpreter of the programming language (more about this later) translates the function into the correct set of commands in machine code. For instance, it remembers at which memory location the function is stored and also the memory location where the execution continues when the function finishes. It puts all the parameters on the stack and does all the “housekeeping”. Programmers do not have to program all these steps themselves.
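The housekeeping behind a function call can be sketched with a toy stack machine in Python. The instruction names (PUSH, CALL, DOUBLE, RET, PRINT, HALT) are invented for this example, and real calling conventions differ between architectures and compilers. For comparison, the last lines show the same task written with an ordinary function.

```python
# Toy instruction set invented for this sketch.
def run(program):
    stack = []   # holds parameters and return addresses
    output = []
    pc = 0       # index of the next instruction
    while True:
        op, arg = program[pc]
        if op == "PUSH":      # put a value (here: the parameter) on the stack
            stack.append(arg)
            pc += 1
        elif op == "CALL":    # remember where to continue, then jump
            stack.append(pc + 1)
            pc = arg
        elif op == "DOUBLE":  # function body: double the parameter,
            stack[-2] *= 2    # which sits just below the return address
            pc += 1
        elif op == "RET":     # pop the return address and continue there
            pc = stack.pop()
        elif op == "PRINT":   # pop the result and record it
            output.append(stack.pop())
            pc += 1
        elif op == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction {op}")


program = [
    ("PUSH", 21),      # 0: parameter for the function
    ("CALL", 4),       # 1: call the function that starts at index 4
    ("PRINT", None),   # 2: execution continues here after RET
    ("HALT", None),    # 3
    ("DOUBLE", None),  # 4: start of the function
    ("RET", None),     # 5
]
print(run(program))  # [42]


# The same task in a high level language: no addresses, no manual stack.
def double(x):
    return x * 2


print(double(21))  # 42
```

In the hand-written version the programmer has to know that the function starts at index 4 and that the parameter sits below the return address. In the high level version the language takes care of both.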
Another important concept is type safety. We learned about this in the chapter about data types. Type safety means that the programming language makes sure that you always use the right data for a variable. For instance, it makes sure that you don’t try to write an integer into a variable that holds a string. This helps to prevent nasty bugs in your program.
There are many more concepts that a programming language can support. So part of choosing the right programming language is to know which concepts you need to solve your problem.
Translations for the Computer
High level programming languages differ in the way that their code is translated into machine code. Some of them are compiled before the program executes, others are interpreted at runtime. And then there is also something in between.
A compiled language uses a program that is called a compiler to translate the program code into machine code. This compilation is done by the programmer. The result is a set of machine code instructions. Often the result is an executable program, which we know on Windows as an “exe file”. But it could also be a library that contains code that can be used by other executables.
During the compilation, the compiler checks certain rules. It makes sure that the program uses the syntax of the programming language correctly. It can also check that the rules of type safety are not violated. For this, you have to give each variable a type that can be checked at compile time. The type must not change at run time. Thus programming languages that enforce this rule are also called statically typed.
Compiled languages have the advantage that they produce machine code. This makes them very fast during execution (as opposed to interpreted code, see below). However, as the result is machine code, the program has to be compiled for each computer architecture and also for every operating system that you want to run the program on.
An interpreted language is not compiled. Instead, to execute the program, a special program called the interpreter is started first. The interpreter reads the program code and then executes the corresponding machine code. This means it translates the commands on the fly.
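As a rough illustration, here is a tiny interpreter written in Python for a made-up mini language with three commands (set, add, print). The language and the interpret function are assumptions for this sketch only; real interpreters are far more elaborate.

```python
# A made-up mini language with three commands: set, add, print.
def interpret(source):
    variables = {}
    output = []
    for line_number, line in enumerate(source.strip().splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        command, args = parts[0], parts[1:]
        # Each command is looked at and carried out immediately, line by line.
        if command == "set":      # set NAME NUMBER
            variables[args[0]] = int(args[1])
        elif command == "add":    # add NAME NUMBER
            variables[args[0]] += int(args[1])
        elif command == "print":  # print NAME
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {line_number}: unknown command {command!r}")
    return output


program = """
set x 5
add x 3
print x
"""
print(interpret(program))  # [8]
```

Note that no machine code for the mini language is ever written to disk. The interpreter itself is the program that runs, and it decides what to do for each line as it reads it.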
Interpretation saves the compilation step. This is an advantage while you are developing the program, because you don’t have to wait for the compilation to finish when you make a change to the program. However, interpretation usually also makes the execution of the program slower, and it needs more memory. Thus interpreted languages are often a poor fit for systems with tight timing constraints or limited memory, like real-time or embedded systems.
In an interpreted language, there is no separate compilation step that checks your whole program before it runs. Many interpreters do report plain syntax errors as soon as they load a file, but other mistakes, such as a misspelled variable name or an operation on the wrong type, are only detected when the faulty line is actually executed, and then the program crashes. If you are unlucky, such an error will hide in a rarely executed place of the program and come up in an unexpected moment. This problem can be mitigated by the use of unit tests.
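Here is a minimal Python sketch of an error that hides in a rarely executed branch. Python itself reports genuine syntax errors when it loads a file, so the example uses a misspelled name instead, which Python only notices when the line runs. The function shipping_cost and its numbers are invented for illustration.

```python
def shipping_cost(weight_kg):
    if weight_kg <= 20:
        return 5.0
    # Rarely executed branch with a typo: 'wieght_kg' is not defined.
    return 5.0 + 0.5 * (wieght_kg - 20)


print(shipping_cost(3))  # 5.0, the common case works fine

try:
    shipping_cost(25)    # the faulty line is reached only for heavy parcels
except NameError as error:
    print("found only at run time:", error)
```

A unit test that calls shipping_cost with a heavy parcel would hit the faulty line on the developer's machine, long before a customer does.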
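Some interpreted languages have very little type safety. Many others are dynamically typed. This means that a type belongs to the value rather than to the variable, so the type that a variable holds is only known at run time and can change when you assign a new value. The interpreter can still check types, but it does so at run time: an interpreter with strict type checks reports an error when an operation does not fit the types of its values, for example when you try to add a number to a string.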
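Python is one example of a dynamically typed language, and a short sketch shows how it behaves. A name can be bound to values of different types over time, and the type check happens when an operation is carried out. Other dynamically typed languages may handle the details differently.

```python
value = 42                    # value currently holds an int
print(type(value).__name__)   # int

value = "forty-two"           # binding the same name to a str is allowed
print(type(value).__name__)   # str

try:
    result = value + 1        # str + int is rejected when this line runs
except TypeError as error:
    result = None
    print("TypeError:", error)
```

The mistake is caught, but only at run time and only if this line is actually executed.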
Compiled languages are safer and faster at execution time, while interpreted languages are more flexible and platform independent. But there is also something in between. These are called precompiled languages.
A precompiled language is in many ways like a compiled language. But the compiler does not produce machine code. Instead, it produces a special code that is called byte code. This byte code cannot be executed directly by a computer. It needs an interpreter instead.
A program in byte code can be executed on every computer where the interpreter is available. To speed up execution, many of these interpreters use a technique called just-in-time (JIT) compilation to produce machine code during the execution of the program. Thus the programs can run almost as fast as compiled programs, while still enjoying platform independence.
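Python can be used to look at byte code directly, because its standard interpreter first compiles source text to byte code and then executes that. The sketch below uses the built-in compile function and the dis module from the standard library. The variable names and values are made up, and the example shows only the byte code step, not just-in-time compilation.

```python
import dis

source = "total = price * quantity"

# compile() turns source text into a code object that holds byte code.
code = compile(source, "<example>", "exec")
print(type(code.co_code))  # <class 'bytes'>

# dis lists the byte code instructions; the exact instruction names
# vary between Python versions.
for instruction in dis.get_instructions(code):
    print(instruction.opname, instruction.argrepr)

# The byte code is executed by Python's interpreter, not directly by the CPU.
namespace = {"price": 4, "quantity": 3}
exec(code, namespace)
print(namespace["total"])  # 12
```

The same code object could be executed on any machine with a compatible Python interpreter, which is the platform independence described above.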
How to Choose the Right Programming Language
Now that you know what a programming language is, you are ready to learn your first language. But which one? This is the topic of my next article. Stay tuned.