Tuesday, 29 April 2014

Preprocessor Directives

Hi all, many new programmers don't really know what #include does, or that a preprocessor even exists, so I decided to post a short section about it.  Hope it helps.

A preprocessor directive is any line that begins with a hash (#) symbol.

The preprocessor is a component that runs BEFORE the compiler and looks for preprocessor directives.

Examples of preprocessor directives include #include, #define, #endif, etc.

Let's start with #include:
Say your source file looks like this:

#include <iostream>

int main()
{
}

That's it.  Now, when you build your project, the preprocessor runs first, before the compiler.  Its job is to look for any lines that begin with #.  In this case, we only have one preprocessor directive: #include <iostream>

When the preprocessor sees this, it REPLACES the #include directive with the contents of the iostream header file!  Your original .cpp file is NOT modified; the preprocessor builds a temporary copy of your source file with all of the #include lines replaced, and THAT is what the compiler actually compiles.  If you use GCC or Clang, you can look at this temporary file yourself: g++ -E main.cpp prints the preprocessed source instead of compiling it.

Now, what if <iostream> also has #include lines in it?  Then those lines are replaced with the contents of the files they include as well (remember, your original file is never changed; a new source file is generated).  As you can imagine, our 4-line .cpp file above can quickly grow into many thousands of lines!

Now, let's take a look at another kind of preprocessor directive: #define

A common use of #define is to create macros: names that the preprocessor replaces with some other text, often a constant value.

#define PI 3.14159

Then, let's say that later on in main(), we do this:
std::cout << PI << std::endl;

What happens is that the preprocessor replaces the word PI with 3.14159 every time it appears after the #define (as a whole word only, so not inside a string literal or a longer name such as PIE).  Note also that this change appears only in the temporary file generated by the preprocessor; your original .cpp source file stays the same.

Here's another example:

#define ARRAY_SIZE 20
int array[ARRAY_SIZE];

In the temporary file that the preprocessor generates, the line:

#define ARRAY_SIZE 20

is gone.

However, the line:

int array[ARRAY_SIZE];

has now been changed to this:

int array[20];

When it expands a macro, the preprocessor only performs text substitution; it knows nothing about types, scopes or C++ syntax.

In modern C++ we rarely use macros for constants.  They became the usual way to do it in early C, before the language had the const keyword, so there was no way to write a typed constant like: const double PI = 3.14159;
All programmers had were macros, which let the preprocessor replace a symbol with a value.  Nowadays, the recommended approach is a const (or, since C++11, constexpr) variable, and #define constants mostly survive for backwards compatibility with code written a while back, either in C or in the early days of C++.  Macros themselves are still used for other jobs, such as the include guards described next.

The last thing I want to explain about preprocessor directives is the existence of inclusion guards.

First off, why do we need them?

Say that a header file, Object.h, contains the following lines:
class Object
{
   int x;
   int y;
};

That's it.  No include guards, nothing.  When you include this file later on in main.cpp, the preprocessor will happily oblige and insert the contents of the file into the temporary file.  Now, what if you accidentally #include "Object.h" twice?  Will the preprocessor insert the code twice?  Yes, it will, so the temporary file will end up looking like this:

//Contents of temporary main.cpp
class Object
{
   int x;
   int y;
};

class Object
{
   int x;
   int y;
};

int main()
{
    return 0;
}

Now, when the compiler compiles this file, it reports an error because it sees that you defined the Object class twice!  How can we prevent an error like this?

We can "wrap" the class definition in Object.h inside include guards, like:
#ifndef OBJECT_H
#define OBJECT_H
class Object
{
   int x;
   int y;
};
#endif

Only the class definition itself (the lines between #define OBJECT_H and #endif) gets inserted into the temporary file.  The #ifndef, #define and #endif lines are preprocessor directives, so they do not end up in the temporary file (the same goes for any preprocessor directive).

Now, the FIRST time you try to include Object.h, the preprocessor does the following:

- Sees the line #ifndef OBJECT_H, which means "if OBJECT_H is not defined".  It isn't defined yet, so it continues with:
- #define OBJECT_H, which makes the preprocessor "remember" that a symbol called OBJECT_H has been defined.  It stays defined for the rest of main.cpp.
- It includes the class definition, up to the #endif, and then stops.

The SECOND time you try to include "Object.h" into the SAME source file, main.cpp, the preprocessor does the following:

- #ifndef OBJECT_H makes it check: "Is OBJECT_H already defined?"
- Yes!  So the preprocessor skips the class definition and jumps straight to the #endif.  There is nothing else to do for this header, so it carries on with the rest of main.cpp.

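Keep in mind that the preprocessor handles each source file separately: symbols you #define while it processes main.cpp, such as OBJECT_H, are forgotten by the time it starts on another .cpp file.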

This is important, because if another file, say another.cpp, does #include "Object.h", the preprocessor WILL include the header there, even though main.cpp already included it.
An include guard ONLY prevents a header from being included MORE than once into the SAME source file; it does nothing to stop two different .cpp files from including the same header.

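So now main.cpp and another.cpp both contain the class Object.  What happens then?  Nothing bad: C++ allows the same class definition to appear in several .cpp files as long as every copy is identical (this is part of the One Definition Rule), and a class definition on its own doesn't give the linker anything to clash over, so you don't need to worry about that. :)  Just be careful not to put a non-inline function definition or a global variable definition in a header: if two .cpp files include it, the linker reports a "multiple definition" error, and include guards can't prevent that.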

Joe - sparkprogrammer@gmail.com