BrainsToBytes

A gentle introduction to pointers using the C programming language

Pointers are one of those concepts that make no sense when you first learn about them. Usually, a change of perspective is enough for something in your brain to click and, poof, everything magically falls into place.

Despite their bad reputation, pointers are extremely powerful tools. Moreover, in some languages (like C or Go) understanding them is a must if you want to be a proficient developer.

Let's explore pointers using the C programming language and a couple of very simple examples.

The humble pointer

Pointers are variables that hold a memory address: a reference to something stored elsewhere in memory. I know it sounds confusing, but a couple of examples will make it clearer. Look at the following code and the output of running it.

#include <stdio.h>

void print_character_info(char *char_pointer);

int main() {
  char alphabet[10] = {'a', 'b', 'c', 'd', 'e'};
  char *alphabet_pointer;

  // alphabet_pointer points to the beginning of alphabet, the character 'a'
  alphabet_pointer = alphabet;
  print_character_info(alphabet_pointer);

  // Now, let's make it point one character ahead, to 'b'
  alphabet_pointer++;
  print_character_info(alphabet_pointer);

  // One more, now it should point to 'c'
  alphabet_pointer++;
  print_character_info(alphabet_pointer);

  // One more, now it should point to 'd'
  alphabet_pointer++;
  print_character_info(alphabet_pointer);

  // Let's move the pointer to the last letter we stored, 'e'
  alphabet_pointer++;
  print_character_info(alphabet_pointer);

  return 0;
}


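void print_character_info(char *char_pointer){
  // &char_pointer is the address of this function's own copy of the pointer
  // (the parameter), not the address of alphabet_pointer in main.
  // %p expects a void *, so both addresses are cast before printing.
  printf("The pointer's address is: %p \n", (void *)&char_pointer);
  printf("It points to the address: %p\n", (void *)char_pointer);
  printf("The address it points to contains the value: %c\n\n", *char_pointer);
}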

When run, it prints the following (the exact addresses will differ on your machine and usually from run to run):

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d2e
The address it points to contains the value: a

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d2f
The address it points to contains the value: b

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d30
The address it points to contains the value: c

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d31
The address it points to contains the value: d

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d32
The address it points to contains the value: e

A gentle, step by step analysis

Let's go line by line to better understand what happens. First, we declare the following two variables:

  char alphabet[10] = {'a', 'b', 'c', 'd', 'e'};
  char *alphabet_pointer;
  • alphabet is an array of 10 characters. Only the first five are initialized, with the first 5 letters of the English alphabet; the remaining elements are set to zero ('\0').
  • alphabet_pointer is a pointer to char. In C, we declare a pointer by putting an asterisk '*' before the variable name. It's important to declare the correct type of pointer: it has to match the type of data it points to; otherwise, things can get a bit wacky (we'll talk about that later).

From the results of running the code, we can form a picture of the memory contents immediately after the declaration of our variables:

[Figure: memory contents immediately after the variable declarations]

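As you can see, the pointer doesn't contain anything useful in the beginning: a local pointer that hasn't been assigned yet holds an indeterminate (garbage) value, so it must not be dereferenced.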

Pointing, pointing, pointing

In our code, immediately after declaring the variables, we perform two actions:

  alphabet_pointer = alphabet;
  print_character_info(alphabet_pointer);
  1. We make alphabet_pointer point to the beginning of alphabet.
  2. We inspect the state of the pointer using print_character_info.

As a result, we get this useful info in the console:

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d2e
The address it points to contains the value: a

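The address on the first line stays the same because we didn't move the pointer variable itself; only its content changed. (Strictly speaking, &char_pointer inside print_character_info is the address of the function's own copy of the pointer, not of alphabet_pointer in main; it stays constant here because each call happens to reuse the same spot on the stack.) The interesting part is the content, now set to 0x7fff95b03d2e. Check the table, can you find it? Yes, it's the beginning of alphabet! More precisely, it's the address that contains the value 'a'.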

You can access the value stored at the address being pointed to by using the '*' (dereference) operator. The following line prints 'a' if run at this exact moment in the execution:

    // *alphabet_pointer will return the value 'a'
    printf("%c", *alphabet_pointer);

This is the current state of memory, in pretty image form:

[Figure: memory after alphabet_pointer is set to point to 'a']

As you can see, our pointer's content is a simple address. That means we can change what it points to on the fly. C makes this very easy by handling pointer arithmetic for us, as we are about to see.

Pointer arithmetic

Pointer arithmetic means performing basic arithmetic on pointers to change which address they point to. Specifically, you can add an integer to a pointer or subtract one from it, moving it relative to its current position. Again, this is easier to understand with an example.

In our code, we repeatedly perform the following two operations:

  alphabet_pointer++;
  print_character_info(alphabet_pointer);

The second line is our trusty print_character_info function, nothing new. The first line, however, is pretty cool: we take alphabet_pointer and increase its value by 1, and as a result, it now points to the next element in line (the address that contains 'b'). After performing the action once, we get the following results:

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d2f
The address it points to contains the value: b

And after performing that action once more:

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d30
The address it points to contains the value: c

We continue the process until the pointer reaches the last letter we stored in the array: the character 'e'. In the end, memory looks like this:

[Figure: memory after alphabet_pointer has been moved to 'e']

Pointer types aren't just for show

Remember when I said that the type of a pointer is important? One of the reasons is that the compiler uses the type to perform pointer arithmetic calculations.

Different variable types are represented in memory using different numbers of bytes. A char always takes exactly one byte (one 'address slot'); the C standard defines sizeof(char) as 1. Ints, on the other hand, need 4 bytes on my computer, as on most modern platforms.

This means that when I write pointer++ for a char pointer, I move to the address immediately after the current one. If I apply ++ to an int pointer, it jumps sizeof(int) positions ahead, which is 4 on my computer. We can see this effect with a different code example:

// The rest of the code is the same
int main() {
  char alphabet[10] = {'a', 'b', 'c', 'd', 'e'};
  // Note that now alphabet_pointer is a pointer to int
  int *alphabet_pointer;

  // char * and int * are different types, so explicit casts are needed
  alphabet_pointer = (int *)alphabet;
  print_character_info((char *)alphabet_pointer);

  // ++ on an int pointer moves sizeof(int) bytes ahead, not one
  alphabet_pointer++;
  print_character_info((char *)alphabet_pointer);

  return 0;
}
// The rest of the code is the same

After printing 'a' (the same as in our first example), we increment the pointer with ++. Because it's now an int pointer, it jumps ahead 4 positions and points to, you guessed it, the address with the character 'e'.

We can easily verify this by running the code, which prints:

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d2e
The address it points to contains the value: a

The pointer's address is: 0x7fff95b03d08 
It points to the address: 0x7fff95b03d32
The address it points to contains the value: e

There is another important reason for specifying the type of the pointer. As you just read, different types have different sizes in memory. So when we dereference a pointer to get a value, the compiler needs to know how many bytes it should read and how to interpret that data.

This last topic, along with memory layout and variable and pointer casting, deserves its own article. We might come back to them later.

I also need to clarify that pointers themselves occupy several bytes of memory. In the illustrations, I simplified this as a single address holding the whole pointer. In reality, a pointer needs as many bytes as are necessary to represent an address on your CPU architecture (typically 4 on 32-bit systems and 8 on 64-bit systems).

Pointers are cool, right?

I hope this article helped you understand pointers, or at least sparked your interest in the topic. You might need more time and experimentation to fully understand them, so feel free to grab the code sample and move things around. Tweak it and read the output; this approach has always helped me when learning new programming concepts.

Lots of other important things build on top of pointers. Even if you work mostly in higher-level programming languages, understanding how things work under the hood is useful.

What to do next:

  • Share this article with friends and colleagues. Thank you for helping me reach people who might find this information useful.
  • The second chapter of Hacking: The Art of Exploitation has an amazing introduction to pointers and memory layout. Give it a try if you want to learn more about this topic. This and other very helpful books can be found in the recommended reading list.
  • Send me an email with questions, comments or suggestions (the address is on the About Me page). Come on, don't be shy!
Budapest, Hungary
Hey there, I'm Juan. A programmer currently living in Budapest. I believe in well-engineered solutions, clean code and sharing knowledge. Thanks for reading, I hope you find my articles useful!