What exactly does the processor do with your code? A look at reverse engineering

Sometimes people ask me why reverse engineering is so hard. “But you have the files, why don’t you just somehow look in them?”

Well, you can “somehow” look in them, but that is the hard part. Let’s try it on an example: we will write some code, generate an .exe from it, and then try to go backwards.

We will write the code in C. It initializes two variables and then saves their sum to a third variable.

The code:



int main()
{
  int a = 3;
  int b = 5;
  int c;
  c = a+b;
}

Okay, let’s go through it more slowly. I told the language I want to have variables ‘a’ and ‘b’ with the values 3 and 5. I want these numbers to be integers, so I wrote “int” before them. Then I want to have a variable ‘c’, in which I save the sum of ‘a’ and ‘b’.

All of this is in “main”, because that is the entry point of the program: execution starts there. If I left main empty and put the lines about a, b and c somewhere that main never reaches, the compiler would assume I don’t want anything to happen, that code would never run, and it could even be thrown away.

But all of this is kinda “human readable”. Why would your PC know what you mean by “int”, “main” or even the equals sign?

To translate it into “computer language”, we need instructions for how to move things around in memory itself. We have a compiler for this. It reads the code we wrote and does “some magic” to it. After this magic we get an .exe file.

Okay, cool: we ran the compiler on our code, it translated the code into something else (yielding an .exe in the process), the computer understands it, and somewhere in the processor the value 3+5 gets calculated. But how exactly?

________________________________________________________________

We can use a program to reverse engineer the process: having only the .exe, we can inspect its instructions in much more detail. The problem is that we don’t get back the C code we wrote, but something much more complicated.

We get this:

Don’t panic, only a few of the lines are relevant for our explanation. The irrelevant parts do “something” to memory, which we will just ignore here. All the lines are code in a language called assembly (often loosely called assembler). Each line has an instruction on the left and the values it works with on the right.
 

The part we are looking for is:

mov [rbp+var_4], 3
mov [rbp+var_8], 5
mov edx, [rbp+var_4]
mov eax, [rbp+var_8]
add eax, edx

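mov means move and add means… add. There are two explicit values, 3 and 5, but the names ‘a’ and ‘b’ are gone: the compiler totally forgot the original names of our variables. The names var_4 and var_8 are just placeholder labels for places in memory, not anything we wrote. The values are first stored in memory and then moved into the registers edx and eax. Finally, the add instruction adds the values in eax and edx and saves the result in eax.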

The exact term for what we did is “disassembling”, because we took the resulting .exe and looked at its assembly code.

But since we compiled our original code, is it possible to decompile it, which would give us C code again? Yes, it is possible, but often it does not really help.


There are several reasons why the decompiled code often does not help: the compiler may optimize our code, put variables in unexpected places in memory, or realize that variable c is never used and leave it out completely.

This case was kinda easy (if one knows C and assembly, ofc). Next time we will try something harder.