Reverse engineering challenge – part one

I started a challenge for myself and I want to write about it. I’m documenting it for my own reference, and I hope it might also inspire someone else. I want to try to use only PowerShell on Windows for reverse engineering while googling as little as possible (except maybe for exe “anatomy”, since I do not remember much of it and trying to piece it together from examples would be extreme). I do remember some stuff, though, and will use whatever I remember.

I have found this repo https://github.com/NoraCodes/crackmes/tree/master

I compiled crackme01.c with gcc (of course I did not look at the code) and will try to inspect its hex contents using only PowerShell. I tried to run it just by typing crackme01.exe, but that does not work. Luckily PowerShell advises how to run it: put .\ in front of the name. So I wrote .\crackme01.exe and it runs and says “Need exactly one argument.” And when I run, for example, .\crackme01.exe abc, it says “No, abc is not correct.” Okay, so we need to find the string that is correct.

So I began playing with PowerShell commands. First of all I found out there is help for commands. I typed “read” and kept pressing Tab, but for some reason the only completion PowerShell offers is readelf (which would be for Linux). I suppose there is a command for Windows too, but it probably begins with a different word.

I found out that if I write “help XXX” and XXX doesn’t exist but is a substring of some command name, it lists the commands containing XXX. But when I run help read, it does not show readelf. Possibly because the whole name is “readelf.exe”, so it is a program that gets run and not a PowerShell command per se.

I tried “help file”, but could not see any command that would be helpful. I also tried debug and print. After a bit of thought, I tried “help hex” and found out there is a command with that name (actually, I noticed later that the name is Format-Hex, with the alias fhx). If you run it, it asks for a path, so I just wrote the name of the file (since I had opened PowerShell in the folder with the exe) and it really did give me the hex dump!

I feel ecstatic, but the output is long and the top part scrolls out of reach after a while. I can see the hex numbers, and where they have counterparts in ASCII, I can read them as text to the right of the numbers. So sometimes there are words, but most of it is random symbols.

Reading through the text on the right did not help me, so either it is not just about finding a string, or the strings are not plain words, or I missed it.

First of all I need to find a way to read the whole contents. I tried stopping PowerShell after it begins writing the contents, but no luck there. I also need to find out where strings are stored in an exe, so I will possibly need to google it.

Next I tried looking for a way to save text from PowerShell to a text file. help print, help text, help txt and help save do not seem to lead anywhere.

After several minutes of thinking I tried help out. It found several things, and Out-File seems interesting, so I am looking into it now.

Neither “out-file crackme01.exe” nor “out-file abcd” does anything visible. The problem for me is that help out-file lists lots of arguments and I have no idea which one I have to use. When I copied the whole path, like out-file C:\Users\Admin\Documents\ReverseEngineerDWRepo\crackmes-master\crackmes-master\exe
it says Access denied. I am not sure whether I need some access rights for what I want to do; it is possible this isn’t the command I need.

I tried help access and some other command, but it feels a lot like wandering in the dark, mainly because I don’t know whether Out-File is even the command I need. I will think for a while and maybe find something more, and then maybe just google how to print stuff to a text file.

Okay, I went back to the fhx command and, unless I missed something, I’m confused af. I run it, input crackme01.exe, and it writes only the first few bytes… namely the first 16 numbers in hex? An exe should begin with the hex values for M and Z in ASCII (4D 5A). Huh?

Ahh, I can’t even run the exe anymore; I probably did something to it while playing with out-file? Anyway, after recompiling, it runs again. I also noticed that what I saw were not the first 16 bytes of the file: that row of numbers is the column header at the top of the output for any file I put into fhx. So I was looking at the dump of an empty file.

But that does mean out-file works, and really, I had missed that it created files in my folder. Unfortunately, out-file abc.txt fhx crackme01.exe and out-file abc.txt crackme01.exe do not work.

The syntax for out-file should be Out-File [-FilePath] <string> [[-Encoding]… (plus a million other arguments), but I can’t figure out how to write the string.

out-file abc.txt works, out-file abc.txt XXX doesn’t. It complains about the encoding argument, but that should come AFTER the string argument. I think it is important to know what the [] brackets and the <> brackets mean. If I try something like out-file abc.txt <XXX>, it throws an error about those <> brackets.

I tried help <>, and it threw an error, BUT it said “Missing file specification after redirection operator”. SOO it could mean > or < is used to tell PowerShell where I want the text to be redirected.

Huh, out-file abc.txt > “a” doesn’t throw an error, but creates (empty) files abc.txt and a.
OH! I tried help > abc.txt and it wrote the contents to abc.txt!

I tried fhx crackme01.exe > hex.txt AND IT WORKS LESGO

Okay, so I finally have the hex contents. Eventually it would be best to parse it, so I can just write a command and it will show me the header, the .data section etc. But that is for much later.

Looking at the hex, I see readable strings from the program, for example
“Need exactly one argument..password1.No, %s is not correct…Yes, %s is correct!”

%s is a placeholder for a string. So first of all I will just try running it with “password1”. AND IT WORKS!!!

So here we were lucky, because “password1” was stored among the strings as plain text and stood out a lot. Why it is stored like that, I don’t know; IMO there could have been only the %s. Let’s see the next one.

What exactly the processor does with your code, and reverse engineering

Sometimes people ask me why reverse engineering is so hard. “But you have the files, why don’t you just somehow look in them?”

Well, you can “somehow” look in them, but that is the hard part. Let’s try it on an example: we will write some code, build an exe, and then try to go backwards.

We will write code in C, initializing two variables and then saving their sum to a third variable.

The code:



int main()
{
  int a = 3;
  int b = 5;
  int c;
  c = a+b;
}

Okay, more slowly: I told the language I want to have variables ‘a’ and ‘b’ with values 3 and 5. I want these numbers to be integers, so I wrote “int” before them. Then I want a variable ‘c’, in which I save the sum of ‘a’ and ‘b’.

All of this is in “main”, because that is the entry point of the program. If I put this code into some other function that main never calls, it would never run, and the compiler could assume I don’t want it to happen and throw away everything I wrote about a, b and c.

But all of this is kinda “human readable” – why would your PC know what you mean by “int”, “main” or even equal sign?

To translate to “computer language”, we need instructions for how to move things around in memory itself, and we have a compiler for this. It reads the code we write and does “some magic” to it. After this magic we get an .exe file.

Okay, cool, so we ran our compiler on our code, it translated the code into something else (yielding an exe in the process), the computer understands it, and somewhere in the processor the value 3+5 was calculated. But how exactly?

________________________________________________________________

We can use a program to reverse the process: having only the .exe, we can inspect the instructions in much more detail. The problem is that we don’t get back the C code we wrote, but something much more complicated.

We get this:

Don’t panic, only a few of the lines are relevant for our explanation. The irrelevant parts do “something” to memory, which we will just ignore here. All the lines are code in a language called assembly. On the left is the instruction, on the right are its operands.

The part we are looking for is

mov [rbp+var_4], 3
mov [rbp+var_8], 5
mov edx, [rbp+var_4]
mov eax, [rbp+var_8]
add eax, edx

mov means move and add means… add. There are two explicit values, 3 and 5, but the names a and b are gone: the compiler totally forgot the original names of our variables. The values are first stored in memory (at [rbp+var_4] and [rbp+var_8]) and then moved into the registers edx and eax. Then the add instruction adds the values in eax and edx and saves the result to eax.

The exact term for what we did is “disassemble”, because we took the resulting .exe and looked at its assembly code.

But we compiled our original code, so is it possible to decompile it, which would give back the original C code? Yes, it is possible, but often it does not really help.


There can be several reasons: the compiler trying to optimize our code, allocating memory for variables in unexpected places, realizing variable c is never used and so dropping it, etc.

This case was kinda easy (if one knows C and assembly ofc); next time we will try something harder.