In this article we’ll take a look at all the optimizations the compilers use to assembly the high-level switch statements into their assembly representations. Switch Statements The first example that we’ll look like uses the code shownbelow: [cpp] #include “stdafx.h” #include <stdio.h> int _tmain(intargc, _TCHAR* argv[]) { int x = 1; switch(x) { case 1: printf(“Number 1!n”); break; case 2: printf(“Number 2!n”); break; case 3: printf(“Number 3!n”); break; default: printf(“Number %d!n”, x); } /* wait */ getchar(); return 0; } [/cpp] We have saved the number 1 into the variable x and then used the switch statement to print the message about which number we’re dealing with. The switch statement compares the value in variable x to the 1, 2, 3; but it also has a default code block which is executed if none of the previous conditions are evaluated to true and its corresponding code block executed. Upon running the executable, the “Number 1!” message will be displayed, as we can see on the picture below:
Let’s present Ida’s graph, where it’s evident that we have four blocks of code and one of them gets executed.
First, we’re saving the number 1 into the stack offset [ebp+var_D0] and then comparing that value to 1. In our case, the jz (jump if zero) instruction is evaluated to true, which means that we’ll jump to the “Number 1!” If we’re not dealing with number 1, the next comparison is being evaluated,where we’re comparing the variable to number 2. If the number is 2, then we’re jumping to the printf statement that prints the “Number 2!” to the console. Next we’re comparing the number to number 3. If the variable isn’t number 1, 2 or 3, we’re executing the last code block where we’re printing the “Number %d!” to the console window. The actual code presented with subsequent virtual addresses can be seen below (multiple screenshots are presented, because the whole code cannot fit in one picture nicely):
We can see that we’re first comparing the variable to numbers 1, 2 and 3 and jumping to appropriate addresses if the variable is equal (by using the jz: jump if zero instruction). Then we have the last jz routine after which there is another jump jmp instruction. This is because if the number does not equal to 3, we must jump to the last block, which prints the “Number %d!” to the console window. In the example above, we’ve seen that the switch statement was translated into multiple compare and jump instructions, which could as easily be written with an if-else if-else statement. Let’s take a look at another, more complex example. The C++ code is presented below: [cpp] #include “stdafx.h” #include <stdio.h> int _tmain(intargc, _TCHAR* argv[]) { int x = 1; switch(x) { case 1: printf(“Number 1!n”); break; case 2: printf(“Number 2!n”); break; case 3: printf(“Number 3!n”); break; case 4: printf(“Number 4!n”); break; case 5: printf(“Number 5!n”); break; case 6: printf(“Number 6!n”); break; default: printf(“Number %d!n”, x); } /* wait */ getchar(); return 0; } [/cpp] We can see that the code is completely the same, except that now it uses 6 case statements and before it only used 3 (if we don’t count the default case statement). We’re also storing the value 1 into variable x, so the program always prints the “Number 1!” message, but that is not the point. The point is that we’re trying to figure out how the compiler generates the lower-level assembly code from the high-level C++ code that uses the switch statement. Let’s first present the graph overview of the code, because the whole code cannot be fit into one picture. The graph overview of the code that uses switch statements is presented below:
On the picture above, we can see that at the first big block we’re initializing the stack and starting a comparison algorithm. The relevant code from that block is presented on the picture below:
We’re storing the value 1 into [ebp+x], [ebp+var_D0] as well as into the register eax. Then we’re also storing the number 1 into the register ecx and subtracting the value 1 from it, which means that the number stored in the register ecx is 0. After that we’re comparing that number to a number 5 and jumping to the location loc_4137E0 if the number is greater than 5 (if this is the case we’re jumping to the default switch code block and execute it). In our case it isn’t, so we’re continuing our execution in the small block presented on the picture below (we’re not jumping to any location, just continuing the execution of the next virtual address):
First, we’re storing the previous value 0 into the register edx and then we see that something interesting happens… The value in the edx register is just an index into the array, which is where the jump addresses are being stored. In this case, the edx is 0, so we’re accessing the first 413820[0] element of the array. The off_ sting means that Ida generated the name by itself, naming the data that contains the offset values. The whole array is presented on the picture below:
The array has 6 elements, each of type double word (note the dddata type). The array holds the following elements:
arr[0] = $LN7_1 arr[4] = $LN6_2 arr[8] = $LN5_1 arr[C] = $LN4 arr[10] = $LN3 arr[14] = $LN2
The array elements are basically just aliases for the virtual address. Those can be seen in the Names Windows, which presents all the names found within the executable program, even the names that Ida generated by itself for different virtual addresses. The picture below presents all those names as they appear in the Names Window: This clearly states that all the names are actually virtual addresses, which are used as jump targets for the switch statement. The compiler basically generated a pointer table that contains the pointers to different code blocks. When the switch block is executed, the program must calculate the index of the pointer in the pointer table and jump to the address stored at that index. But because the array indexes start with the number zero, the program must take the value of the variable x (from C++ program) and subtract a value 1 from it to get the actual index. Then it accesses the value stored at that index in the pointer array and jumps to that address. In our case, the “off_413820[edx*4]” instruction accesses the first pointer in the pointer table, which is the name LN7_1 that is an alias for the address 0x00413747. This is why the program jumps to that address. The block contained at the address 0x00413747 can be seen below:
We can see that the block code prints the “Number 1!” What’s even more interesting is that the pointer array data is being put in the .text section of the executable. Whenever that happens, we can immediately suspect that we’re dealing with a pointer array structure that’s being used by the switch statement. It’s also worth pointing out that the code blocks are put sequentially in the memory, so in the graph overview picture above, the code blocks are sequentially numbered from left to right. This is all fine, but what happens if we skip one subsequent number used with cases in the switch block. If we delete the second case that checks whether the number is 2, the code will essentially be the same as before, but the second pointer in the array would be blank. This means that the pointer array would be the same size as before, but one of the cells would be empty. This makes sense if the gaps between the use cases are small, because only the array of the size of the biggest element needs to be created. But what happens if we’re using the numbers 1, 2, 3, 4, 5 and 10000. The last element is certainly the biggest and the compiler won’t create an array with 10000 elements. Let’s take a look at how the compiler handles such cases. We can see the important code on the picture below:
First, we’re comparing the number in the variable x to the 0x2710 (10000 in decimal) and if the number is bigger than that, we’re jumping to the default case (visiting the green arrow whose code block is not shown on the picture). Otherwise, we’re jumping to the next block on the picture, where we’re again comparing the number in the variable x to 10000. If the number matches, we’re going to the code block that prints “Number 10000!” directly. Otherwise, we’re visiting a code block that evaluates the rest of the cases with the use of an pointer array. But this time the pointer array only holds five elements, as can be seen on the picture below:
We’ve just seen that if one case really stands out from the others (in this case the number 10000), the compiler generates a special case where the comparison just for that number happens separately from the pointer array implementation. So the compiler still uses an array of pointers to jump to the right code blocks, but it also introduces a special case when the number is 10000. This is all fine, but what if there are no numbers close enough for the compiler to generate an array? In the next example we’ll be using the numbers 1, 10000, 20000, 30000, 40000 and 50000. Let’s present the C++ code again for clarity: [cpp] #include “stdafx.h” #include <stdio.h> int _tmain(intargc, _TCHAR* argv[]) { int x = 1; switch(x) { case 1: printf(“Number 1!n”); break; case 10000: printf(“Number 10000!n”); break; case 20000: printf(“Number 20000!n”); break; case 30000: printf(“Number 30000!n”); break; case 40000: printf(“Number 40000!n”); break; case 50000: printf(“Number 50000!n”); break; default: printf(“Number %d!n”, x); } /* wait */ getchar(); return 0; } [/cpp] In this case the compiler won’t be checking for each of the cases sequentially, but will rather implement a binary search algorithm, where it will first compare the variable x to the median number of all numbers and then decide whether the current number is smaller or bigger to the median number. Then it will traverse down the tree to compare the median number of the one half of the numbers recursively until it finds the right number. This is done so that the problem of searching for the right number isn’t of the complexity O(n), but rather O(logn), which reduces the problem considerably, especially if we’re calling the switch statement. The picture below presents this algorithm (don’t mind the arrows: I had to move the nodes around and arrows are not automatically adjusted, so I had to drag them too; but this is very tedious with so many arrows, which is why they are not presented particularly good.):
Let’s first present the numbers that we’re using: 1, 10000, 20000, 30000, 40000, and 50000. The median number is 30000, where the smaller numbers are 1, 10000 and 20000 and the bigger numbers are 40000 and 50000. This is why we’re comparing the number in variable x with the median number 30000 first. If the number is bigger than 30000, we’re immediately jumping to the comparison of the number 40000 and then 50000. But if the number is smaller, we’re first comparing the number to 1, then 10000 and at last,20000. I guess a true O(logn) algorithm would have been if, after comparing to the number 30000, we would compare the number 10000, then 1 and at last 20000, but the compiler decided otherwise. This is compiler specific and the compiler probably decided to compare the rest of the numbers 1, 10000 and 20000 in sequence, since we’re only dealing with three numbers. This would have been particularly different if we would be dealing with a lot more numbers than we are now. Conclusion We’ve seen how the compiler can implement the switch statements in assembly language. First, we’ve seen the easiest approach, where the compiler compares the numbers in sequence. After that we’ve seen that the compiler used a pointer table approach. Last, we’ve seen that it uses a binary search algorithm. This makes us think that compilers really do have a lot of optimization tricks up their sleeve when compiling a source code to binary executable. References [1] Chris Eagle, The IDA Pro Book: The unofficial guide to the world’s most popular disassembler.