Throwaway variables in C

I'm re-reading "Refactoring" by Martin Fowler, which in turn was the result of me re-reading Steve Yegge's blog post (unfortunately I can't remember the post). One of the common things that programmers do, that may be a sacrifice of readability for optimisation, are temporary variables. As it turns out, this is an optimisation that seems like it doesn't have to be done.

Unfortunately I don't have any large project that I can use as an example, so I've had to create my own contrived test.

#include <stdio.h>

int square(int);

int main()
  for (int i = 0; i < 5; i++)
      printf("Result: %d\n", square(i));

  printf("Called again: %d\n", square(3));

int square(int toBeSquared)
  return toBeSquared * toBeSquared;

This code doesn't have any temporary variables, because I wanted to see what the compiler would do without them. The usual case for a temporary variable is something like the following:

  int tempVariable = someFunction(x);

  for (int i = 0; i < tempVariable; i++)
      printf("Blah: %d\n");

Of course this is done so you're not calling someFunction on every test of the loop. But, at least with optimisations turned on, you don't have to worry.

Anyway, I compiled my contrived example with GCC:

gcc -fverbose-asm -std=c99 -O3 blah.c -S

Checking the resulting code, we have:

  pushq %rbx  #
  .cfi_def_cfa_offset 16
  .cfi_offset 3, -16
  xorl  %ebx, %ebx  # i
  movl  %ebx, %esi  # i, D.2001
  xorl  %eax, %eax  #
  movl  $.LC0, %edi #,
  imull %ebx, %esi  # i, D.2001
  addl  $1, %ebx  #, i
  call  printf  #
  cmpl  $5, %ebx  #, i
  jne .L2 #,
  movl  $9, %esi  #,
  movl  $.LC1, %edi #,
  xorl  %eax, %eax  #
  call  printf  #
  xorl  %eax, %eax  #
  popq  %rbx  #
  .cfi_def_cfa_offset 8
  .size main, .-main
  .section  .text.unlikely
  .section  .text.startup
  .section  .text.unlikely
  .p2align 4,,15
  .globl  square
  .type square, @function
  movl  %edi, %eax  # toBeSquared, D.2006
  imull %edi, %eax  # toBeSquared, D.2006

Well, we have a function that gets made inline inside the for loop, and looking at my call to the function outside the loop, we can see it doesn't even bother, it just loads the value 9.

Now, this is a very simple function, and I would have to do more tests, but it would appear that the performance of certain refactorings, in particular ones involving replacing math and Boolean operations should not be a concern.

Write it as clear as possible, even to the extent of using a function just so you can give Boolean conditions a better name, and don't bother to use temp variables to cache results to avoid what you might think will be a function call. The compiler probably does it anyway.

Drinking the Emacs firehose.

In my wanderings of the Internet, I can't really remember what I was searching for, something Emacs related, I came across an interesting blog. Living an awesome life, is the blog of Sacha Chua a software developer who is very heavily into both blogging (her blog has regular posts spanning back to 2001), and organising her life using Emacs. Wow, really, wow. Her productivity really puts me to shame, so, I've been inspired.

I've decided that my current hobbyist usage of Emacs, and general organisation skills are simply not up to par and I should really make the effort to get better organised. Part of that is to actually blog regularly, and switch to using Emacs for everything I can.

The first stage being to switch to using Gnus to read my e-mails. Which I've done. It's still a little rough, but I can read and reply to e-mails (and it seems to be working). This also has a bonus, I can link to my e-mails from org-mode! Next thing I want to do is get my org-mode workflow a bit smoother, and then try to get any blogging as friction-less as possible.

Trying to understand activation records.

It's interesting to see the development of programming languages over time, in particular to see how programming languages influenced the development of the underlying hardware. I'm currently studying programming languages at Uni, and the majority of the course is the study of past programming languages. We've started off with Pseudo-codes (very early assembly type language), and progressed from there. FORTRAN, ALGOL, Pascal, and some other languages are mentioned off-hand.

It's interesting to note that early computers didn't have the concept of the stack, that wonderful little memory space that the processor can use for keeping track of where it has been, and what it's doing (in the context of the current running function). The concept of the stack came about as the result of programming languages. In particular, the block structure of ALGOL called for the tracking of both static and dynamic "activation records". Best I can figure out is that this is what's known as the call stack.

But, I'm just not quite sure how exactly these static and dynamic activation records are supposed to work. In particular the idea of traversing the stack to go and find variables that are outside the scope of the current block. I know that global variables cover this, but, as far as I'm aware, the direct address of the global variable is used. So, to help with my understanding, I wrote a contrived C program and went through the assembly code that resulted.

I compiled with GCC on an Arch Linux install, with 64bit compilation. Just using -S to generate the assembly in AT&T syntax. (operator source destination)

When I say contrived, I mean it, I've plonked a statement block in the middle of main, all by itself. I've even used, gasp, GCC extensions that aren't standard C, something I loathe to do. Anyway, I wanted to test to see if statement blocks created an activation record, so I have:

    int x = 22;
    printf("main inner block x: %d\n", x);
    printf("main inner block y: %d\n", y);
    x += z;

    printf("main inner block x after z added: %d\n", x);

Right in the middle of my main. I suspect that the x variable, is just logically scoped. By that I mean that no activation record would be created, because it wouldn't be needed, x is just a symbol of reference in the source code.

Looking at the assembly, we can see that we have that inner statement block labelled:

    movl    $22, -12(%rbp)  // Inner statement block, x = 22.
    movl    -12(%rbp), %eax // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC3, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.
    movl    -8(%rbp), %eax  // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC4, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.
    movl    -16(%rbp), %eax // Load z into eax.
    addl    %eax, -12(%rbp) // x += z
    movl    -12(%rbp), %eax // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC5, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.

We can see that 22 is being loaded onto our stack, the "movl $22, -12(%rbp)" bit. The stack grows down, and the rbp register holds current address of the stack frame (activation record), -12 being the index.

The variable x that is scoped outside this block, in the main function itself has the index -4. So this means we have the following stack:

All through this bit of code we can't see anything other than indexed stack load and stores, so this means no activation records. With a call to a function (call outerFunction), we will see the creation of a new stack frame.

    pushq   %rbp            // Push the previous ref to stackframe on stack.
    movq    %rsp, %rbp      // Save this function's stack frame
                            // ref into rbp.
    subq    $16, %rsp       // Reserve 16 bytes of space. (Pad stack frame).
    movl    $88, -4(%rbp)   // z = 88
    movl    -4(%rbp), %eax  // Call to printf.
    movl    %eax, %esi      // Parameter passing on registers.
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf          // Pushes address of next instruction onto
                            // the stack and jumps to printf
    leave                   // Move rbp to rsp, and pop off the
                            // stack into rbp.
    ret                     // Pop address off the stack and
                            // return to it.

Concerning stack frames, we only care about stack operations and things involving the rsp, and rbp registers. The "pushq %rbp" instruction, is what does this. It then takes the current value of the stack (rsp) and copies it into the stack frame register (rbp). This means that rbp can then be used for referring to anything local to the functions stack frame.

Pushing onto the stack advances the stack and copies the value into the resulting memory address. Since the stack frame is copied into the rbp register this means that rbp points to the calling functions stack frame.

However, we are still not getting any traversal of any of these stack frames, at least nothing that would indicate anything about dynamic or static activation records. The problem seems to be that you need to try and access variables outside your scope across stack frame boundaries. You can't do this with C, sure, you have global scope, but that doesn't have a stack frame, any globals will be referenced directly.

So, what to do? Well to force the issue of a function using a variable out of its local scope, but not using the global scope, I've had to use a GCC trick.

I have a function declared inside another function. This is not standard C:

  // In main()
  void innerFunction() {
    printf("Inner function z: %d\n", z);


Ah ha! This will produce a stack frame, and also reference a variable that's out of a local scope.

    pushq   %rbp            // Push the previous stackframe onto the stack.
    movq    %rsp, %rbp      // Save this function's stack frame.
    movq    %r10, %rax      // Wait a minute....
    movl    (%rax), %eax    // Set up for call to printf
    movl    %eax, %esi
    movl    $.LC1, %edi
    movl    $0, %eax
    call    printf          // Push address of next instruction onto
                            // the stack and jumps to printf.
    popq    %rbp            // Pop off old rbp from stack into rbp.
    ret                     // Pop address off the stack and return to it.

But, it doesn't seem to have worked as I would have thought. In particular where it actually gets the value of the variable, "movq %r10, %rax". Damnit! If I go tracing back before the function was called, we come across:

    leaq    -16(%rbp), %rax     // Load z's address into rax register.
    movq    %rax, %r10          // Copy rax to r10 register.
    movl    $0, %eax            // Void function.
    call    innerFunction.2206  // Save next address onto stack, jump to innerFunction.

The compiler is being tricky, rather than traverse back through to the previous stack frame, it loads the memory address of the variable into a register. This register is used to effectively pass in a reference to the variable which is on the caller's stack frame. I'm guessing it can do this hard coding of register passing because it's an internal function, so it's not like the scope of innerFunction is going to change, because C doesn't support dynamic scoping.

Effectively it seems that C itself only supports static activation records, and even then any operations in the functions will not traverse the stack, a function will not use the pointer to the previous stack frame to look back at variables. Thanks to the normal C scoping rules, there's no need. I can only force the issue via GNU only internal functions, and even then variables are just passed as per any normal sort of function parameter passing (although it uses one of the misc registers as per the useful info on X64_64 ABI).

I suspect any backtracking of activation records will require looking at a language that has a sort of dynamic binding, which I hope to follow up in a later post.

Magic Binary Numbers

How would you reverse a binary sequence? Say, given a 16-bit number?

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
  r <<= 1;
  r |= v & 1;
r <<= s; // shift when v's highest bits are zero
unsigned int v; // 32-bit word to reverse bit order

// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16             ) | ( v               << 16);


Well, to be honest, more like a work in progress. A work in progress that will probably stay a work in progress for awhile. At least up until, well, I better not get ahead of myself here.