LuminousMonkey

Trying to understand activation records.

It's interesting to see the development of programming languages over time, in particular to see how programming languages influenced the development of the underlying hardware. I'm currently studying programming languages at Uni, and the majority of the course is the study of past programming languages. We've started off with Pseudo-codes (very early assembly type language), and progressed from there. FORTRAN, ALGOL, Pascal, and some other languages are mentioned off-hand.

It's interesting to note that early computers didn't have the concept of the stack, that wonderful little memory space that the processor can use for keeping track of where it has been, and what it's doing (in the context of the current running function). The concept of the stack came about as the result of programming languages. In particular, the block structure of ALGOL called for the tracking of both static and dynamic "activation records". Best I can figure out is that this is what's known as the call stack.

But, I'm just not quite sure how exactly these static and dynamic activation records are supposed to work. In particular the idea of traversing the stack to go and find variables that are outside the scope of the current block. I know that global variables cover this, but, as far as I'm aware, the direct address of the global variable is used. So, to help with my understanding, I wrote a contrived C program and went through the assembly code that resulted.

I compiled with GCC on an Arch Linux install, with 64bit compilation. Just using -S to generate the assembly in AT&T syntax. (operator source destination)

When I say contrived, I mean it, I've plonked a statement block in the middle of main, all by itself. I've even used, gasp, GCC extensions that aren't standard C, something I loathe to do. Anyway, I wanted to test to see if statement blocks created an activation record, so I have:

{
    int x = 22;
    printf("main inner block x: %d\n", x);
    printf("main inner block y: %d\n", y);
    x += z;

    printf("main inner block x after z added: %d\n", x);
}

Right in the middle of my main. I suspect that the x variable, is just logically scoped. By that I mean that no activation record would be created, because it wouldn't be needed, x is just a symbol of reference in the source code.

Looking at the assembly, we can see that we have that inner statement block labelled:

.LBB2:
    movl    $22, -12(%rbp)  // Inner statement block, x = 22.
    movl    -12(%rbp), %eax // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC3, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.
    movl    -8(%rbp), %eax  // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC4, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.
    movl    -16(%rbp), %eax // Load z into eax.
    addl    %eax, -12(%rbp) // x += z
    movl    -12(%rbp), %eax // Set up parameters for function call.
    movl    %eax, %esi
    movl    $.LC5, %edi
    movl    $0, %eax
    call    printf          // Save next address onto stack, jump to printf.
.LBE2:

We can see that 22 is being loaded onto our stack, the "movl $22, -12(%rbp)" bit. The stack grows down, and the rbp register holds current address of the stack frame (activation record), -12 being the index.

The variable x that is scoped outside this block, in the main function itself has the index -4. So this means we have the following stack:

All through this bit of code we can't see anything other than indexed stack load and stores, so this means no activation records. With a call to a function (call outerFunction), we will see the creation of a new stack frame.

outerFunction:
.LFB0:
    pushq   %rbp            // Push the previous ref to stackframe on stack.
    movq    %rsp, %rbp      // Save this function's stack frame
                            // ref into rbp.
    subq    $16, %rsp       // Reserve 16 bytes of space. (Pad stack frame).
    movl    $88, -4(%rbp)   // z = 88
    movl    -4(%rbp), %eax  // Call to printf.
    movl    %eax, %esi      // Parameter passing on registers.
    movl    $.LC0, %edi
    movl    $0, %eax
    call    printf          // Pushes address of next instruction onto
                            // the stack and jumps to printf
    leave                   // Move rbp to rsp, and pop off the
                            // stack into rbp.
    ret                     // Pop address off the stack and
                            // return to it.

Concerning stack frames, we only care about stack operations and things involving the rsp, and rbp registers. The "pushq %rbp" instruction, is what does this. It then takes the current value of the stack (rsp) and copies it into the stack frame register (rbp). This means that rbp can then be used for referring to anything local to the functions stack frame.

Pushing onto the stack advances the stack and copies the value into the resulting memory address. Since the stack frame is copied into the rbp register this means that rbp points to the calling functions stack frame.

However, we are still not getting any traversal of any of these stack frames, at least nothing that would indicate anything about dynamic or static activation records. The problem seems to be that you need to try and access variables outside your scope across stack frame boundaries. You can't do this with C, sure, you have global scope, but that doesn't have a stack frame, any globals will be referenced directly.

So, what to do? Well to force the issue of a function using a variable out of its local scope, but not using the global scope, I've had to use a GCC trick.

I have a function declared inside another function. This is not standard C:

  // In main()
  void innerFunction() {
    printf("Inner function z: %d\n", z);
  }

  innerFunction();

Ah ha! This will produce a stack frame, and also reference a variable that's out of a local scope.

innerFunction.2206:
.LFB2:
    pushq   %rbp            // Push the previous stackframe onto the stack.
    movq    %rsp, %rbp      // Save this function's stack frame.
    movq    %r10, %rax      // Wait a minute....
    movl    (%rax), %eax    // Set up for call to printf
    movl    %eax, %esi
    movl    $.LC1, %edi
    movl    $0, %eax
    call    printf          // Push address of next instruction onto
                            // the stack and jumps to printf.
    popq    %rbp            // Pop off old rbp from stack into rbp.
    ret                     // Pop address off the stack and return to it.

But, it doesn't seem to have worked as I would have thought. In particular where it actually gets the value of the variable, "movq %r10, %rax". Damnit! If I go tracing back before the function was called, we come across:

    leaq    -16(%rbp), %rax     // Load z's address into rax register.
    movq    %rax, %r10          // Copy rax to r10 register.
    movl    $0, %eax            // Void function.
    call    innerFunction.2206  // Save next address onto stack, jump to innerFunction.

The compiler is being tricky, rather than traverse back through to the previous stack frame, it loads the memory address of the variable into a register. This register is used to effectively pass in a reference to the variable which is on the caller's stack frame. I'm guessing it can do this hard coding of register passing because it's an internal function, so it's not like the scope of innerFunction is going to change, because C doesn't support dynamic scoping.

Effectively it seems that C itself only supports static activation records, and even then any operations in the functions will not traverse the stack, a function will not use the pointer to the previous stack frame to look back at variables. Thanks to the normal C scoping rules, there's no need. I can only force the issue via GNU only internal functions, and even then variables are just passed as per any normal sort of function parameter passing (although it uses one of the misc registers as per the useful info on X64_64 ABI).

I suspect any backtracking of activation records will require looking at a language that has a sort of dynamic binding, which I hope to follow up in a later post.

Magic Binary Numbers

How would you reverse a binary sequence? Say, given a 16-bit number?

unsigned int v;     // input bits to be reversed
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero
unsigned int v; // 32-bit word to reverse bit order

// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16             ) | ( v               << 16);

Reboot

Well, to be honest, more like a work in progress. A work in progress that will probably stay a work in progress for awhile. At least up until, well, I better not get ahead of myself here.