Copy Constructor Magic

20 May 2018

This post is co-authored with Meet Udeshi.

Examine this piece of code. It has two classes, foo and bar. bar holds an instance of foo and has a function that returns a new instance of foo, but the local variable in that function uses the same name, doo, as bar's foo member. The compiler accepts this, and -Wshadow will warn you about it. A name is in scope from the end of its declarator, so the doo inside the local's own initializer refers to the new, still-uninitialised local rather than to the member. It is therefore understood that this can produce unexpected or garbage output.

```cpp
#include <iostream>
using namespace std;

class foo{
    int x,y,z;

    public:
    int gx(){return x;};
    int gy(){return y;};
    int gz(){return z;};
    foo(int _x,int _y,int _z):x(_x),y(_y),z(_z){};

    // Uncomment this copy constructor for the second experiment.
    /*foo(const foo &kk){
        x = kk.x;
        y = kk.y;
        z = kk.z;
    }*/

    void print(void){
        cout << "x: " << x << ", " << "y: " << y << ", " << "z: " << z << endl;

    }
};

class bar{
    foo doo;

    public:
    bar(foo _doo):doo(_doo){};
    foo scoo(void);
};


foo bar::scoo(void){
    // Shadowing bug: every doo on this line is the new local, not bar::doo,
    // so gx(), gy() and gz() read the uninitialised local.
    foo doo(doo.gx(), doo.gy(), doo.gz());
    return doo;
}

int main(){
    bar obj(foo(1,1,1));

    for(int i = 0;i<5;i++){
        obj.scoo().print();
    }
    return 0;
}
```

Compiled with g++ 5.4 (`g++ -O0 -std=c++11`), the above code gives, for example, the following output:

```
x: 6299752, y: 0, z: -19107175
x: -19107175, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
```

Well, what were we thinking? What we have done is b**t, and b**t is what we get for the output! Just a moment though: uncomment the explicit definition of the copy constructor for class foo, and here is what you get:

```
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
```

That is the magic that I had to show you. Now we shall investigate where it comes from. The copy constructor we added does exactly what the implicitly generated one would do (a member-wise copy), so what is different here? To dig into this, I brought in help from Meet. Most of the following was written by him, with a few minor edits by me.

Basics of x86 assembly

The listings below use Intel syntax, where an instruction is written as `inst dest, source`.

An operand can be a register (inside the processor) or a memory location (in RAM); a source operand can also be an immediate constant such as `0x40`.

Memory locations are written inside square brackets. The address can be fixed, or computed from registers, as in `[rbp-0x30]`. A prefix such as `QWORD PTR` (8 bytes) or `DWORD PTR` (4 bytes) gives the size of the access.

Register names such as `rax` and `eax` refer to the same physical register at different widths: `rax` is the full 64-bit register and `eax` is its lower 32 bits (likewise, `r12d` is the lower 32 bits of `r12`). Writing a value to `eax` also clears the upper 32 bits of `rax`. The 64-bit `r` names were added when x86 was extended to 64 bits, and the 32-bit `e` names were kept for backward compatibility.

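Basic instructions are `mov`, which copies a value from source to destination; `lea` (load effective address), which computes the address of the source memory operand and stores that address, not the contents, in the destination (much like creating a pointer); and `call`, which calls a function.

Special-purpose registers are `rsp` and `rbp`, the stack pointer and the stack-base (frame) pointer. Together they delimit the current function's stack frame, and local variables are addressed relative to `rbp`. On x86-64 Linux, the first integer or pointer arguments of a function are passed in `rdi`, `rsi`, `rdx` and `rcx`, in that order, and an integer result comes back in `rax`. The recurring loads from `fs:0x28` and the call to `__stack_chk_fail` are GCC's stack protector and can be ignored for our purposes.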

What goes wrong when shadowing

This is the assembly GCC generates for `bar::scoo()` without the copy constructor:

```asm
push   rbp
mov    rbp,rsp
push   r12
push   rbx
sub    rsp,0x40
mov    QWORD PTR [rbp-0x38],rdi
mov    rax,QWORD PTR fs:0x28
mov    QWORD PTR [rbp-0x18],rax
xor    eax,eax
lea    rax,[rbp-0x30]
mov    rdi,rax
call   b28 <_ZN3foo2gzEv>
mov    r12d,eax
lea    rax,[rbp-0x30]
mov    rdi,rax
call   b16 <_ZN3foo2gyEv>
mov    ebx,eax
lea    rax,[rbp-0x30]
mov    rdi,rax
call   b06 <_ZN3foo2gxEv>
mov    esi,eax
lea    rax,[rbp-0x30]
mov    ecx,r12d
mov    edx,ebx
mov    rdi,rax
call   b3a <_ZN3fooC1Eiii>
mov    rax,QWORD PTR [rbp-0x30]
mov    QWORD PTR [rbp-0x24],rax
mov    eax,DWORD PTR [rbp-0x28]
mov    DWORD PTR [rbp-0x1c],eax
mov    rdx,QWORD PTR [rbp-0x24]
mov    eax,DWORD PTR [rbp-0x1c]
mov    rcx,rdx
mov    edx,eax
mov    rax,rcx
mov    rbx,QWORD PTR [rbp-0x18]
xor    rbx,QWORD PTR fs:0x28
je     a05 <_ZN3bar4scooEv+0x8b>
call   840 <__stack_chk_fail@plt>
add    rsp,0x40
pop    rbx
pop    r12
pop    rbp
ret
```

- `sub rsp,0x40` subtracts 0x40 (64) from `rsp`, i.e. it allocates 64 bytes on the stack. On x86 the stack grows downward, so push operations decrement the stack pointer.
- When a member function of foo is called, a pointer to the foo instance (`this`) is passed as a hidden first argument in `rdi`, much like the explicit `self` parameter of a Python method. Here the foo instance lives at `[rbp-0x30]`, which is why every call is preceded by `lea rax,[rbp-0x30]` and `mov rdi,rax`.
- `call <_ZN3foo2gzEv>` calls foo::gz() on the instance at `[rbp-0x30]`. The order in which function arguments are evaluated is unspecified, and here GCC evaluates them right to left, so gz() comes first. Notice that this instance has not been initialised at all when gz(), gy() and gx() are called, so whatever bytes happen to be at `[rbp-0x30]` are garbage. The calls return garbage, which is kept in `r12d`, `ebx` and `esi`.
- The function then passes these garbage values to the constructor foo::foo(int, int, int) (`call <_ZN3fooC1Eiii>`), again with `[rbp-0x30]` as `this`. Basically, our `doo` instance reads from its uninitialised self and then uses those values to initialise itself.

Note: In this case, the new foo variable is allocated in scoo()'s own stack frame and therefore holds garbage. Without a user-provided copy constructor, foo is a trivially copyable 12-byte object, so it is returned in registers: after the constructor call, the object is copied to `[rbp-0x24]`, and its first 8 bytes (x and y) are loaded into the 64-bit register `rax` and its last 4 bytes (z) into the 32-bit register `edx`.

How does the copy constructor affect this?

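The copy constructor changes the calling convention of the function scoo(). Because foo now has a user-provided copy constructor, it is no longer trivially copyable (in ABI terms, it is non-trivial for the purposes of calls), and the Itanium C++ ABI that GCC follows on x86-64 Linux requires such objects to be returned in memory rather than in registers. Previously, scoo() returned the value in registers. Now the caller (main) passes the callee (scoo) a pointer to the storage where the return value should be built, as a hidden first argument in `rdi`, which moves bar's `this` pointer into `rsi`. You could think of it as pass by reference, but for the return value.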

Assembly for `bar::scoo()` with the copy constructor:


```asm
push   rbp
mov    rbp,rsp
push   r12
push   rbx
sub    rsp,0x20
mov    QWORD PTR [rbp-0x28],rdi
mov    QWORD PTR [rbp-0x30],rsi
mov    rax,QWORD PTR fs:0x28
mov    QWORD PTR [rbp-0x18],rax
xor    eax,eax
mov    rax,QWORD PTR [rbp-0x28]
mov    rdi,rax
call   b0c <_ZN3foo2gzEv>
mov    r12d,eax
mov    rax,QWORD PTR [rbp-0x28]
mov    rdi,rax
call   afa <_ZN3foo2gyEv>
mov    ebx,eax
mov    rax,QWORD PTR [rbp-0x28]
mov    rdi,rax
call   aea <_ZN3foo2gxEv>
mov    esi,eax
mov    rax,QWORD PTR [rbp-0x28]
mov    ecx,r12d
mov    edx,ebx
mov    rdi,rax
call   b1e <_ZN3fooC1Eiii>
nop
mov    rax,QWORD PTR [rbp-0x28]
mov    rdx,QWORD PTR [rbp-0x18]
xor    rdx,QWORD PTR fs:0x28
je     9f1 <_ZN3bar4scooEv+0x77>
call   840 <__stack_chk_fail@plt>
add    rsp,0x20
pop    rbx
pop    r12
pop    rbp
ret
```

Notice the difference here: instead of computing an address with `lea`, the code uses `mov rax,QWORD PTR [rbp-0x28]`. This does not load the address rbp-0x28 into rax; it copies the 8-byte value stored at [rbp-0x28], which is the pointer that arrived in `rdi`.

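But still, the same procedure is carried out: the function calls gz(), gy() and gx() on the object at the address stored in [rbp-0x28], which scoo() has not initialised, and then runs the constructor with those values on that same address. In other words, the local doo is built directly in the caller's return slot (named return value optimisation), and scoo() finally returns that address in `rax`.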

Then how does the correct value pop out? We need to look at the part of main() that calls scoo() (excerpt, starting after the function prologue):


```asm
mov    QWORD PTR [rbp-0x8],rax
xor    eax,eax
lea    rax,[rbp-0x14]
mov    ecx,0x1
mov    edx,0x1
mov    esi,0x1
mov    rdi,rax
call   b1e <_ZN3fooC1Eiii>
lea    rdx,[rbp-0x14]
lea    rax,[rbp-0x20]
mov    rsi,rdx
mov    rdi,rax
call   c36 <_ZN3barC1E3foo>
mov    DWORD PTR [rbp-0x24],0x0
cmp    DWORD PTR [rbp-0x24],0x4
jg     a71 <main+0x77>
lea    rax,[rbp-0x14]
lea    rdx,[rbp-0x20]
mov    rsi,rdx
mov    rdi,rax
call   97a <_ZN3bar4scooEv>
lea    rax,[rbp-0x14]
mov    rdi,rax
call   b88 <_ZN3foo5printEv>
add    DWORD PTR [rbp-0x24],0x1
jmp    a46 <main+0x4c>
mov    eax,0x0
mov    rcx,QWORD PTR [rbp-0x8]
xor    rcx,QWORD PTR fs:0x28
```

In scoo(), the value of `rdi` is saved to [rbp-0x28], so whatever address the caller puts in `rdi` becomes the location of the returned foo object.

`lea rax,[rbp-0x14]` followed by `mov rdi,rax` passes rbp-0x14 to scoo() as that location, while `rsi` carries rbp-0x20, the address of obj, as `this`. Note that this location is in the stack frame of main(), not of scoo() as in the previous case.

Notice that rbp-0x14 was passed to the foo constructor earlier. This is because of this line of code:

```cpp
bar obj(foo(1,1,1));
```

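Here a temporary foo object has to be initialised and then passed to the bar constructor. Once that constructor returns, the temporary is no longer needed, so the compiler reuses its stack slot for the foo object returned by scoo(). It is tempting to blame GCC for doing this despite -O0, but reusing a dead temporary's slot is a matter of stack layout rather than optimisation level (GCC controls it with -fstack-reuse, which defaults to all), and GCC constructs the returned local directly in the caller's slot even at -O0 unless -fno-elide-constructors is given. Since reading an uninitialised object is undefined behaviour, neither choice is a compiler flaw. So, instead of picking up garbage values, scoo() picks up the values 1, 1, 1 left behind by the temporary, and on every later iteration the values written by the previous call.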

This explanation is supported by the fact that another compiler, such as clang, does not reproduce this behaviour. Hence it is important to know that even the second version still has the same bug, where the local doo reads from itself instead of from the member; it just isn't visible in the output, thanks to a happy accident.

Our four cents!