Copy Constructor Magic
20 May 2018
This post is co-authored with Meet Udeshi.
Examine this piece of code. It has two classes, `foo` and `bar`. `bar` holds an instance of `foo` and has a function that returns a new instance of `foo`, but the local variable in that function has the same name as the `foo` member of `bar`. This compiles, and only `-Wshadow` will give you a warning. Inside its own initializer, the name `doo` already refers to the new local rather than the member, so it is understood that this can result in unexpected or garbage output.
```cpp
#include <iostream>
using namespace std;

class foo{
    int x,y,z;
public:
    int gx(){return x;};
    int gy(){return y;};
    int gz(){return z;};
    foo(int _x,int _y,int _z):x(_x),y(_y),z(_z){};
    /*foo(const foo &kk){
        x = kk.x;
        y = kk.y;
        z = kk.z;
    }*/
    void print(void){
        cout << "x: " << x << ", " << "y: " << y << ", " << "z: " << z << endl;
    }
};

class bar{
    foo doo;
public:
    bar(foo _doo):doo(_doo){};
    foo scoo(void);
};

foo bar::scoo(void){
    // Shadows the member bar::doo: every 'doo' on the next line is the new,
    // not yet constructed local, so gx(), gy() and gz() read garbage.
    foo doo(doo.gx(), doo.gy(), doo.gz());
    return doo;
}

int main(){
    bar obj(foo(1,1,1));
    for(int i = 0;i<5;i++){
        obj.scoo().print();
    }
    return 0;
}
```
Compiled with g++ 5.4 (`g++ -O0 -std=c++11`), the code above gives, for example, the following output:

```text
x: 6299752, y: 0, z: -19107175
x: -19107175, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
x: 4197253, y: 0, z: 4197253
```
Well, what were we thinking? What we have done is b**t, and we get that for the output! Just a moment though: uncomment the explicit definition of the copy constructor for class `foo`, and here is what you get:

```text
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
x: 1, y: 1, z: 1
```
That is the magic I wanted to show you. Now let us investigate where it comes from. The copy constructor we put in place copies exactly the members that the implicitly defined copy constructor would copy. So what is different here? To help me with this, I brought in Meet. Most of the following is written by him, with a few minor edits by me.
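A hint before the assembly: the two copy constructors copy the same members, but a user-provided copy constructor is never trivial, so uncommenting it makes `foo` stop being trivially copyable. As far as we understand the x86-64 C++ ABI that GCC and clang follow on Linux (the Itanium C++ ABI), that property decides how an object is returned. A compile-time sketch of the difference, where `foo_implicit` and `foo_user_copy` are hypothetical stand-ins for the two versions of `foo`:

```cpp
#include <type_traits>

// Hypothetical stand-in for foo with the copy constructor commented out:
// the implicitly defined copy constructor is trivial.
struct foo_implicit {
    int x, y, z;
    foo_implicit(int _x, int _y, int _z) : x(_x), y(_y), z(_z) {}
};

// Hypothetical stand-in for foo with the copy constructor uncommented:
// it copies the same members, but being user-provided it is not trivial.
struct foo_user_copy {
    int x, y, z;
    foo_user_copy(int _x, int _y, int _z) : x(_x), y(_y), z(_z) {}
    foo_user_copy(const foo_user_copy& kk) : x(kk.x), y(kk.y), z(kk.z) {}
};

static_assert(std::is_trivially_copyable<foo_implicit>::value,
              "implicit copy constructor keeps the type trivially copyable");
static_assert(!std::is_trivially_copyable<foo_user_copy>::value,
              "a user-provided copy constructor is never trivial");
```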
Basics of X86 assembly
The listings use the Intel syntax, in which an instruction is written as `inst dest, source`.
There are various addressing modes in x86 for the source or destination operand. An operand can be a register (inside the processor), a memory location (in RAM), or, as a source, an immediate constant such as the `0x1` in `mov ecx,0x1`.
Memory locations are written inside square brackets. They can be static, or computed dynamically from an expression based on registers, such as `[rbp-0x30]`.
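Registers come in related pairs named `e**` and `r**`. When two names share the same last two letters, the `r**` name refers to the whole 64-bit register and the `e**` name to its low 32 bits.
Example: `rax` is a 64-bit register, and `eax` refers to its lower 32 bits.
So if you store a value in `eax` and then read `rax`, you get the same value back, because a write to a 32-bit register zero-extends into the upper 32 bits. The 64-bit `r**` names were added with 64-bit processors, while the `e**` names keep 32-bit code working (backward compatibility). The additional registers `r8` to `r15` name their low 32 bits `r8d` to `r15d`, which is why `r12d` shows up in the listings.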
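The naming rule above can be modelled in portable C++ by treating `rax` as a `std::uint64_t`. This is only an illustration with hypothetical helpers (`read_eax`, `write_eax`); C++ cannot name machine registers, and the compiler decides which registers actually hold your values:

```cpp
#include <cstdint>

// Illustrative model only: rax is represented by a std::uint64_t value.

// Reading eax yields the low 32 bits of rax.
constexpr std::uint32_t read_eax(std::uint64_t rax) {
    return static_cast<std::uint32_t>(rax);
}

// Writing eax replaces the whole of rax: the 32-bit value is zero-extended,
// so whatever the upper 32 bits held before is discarded.
constexpr std::uint64_t write_eax(std::uint64_t /*old_rax*/, std::uint32_t value) {
    return static_cast<std::uint64_t>(value);
}
```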
The basic instructions used here are `mov`, which copies a value from source to destination; `lea`, or load effective address, which copies the memory address of the source memory location into the destination (much like creating a pointer); and `call`, which calls a function.
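For member functions, the object the function works on is passed as a hidden first argument (the `this` pointer). C++ lets you make that pairing explicit with a pointer to member function, where the function and the object pointer are supplied separately. A sketch with a trimmed-down `foo` and a hypothetical helper `call_member_with_explicit_this`:

```cpp
#include <cassert>

// Trimmed-down copy of the post's foo.
class foo {
    int x, y, z;
public:
    foo(int _x, int _y, int _z) : x(_x), y(_y), z(_z) {}
    int gx() { return x; }
};

// Hypothetical helper: calls foo::gx() with the object pointer supplied
// separately, the way the compiled code passes it as a hidden argument.
int call_member_with_explicit_this(foo* self) {
    int (foo::*getter)() = &foo::gx;  // pointer to member function
    return (self->*getter)();         // 'self' becomes 'this' inside gx()
}

// Both spellings call the same function on the same object.
void check_explicit_this() {
    foo f(7, 8, 9);
    assert(call_member_with_explicit_this(&f) == f.gx());
}
```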
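The special-purpose registers `rsp` and `rbp` are the stack pointer and the stack-base (frame) pointer. `rsp` points to the top of the stack, while `rbp` marks the base of the current function's frame, so local variables appear as offsets from it, like `[rbp-0x30]`. On x86-64 Linux, the first four integer or pointer arguments of a function are passed in `rdi`, `rsi`, `rdx` and `rcx`, in that order, and integer results come back in `rax` (or `eax`).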
What goes wrong when shadowing
This is the assembly GCC generated for `bar::scoo()`:

```asm
push rbp
mov rbp,rsp
push r12
push rbx
sub rsp,0x40
mov QWORD PTR [rbp-0x38],rdi
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rbp-0x18],rax
xor eax,eax
lea rax,[rbp-0x30]
mov rdi,rax
call b28 <_ZN3foo2gzEv>
mov r12d,eax
lea rax,[rbp-0x30]
mov rdi,rax
call b16 <_ZN3foo2gyEv>
mov ebx,eax
lea rax,[rbp-0x30]
mov rdi,rax
call b06 <_ZN3foo2gxEv>
mov esi,eax
lea rax,[rbp-0x30]
mov ecx,r12d
mov edx,ebx
mov rdi,rax
call b3a <_ZN3fooC1Eiii>
mov rax,QWORD PTR [rbp-0x30]
mov QWORD PTR [rbp-0x24],rax
mov eax,DWORD PTR [rbp-0x28]
mov DWORD PTR [rbp-0x1c],eax
mov rdx,QWORD PTR [rbp-0x24]
mov eax,DWORD PTR [rbp-0x1c]
mov rcx,rdx
mov edx,eax
mov rax,rcx
mov rbx,QWORD PTR [rbp-0x18]
xor rbx,QWORD PTR fs:0x28
je a05 <_ZN3bar4scooEv+0x8b>
call 840 <__stack_chk_fail@plt>
add rsp,0x40
pop rbx
pop r12
pop rbp
ret
```
- `sub rsp,0x40` subtracts 0x40 (64) from register `rsp`, i.e. allocates 64 bytes of space on the stack (the x86 stack grows downward, so push operations decrement the stack pointer).
- When calling any member function of class `foo`, we have to provide a pointer to an instance of `foo` (the `this` pointer) as a hidden first argument, much like the explicit `self` parameter of methods in Python. Here the `foo` instance is stored at `[rbp-0x30]`, which is why every call is preceded by `lea rax,[rbp-0x30]` and `mov rdi,rax`.
- `call <_ZN3foo2gzEv>` calls `foo::gz()` on the instance at `[rbp-0x30]`. Notice that this instance has not been initialised at all when we call `gx()`, `gy()` or `gz()`, so whatever `[rbp-0x30]` holds is garbage. The calls return garbage, which is kept in registers (`r12d`, `ebx`, `esi`).
- The function then passes these garbage values to the constructor call `call <_ZN3fooC1Eiii>`, the mangled name of `foo::foo(int, int, int)`. Basically our `doo` instance reads from its uninitialised self and uses those values to initialise itself.
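The self-reference comes from C++ scoping: a local's name is in scope from the end of its declarator, so it is already visible inside its own initializer. A small, well-defined sketch of the same rule, where the global `doo` and the function `initializer_sees_new_local()` are hypothetical, and where only an address (not a member function call) is taken before initialisation:

```cpp
#include <cassert>

// Hypothetical global standing in for bar's member 'doo'.
int doo = 42;

// Returns true when 'doo' inside its own initializer names the new local.
bool initializer_sees_new_local() {
    // The local 'doo' is in scope from the end of its declarator, so '&doo'
    // below is the address of this local (a void**, converted to void*),
    // not of the global. Taking that address is well defined; calling a
    // member function on an object before it is constructed, as
    // bar::scoo() does, is not.
    void* doo = &doo;
    return doo == static_cast<void*>(&doo) &&
           doo != static_cast<void*>(&::doo);
}
```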
Note: In this case, the new `foo` variable was allocated on the stack of function `scoo` itself, and hence held garbage. This is a consequence of the missing user-provided copy constructor: `foo` is then trivially copyable, so the compiler builds the object in `scoo`'s own frame and returns it in registers. The 12 bytes of the object are copied into one 64-bit and one 32-bit register, `rax` (holding `x` and `y`) and `edx` (holding `z`), as the last few `mov` instructions before the stack check show.
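The register-return path can be mimicked in C++ by copying the object's bytes into a 64-bit and a 32-bit integer. This is a hypothetical model (`foo_pod`, `reg_pair` and the helpers are illustrative names), and it assumes a 4-byte `int` with no padding, which holds for GCC on x86-64 Linux:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the trivially copyable foo: three ints, 12 bytes.
struct foo_pod { int x, y, z; };
static_assert(sizeof(foo_pod) == 12, "assumes 4-byte int and no padding");

// Model of the two return registers.
struct reg_pair {
    std::uint64_t rax;  // bytes 0..7 of the object: x and y
    std::uint32_t edx;  // bytes 8..11 of the object: z
};

// Callee side: copy the object's bytes into the "registers".
reg_pair return_in_registers(const foo_pod& f) {
    reg_pair r;
    std::memcpy(&r.rax, &f, 8);
    std::memcpy(&r.edx, reinterpret_cast<const char*>(&f) + 8, 4);
    return r;
}

// Caller side: rebuild the object from the "registers".
foo_pod receive_from_registers(const reg_pair& r) {
    foo_pod f;
    std::memcpy(&f, &r.rax, 8);
    std::memcpy(reinterpret_cast<char*>(&f) + 8, &r.edx, 4);
    return f;
}

// The round trip preserves all three fields.
void check_round_trip() {
    foo_pod a = {1, 2, 3};
    foo_pod b = receive_from_registers(return_in_registers(a));
    assert(b.x == 1 && b.y == 2 && b.z == 3);
}
```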
How does the copy constructor affect this?
The user-provided copy constructor changes the calling convention of `scoo`. Previously, it returned the value by storing it in registers. Now that `foo` is no longer trivially copyable, the caller (`main`) instead provides the callee (`scoo`) with a pointer to the location where the return value should be stored. You could think of it as pass-by-reference, but for the return value.
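Written out by hand, the hidden-pointer convention looks roughly like the sketch below. `make_foo_lowered` is a hypothetical illustration, not code GCC emits: the caller provides suitably aligned storage in its own frame, and the callee constructs the result there with placement new and hands back the same address, the way `rax` is loaded from the saved `rdi` before returning:

```cpp
#include <cassert>
#include <new>

// The post's foo with its copy constructor uncommented (not trivially copyable).
class foo {
    int x, y, z;
public:
    foo(int _x, int _y, int _z) : x(_x), y(_y), z(_z) {}
    foo(const foo& kk) : x(kk.x), y(kk.y), z(kk.z) {}
    int gx() { return x; }
    int gy() { return y; }
    int gz() { return z; }
};

// Hypothetical hand-lowered version of a function 'foo make_foo(int, int, int)'.
// The caller owns the storage and passes its address (rdi in the listings);
// the callee constructs the result there and returns the same address (rax).
foo* make_foo_lowered(void* ret_slot, int x, int y, int z) {
    return ::new (ret_slot) foo(x, y, z);
}

// Caller side: the storage lives in the caller's own frame.
void check_caller_owned_slot() {
    alignas(foo) unsigned char slot[sizeof(foo)];
    foo* result = make_foo_lowered(slot, 1, 2, 3);
    assert(static_cast<void*>(result) == static_cast<void*>(slot));
    assert(result->gx() == 1 && result->gy() == 2 && result->gz() == 3);
    result->~foo();  // end the object's lifetime (trivial here, but explicit)
}
```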
Here is the new assembly for `bar::scoo()`:

```asm
push rbp
mov rbp,rsp
push r12
push rbx
sub rsp,0x20
mov QWORD PTR [rbp-0x28],rdi
mov QWORD PTR [rbp-0x30],rsi
mov rax,QWORD PTR fs:0x28
mov QWORD PTR [rbp-0x18],rax
xor eax,eax
mov rax,QWORD PTR [rbp-0x28]
mov rdi,rax
call b0c <_ZN3foo2gzEv>
mov r12d,eax
mov rax,QWORD PTR [rbp-0x28]
mov rdi,rax
call afa <_ZN3foo2gyEv>
mov ebx,eax
mov rax,QWORD PTR [rbp-0x28]
mov rdi,rax
call aea <_ZN3foo2gxEv>
mov esi,eax
mov rax,QWORD PTR [rbp-0x28]
mov ecx,r12d
mov edx,ebx
mov rdi,rax
call b1e <_ZN3fooC1Eiii>
nop
mov rax,QWORD PTR [rbp-0x28]
mov rdx,QWORD PTR [rbp-0x18]
xor rdx,QWORD PTR fs:0x28
je 9f1 <_ZN3bar4scooEv+0x77>
call 840 <__stack_chk_fail@plt>
add rsp,0x20
pop rbx
pop r12
pop rbp
ret
```
Notice the difference here: instead of loading `rax` with a `lea` instruction, `mov rax,QWORD PTR [rbp-0x28]` is used. This does not load the address `rbp-0x28` into `rax`; it simply copies the value stored at location `[rbp-0x28]` into `rax`.
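In C++ terms, the difference is between taking the address of a variable and reading the pointer stored in it. A minimal model, where a struct member stands in for the stack slot `[rbp-0x28]` and `frame_model`, `lea_slot` and `mov_slot` are hypothetical names:

```cpp
#include <cassert>

struct foo { int x, y, z; };

// Hypothetical model of scoo()'s frame: one stack slot, playing the role of
// [rbp-0x28], that holds the address of the caller-provided return object.
struct frame_model {
    foo* saved_rdi;
};

// 'lea rax,[rbp-0x28]' would yield the address of the slot itself.
foo** lea_slot(frame_model& frame) { return &frame.saved_rdi; }

// 'mov rax,QWORD PTR [rbp-0x28]' yields the pointer stored in the slot.
foo* mov_slot(const frame_model& frame) { return frame.saved_rdi; }

void check_lea_vs_mov() {
    foo result = {0, 0, 0};
    frame_model frame = {&result};
    assert(mov_slot(frame) == &result);   // directly the return object's address
    assert(*lea_slot(frame) == &result);  // needs one more dereference
}
```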
But the same procedure is still carried out. The function calls `gx()`, `gy()` and `gz()` on the uninitialised location whose address is stored in `[rbp-0x28]`, and then calls the constructor with those values.
Then how does the correct value pop out? We need to look at the `main()` function, where `scoo()` is called:

```asm
mov QWORD PTR [rbp-0x8],rax
xor eax,eax
lea rax,[rbp-0x14]
mov ecx,0x1
mov edx,0x1
mov esi,0x1
mov rdi,rax
call b1e <_ZN3fooC1Eiii>
lea rdx,[rbp-0x14]
lea rax,[rbp-0x20]
mov rsi,rdx
mov rdi,rax
call c36 <_ZN3barC1E3foo>
mov DWORD PTR [rbp-0x24],0x0
cmp DWORD PTR [rbp-0x24],0x4
jg a71 <main+0x77>
lea rax,[rbp-0x14]
lea rdx,[rbp-0x20]
mov rsi,rdx
mov rdi,rax
call 97a <_ZN3bar4scooEv>
lea rax,[rbp-0x14]
mov rdi,rax
call b88 <_ZN3foo5printEv>
add DWORD PTR [rbp-0x24],0x1
jmp a46 <main+0x4c>
mov eax,0x0
mov rcx,QWORD PTR [rbp-0x8]
xor rcx,QWORD PTR fs:0x28
```
In `scoo()`, the value of register `rdi` is stored in `[rbp-0x28]`, so whatever `rdi` holds at the time of the call is used as the location of the returned `foo` instance.

In `main()`, `lea rax,[rbp-0x14]` followed by `mov rdi,rax` passes the address `rbp-0x14` to `scoo` as the location for the returned `foo` (while `rsi` carries `this`, the address of `obj` at `rbp-0x20`). Note that this location is on the stack of `main()`, not of `scoo()` as in the previous case.
Notice that `rbp-0x14` is first passed to the `foo` constructor. This is because of this line of code:

```cpp
bar obj(foo(1,1,1));
```

Here a temporary `foo` object has to be initialised and then passed to the `bar` constructor. Once that constructor has run, the temporary is no longer needed, so the compiler reuses its stack slot for the `foo` returned by `scoo()`. We did not expect this with optimisation switched off (`-O0`), and at first it looked like a flaw in GCC, but reusing the storage of a temporary whose lifetime has ended is permitted. So instead of picking up garbage values in `scoo()`, the calls pick up the correct values from the temporary because of the reused stack slot, and on later iterations from the previous call's result, which sits in the same slot.
Further evidence: when compiled with another compiler such as clang, this behaviour is not replicated. So it is important to know that even the second version still has the shadowing error and still calls member functions on an uninitialised object; it just isn't visible in the output thanks to a happy accident.
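The cure is simply not to shadow the member. A minimal sketch, reusing the post's classes (without the `print` helper) and renaming the local to a hypothetical `fresh`. Qualifying the member as `this->doo` inside the initializer would also read the right object, although `-Wshadow` would still flag the declaration:

```cpp
#include <cassert>

class foo {
    int x, y, z;
public:
    foo(int _x, int _y, int _z) : x(_x), y(_y), z(_z) {}
    int gx() { return x; }
    int gy() { return y; }
    int gz() { return z; }
};

class bar {
    foo doo;
public:
    bar(foo _doo) : doo(_doo) {}
    foo scoo();
};

foo bar::scoo() {
    // The local has its own name, so 'doo' here can only mean the member
    // bar::doo, which bar's constructor initialised.
    foo fresh(doo.gx(), doo.gy(), doo.gz());
    return fresh;
}

// Every call now returns an object built from the member's values.
void check_fixed_scoo() {
    bar obj(foo(1, 1, 1));
    for (int i = 0; i < 5; i++) {
        foo f = obj.scoo();
        assert(f.gx() == 1 && f.gy() == 1 && f.gz() == 1);
    }
}
```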
Our four cents!