Swift Weekly - Issue 02 - The Swift Runtime (Part 1)
===
Vandad Nahavandipoor
http://www.oreilly.com/pub/au/4596
Email: vandad.np@gmail.com
Blog: http://vandadnp.wordpress.com
Skype: vandad.np

Introduction
===
In this edition, I wanted to write about arrays and dictionaries and take the easy route. But I thought to myself: wouldn't it be cool if _somebody_ dug deep into the Swift runtime for crying out loud? Then I thought that I cannot wait for somebody to do that so I'm going to have to do that myself. So here, this edition of Swift Weekly is about the Swift runtime. At least the basics.

Please note that I am using a disassembler + dSYM file. I am disassembling the contents of the AppDelegate with some basic code in it and then hooking my disassembler up with the dSYM file to see more details.

Also, in this article I am looking at the disassembly that Xcode 6.1 produces for the x86_64 architecture, not for ARM, which is what iOS devices use.

Constants in Swift
===
I wrote the following code in Swift:

```swift
func example1(){
    let a = 0xabcdefa
    println(a)
    let b = 0xabcdefb
    println(b)
}
```

And then I had a look at the generated assembly for the `example1()` function:

```asm
push       rbp                                    ; XREF=0x1000000d0
mov        rbp, rsp
sub        rsp, 0x20
mov        rax, qword [ds:imp___got___TMdSi]      ; imp___got___TMdSi
add        rax, 0x8
lea        rcx, qword [ss:rbp+var_10]
mov        qword [ss:rbp+var_8], 0xabcdefa
mov        qword [ss:rbp+var_10], 0xabcdefa
mov        rdi, rcx
mov        rsi, rax
call       imp___stubs___TFSs7printlnU__FQ_T_
mov        rax, qword [ds:imp___got___TMdSi]      ; imp___got___TMdSi
add        rax, 0x8
lea        rcx, qword [ss:rbp+var_20]
mov        qword [ss:rbp+var_18], 0xabcdefb
mov        qword [ss:rbp+var_20], 0xabcdefb
mov        rdi, rcx
mov        rsi, rax
call       imp___stubs___TFSs7printlnU__FQ_T_
add        rsp, 0x20
pop        rbp
ret
```

This is quite a bit of assembly for the very simple Swift code that we wrote, but let's try to understand what is happening:

1. The code is setting up the stack
2. 
The code is then placing the value of `0xabcdefa` into the stack at `ss:rbp+var_8`. However, as you can see, the `mov` instruction is issued twice, on two stack offsets named `var_8` and `var_10`, with the exact same value. Both are `qword` (quad-word) moves, meaning that each one writes 64 bits of data to its address. Now if I ask the disassembler for the actual offsets of `var_8` and `var_10`, I get the following results:

```asm
lea        rcx, qword [ss:rbp+0xfffffffffffffff0]
mov        qword [ss:rbp+0xfffffffffffffff8], 0xabcdefa
mov        qword [ss:rbp+0xfffffffffffffff0], 0xabcdefa
```

So this tells me that `var_10` comes __before__ `var_8` in memory: read as signed 64-bit displacements, `0xfffffffffffffff0` is -0x10 and `0xfffffffffffffff8` is -0x8. So the compiler is placing one copy of `0xabcdefa` in the stack at `rbp-0x10` and another copy 8 bytes __after__ it, at `rbp-0x8`. The reason for this is that the first `mov` instruction places the value of `0xabcdefa` into our constant and the next one places the same value into the stack slot whose address `lea` loaded into `rcx`, ready for the `println` call. So the compiler is intelligent enough to know that `println` is passed the value of the constant `a`, and since that value is known, it is more efficient to write it directly into the stack for the `println` call than to read the value of the `a` constant from the stack and place it in the stack again. So this is what we learnt.

3. As you can see, the rest is also self-explanatory. The value of `0xabcdefb` is placed inside the `b` constant, `println` is called again, and so on.

Now let's see what the compiler will generate if we write this code:

```swift
func example2(){
    let a = 0xabcdefa
    println(0xabcdefa)
}
```

The reason I want to try this is to find out whether the compiler will be intelligent enough to somehow understand that the value we are printing is the same value as in the `a` constant and use that instead...
Let's see what happens:

```asm
push       rbp
mov        rbp, rsp
sub        rsp, 0x10
mov        rax, qword [ds:imp___got___TMdSi]      ; imp___got___TMdSi
add        rax, 0x8
lea        rcx, qword [ss:rbp+var_10]
mov        qword [ss:rbp+var_8], 0xabcdefa
mov        qword [ss:rbp+var_10], 0xabcdefa
mov        rdi, rcx
mov        rsi, rax
call       imp___stubs___TFSs7printlnU__FQ_T_
add        rsp, 0x10
pop        rbp
ret
```

Well, it turns out that the compiler generated the same code as before without reusing the value of the `a` constant.

One more observation is that the `rdi` and the `rsi` registers are being set up before the `println` function is called. The `rdi` register is set to `rcx` which, as you can see, is itself loaded from `qword [ss:rbp+var_10]`. The `lea` instruction loads into `rcx` the effective address of the stack location where the value of `0xabcdefa` is stored, and `rdi` will then point to that address. This tells me that whenever Swift calls a function like `println`, two things will happen:

1. The `rdi` register will point to the top of the stack where the parameters for the function are stored.
2. The `rsi` register will be set to _a_ value in the data-segment (I don't really understand that part of the code, `[ds:imp___got___TMdSi]`. If you know what this means, please correct this sentence and send a pull request).

Mixing Constants and Variables
===
Now let's see how the Swift compiler deals with constants and variables when it generates the assembly code:

```swift
func example3(){
    let a = 0xabcdefa
    var b = 0xabcdefb
    let c = a + b
}
```

The assembly for this is:

```asm
push       rbp
mov        rbp, rsp
mov        qword [ss:rbp+var_10], 0xabcdefa
mov        qword [ss:rbp+var_8], 0xabcdefb
mov        qword [ss:rbp+var_18], 0x1579bdf5
pop        rbp
ret
```

The results are very clear:

1. Both __local__ constants and variables of type `Int` are stack values.
2. 
When a constant and a variable of type `Int` are added, Swift does not emit code for the addition. Instead, if the values are known at compile time, it adds them at compile time and puts the result (`0x1579bdf5` here) into the stack directly, saving execution time.

Now let's have a look at some more data types like `Bool`, `Double` and `Float`.

```swift
func example4(){
    let intConstant = 0xabcdefa
    let intVariable = 0xabcdefb
    let boolConstant = true
    var boolVariable = false
    let doubleConstant = 1.23
    let doubleVariable = 2.34
    let floatConstant:Float = 1.23
    let floatVariable: Float = 2.34
}
```

And let's have a look at the output assembly:

```asm
push       rbp
mov        rbp, rsp
movss      xmm0, dword [ds:0x1000033b8]           ; 0x1000033b8
movss      xmm1, dword [ds:0x1000033bc]           ; 0x1000033bc
movsd      xmm2, qword [ds:0x1000033c0]           ; 0x1000033c0
movsd      xmm3, qword [ds:0x1000033c8]           ; 0x1000033c8
mov        qword [ss:rbp+var_10], 0xabcdefa
mov        qword [ss:rbp+var_18], 0xabcdefb
mov        byte [ss:rbp+var_20], 0x1
mov        byte [ss:rbp+var_8], 0x0
movsd      xmmword [ss:rbp+var_28], xmm3
movsd      xmmword [ss:rbp+var_30], xmm2
movss      xmmword [ss:rbp+var_38], xmm1
movss      xmmword [ss:rbp+var_40], xmm0
pop        rbp
ret
```

What is happening here is that the Swift compiler, for the x86_64 architecture:

1. Is placing the values of the doubles and the floats into the 128-bit SSE registers before doing anything else in the function body. The values for the floats and the doubles are stored in the data segment, so they are loaded from there into the `xmm0` through `xmm3` SSE registers.
2. Is loading the values of `0xabcdefa` and `0xabcdefb` into the stack segment, for the `Int` values, as we saw before.
3. Is loading the values of `true` and `false` as `0x01` and `0x00` into the stack, as bytes. That makes perfect sense.
4. Is then placing the two double values and the two float values from the SSE registers `xmm3` down to `xmm0` into the stack, using the `movsd` instruction for doubles and `movss` for floats.
`movsd` is for moving double precision floating point values and `movss` is for single precision, so Swift is in fact differentiating between double and float. By default, we are encouraged to use doubles in Swift, by the way, instead of floats. However, reading the actual offsets of `var_28`, `var_30`, `var_38` and `var_40` we can see the following:

```asm
movsd      xmmword [ss:rbp+0xffffffffffffffd8], xmm3
movsd      xmmword [ss:rbp+0xffffffffffffffd0], xmm2
movss      xmmword [ss:rbp+0xffffffffffffffc8], xmm1
movss      xmmword [ss:rbp+0xffffffffffffffc0], xmm0
```

This tells me that each one of the floats and doubles gets its own 8-byte slot in the stack, since the offsets are 8 bytes apart. Note that `movss` still only writes 4 bytes into its slot, and that in the data segment the two floats sit 4 bytes apart (`0x1000033b8` and `0x1000033bc`) while the two doubles sit 8 bytes apart (`0x1000033c0` and `0x1000033c8`). So that's good to know. In this unoptimized build at least, using `Float` instead of `Double` for a local value does __not__ save you any stack space, so you might as well use `Double`!

Structures
===
Let's say that we have a structure like so:

```swift
struct Person{
    var age: Int
}
```

And then we want to create an instance of it like so:

```swift
func example5(){
    let person = Person(age: 30)
}
```

The output assembly for the `example5()` function will be like so:

```asm
push       rbp                                    ; XREF=-[_TtC12swift_weekly11AppDelegate example5]+29
mov        rbp, rsp
sub        rsp, 0x20
mov        rax, 0x1e
mov        qword [ss:rbp+var_8], rdi
mov        qword [ss:rbp+var_10], rdi
mov        qword [ss:rbp+var_20], rdi
mov        rdi, rax                               ; argument #1 for method __TFV12swift_weekly6PersonCfMS0_FT3ageSi_S0_
call       __TFV12swift_weekly6PersonCfMS0_FT3ageSi_S0_
mov        qword [ss:rbp+var_18], rax
mov        rdi, qword [ss:rbp+var_20]             ; argument #1 for method imp___stubs__objc_release
call       imp___stubs__objc_release
add        rsp, 0x20
pop        rbp
```

So what happens here is that the stack is first set up, the value of 30 (`0x1e`, the person's age) is placed inside the `rax` register, and then `rax` is copied into the `rdi` register before the `__TFV12swift_weekly6PersonCfMS0_FT3ageSi_S0_` function is called.
What this really means is that we are following the System V calling convention when Swift compiles for the x86_64 architecture. You can read more about the System V calling convention online, but the gist is that the parameters to a function are placed inside the `rdi`, then `rsi`, then `rdx` and `rcx` registers. In this case, the age of the person to be created (30) is being placed inside the `rdi` register. I can see that Swift in this case first puts the value of 30 inside `rax` and then moves `rax` into `rdi`. Obviously this is redundant, but that is probably because the debug build is not optimized (optimization level = none, `-Onone`).

Then the important thing is the call to the `__TFV12swift_weekly6PersonCfMS0_FT3ageSi_S0_` function, which is the initializer that the compiler generated for our `Person` structure. This is where the actual creation of the `Person` instance is done. Let's have a look at it:

```asm
push       rbp                                    ; XREF=0x1000000d0, __TFC12swift_weekly11AppDelegate8example5fS0_FT_T_+33
mov        rbp, rsp
mov        qword [ss:rbp+var_8], rdi
mov        rax, rdi
pop        rbp
ret
```

Holy cow, that was nothing like what you expected, right? You can see that the value of the `rdi` register is placed inside the stack at the address of `ss:rbp+var_8`, and in my disassembler `var_8` is defined to have the displacement of -8, so read that code as `ss:rbp-8`. The same value is then copied into `rax`, which is the return value register. So all the initializer does is put the age into the stack and hand it back, and back in `example5()` the returned `rax` is stored at `var_18`.

Well, this tells us something: the `Person` instance actually lives in the stack of the `example5()` function. So this is very interesting. The caller holds the storage for the instance. This is very important to remember about Swift. No allocation call was made in this case to create an instance of the `Person` structure, nothing like an `alloc` or `init` method in Objective-C. Then once the value is placed into the stack, the `ret` instruction returns control to the caller, aka `example5()`.
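
To see the same thing from the Swift side, here is a minimal sketch. It assumes a current Swift toolchain (where `println` has since become `print`) and uses the standard library's `MemoryLayout`, neither of which is part of the Xcode 6.1 setup used in this article. The names `personSize`, `intSize`, `original` and `copy` are mine, just for illustration. It checks that a `Person` with a single `Int` field is exactly as big as an `Int`, and that assigning it copies the value rather than sharing a reference.

```swift
// Stand-in for the article's one-field Person structure.
struct Person {
    var age: Int
}

// A struct holding a single Int is laid out as just that Int:
// there is no object header and no pointer to separate storage.
let personSize = MemoryLayout<Person>.size
let intSize = MemoryLayout<Int>.size
print(personSize == intSize)    // prints "true"

// Assignment copies the value, so changing the copy
// leaves the original untouched.
let original = Person(age: 30)
var copy = original
copy.age = 31
print(original.age, copy.age)   // prints "30 31"
```

This does not prove where the compiler puts the value in every build, but it is consistent with what the disassembly shows: the structure is nothing more than its `Int` field.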
So let's extend this example and have a look at a case where we set a few properties on the `Person` structure. Let's change the `Person` structure a bit:

```swift
struct Person{
    var age: Int = 0
    var sex: Int = 0
    var numberOfChildren: Int = 0
    mutating func setAge(paramAge: Int){
        age = paramAge
    }
    mutating func setSex(paramSex: Int){
        sex = paramSex
    }
    mutating func setNumberOfChildren(paramNumberOfChildren: Int){
        numberOfChildren = paramNumberOfChildren
    }
}
```

And then create an instance:

```swift
func example6(){
    var person = Person()
    person.age = 0xabcdefa
    person.sex = 0xabcdefb
    person.numberOfChildren = 0xabcdefc
}
```

And the assembly for `example6()` is like so:

```asm
push       rbp                                    ; XREF=-[_TtC12swift_weekly11AppDelegate example6]+29
mov        rbp, rsp
sub        rsp, 0x30
mov        qword [ss:rbp+var_20], rdi
mov        qword [ss:rbp+var_28], rdi
mov        qword [ss:rbp+var_30], rdi
call       __TFV12swift_weekly6PersonCfMS0_FT_S0_
mov        qword [ss:rbp+var_18], rax
mov        qword [ss:rbp+var_10], rdx
mov        qword [ss:rbp+var_8], rcx
mov        qword [ss:rbp+var_18], 0xabcdefa
mov        qword [ss:rbp+var_10], 0xabcdefb
mov        qword [ss:rbp+var_8], 0xabcdefc
mov        rdi, qword [ss:rbp+var_30]             ; argument #1 for method imp___stubs__objc_release
call       imp___stubs__objc_release
add        rsp, 0x30
pop        rbp
ret
```

Well, what you can see here is that (based on a few speculations, submit a pull request if you can tell better please):

1. Again the stack is set up
2. The three quad-word `mov` instructions right after the `sub` instruction store `rdi` into the stack. The last of those slots, `var_30`, is what is later handed to `objc_release`, so these look like copies of the incoming `rdi` argument rather than part of the `Person` structure. The structure itself appears to live in `var_18`, `var_10` and `var_8`: after the call to the initializer, `rax`, `rdx` and `rcx` are stored there, and then the quad-word `mov` instructions overwrite the same three slots with `0xabcdefa`, `0xabcdefb` and `0xabcdefc`. So here again, the Swift runtime is __not__ allocating an instance of `Person` as such, it is just reserving space in the stack for the 3 variables that this structure contains.

Conclusion
===
1. Local variables of structure types are stored in the stack.
No allocation or initialization is done such as those in the `alloc` or `init` methods of the Objective-C `NSObject` class.
2. The calling convention that Swift follows for the x86_64 architecture is System V.
3. `Double` and `Float` values are stored into memory using the `movsd` and `movss` instructions respectively, creating a real difference between how they are stored. In this unoptimized x86_64 build, each local of either type got its own 8-byte slot in the stack, even though `movss` only writes 4 bytes.
4. The `Bool` type is truly a `byte`, not a 32-bit or 64-bit natural data type on a 64-bit operating system. I know that on 32-bit x86 at least, `byte` operations can be slower than `dword` operations, so keep a look out for that. If you are really concerned about optimization, it may be worth measuring `Int` against `Bool`!
5. Even with optimization turned off, the compiler is intelligent enough not to move values from stack to stack, but rather to write immediate values into the stack directly, even if the value is the result of the addition or subtraction of 2 constants. The addition or the subtraction is done at compile time!

Where to go from Here
===
Obviously when I started writing this article, I knew I was opening a can of worms and that's the way daddy likes it, so if you want to continue from here, just wait one week for the next issue of Swift Weekly where I will explore the Swift runtime even more.
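
If you want to poke at a few of these observations yourself in the meantime, here is a small sketch. It assumes a current Swift toolchain and the standard library's `MemoryLayout` (not the Xcode 6.1 setup used above), and the constant names are made up for illustration. It prints the size of each type discussed in this issue and redoes, at run time, the addition that the compiler folded into `0x1579bdf5` in `example3()`. Keep in mind that it reports the size of the types themselves, not the size of the stack slots an unoptimized build may give them.

```swift
// Sizes of the types themselves, in bytes.
let floatSize = MemoryLayout<Float>.size
let doubleSize = MemoryLayout<Double>.size
let boolSize = MemoryLayout<Bool>.size

print(floatSize)    // prints "4"
print(doubleSize)   // prints "8"
print(boolSize)     // prints "1"

// The same addition as in example3(), checked against the
// immediate value that appeared in the disassembly.
let a = 0xabcdefa
let b = 0xabcdefb
let sum = a + b
print(String(sum, radix: 16))   // prints "1579bdf5"
```

The sizes line up with the `dword`, `qword` and `byte` operands in the disassembly of `example4()`.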