######################################################
Code sharing between independently linked XIP binaries
######################################################

:Authors: Tamas Ban
:Organization: Arm Limited
:Contact: tamas.ban@arm.com
:Status: Draft

**********
Motivation
**********
Cortex-M devices are usually constrained in terms of flash and RAM. Therefore,
it is often challenging to fit bigger projects in the available memory. The PSA
specifications require a device to both have a secure boot process in place at
device boot-up time, and to have a partition in the SPE which provides
cryptographic services at runtime. These two entities have some overlapping
functionality. Some cryptographic primitives (e.g. hash calculation and digital
signature verification) are required both in the bootloader and the runtime
environment. In the current TF-M code base, both firmware components use the
mbed-crypto library to implement these requirements. During the build process,
the mbed-crypto library is built twice, with different configurations (the
bootloader requires less functionality) and then linked to the corresponding
firmware component. As a result of this workflow, the same code is placed in the
flash twice. For example, the code for the SHA-256 algorithm is included in
MCUboot, but the exact same code is duplicated in the SPE cryptography
partition. In most cases, there is no memory isolation between the bootloader
and the SPE, because both are part of the PRoT code and run in the secure
domain. So, in theory, the code of the common cryptographic algorithms could be
reused among these firmware components. This could result in a significant
reduction in code footprint, because the cryptographic algorithms are usually
flash hungry. Code size reduction can be a good opportunity for very constrained
devices, which might need to use TF-M Profile Small anyway.

*******************
Technical challenge
*******************
Code sharing in a regular OS environment is easily achievable with dynamically
linked libraries. However, this is not the case in Cortex-M systems where
applications might run bare-metal, or on top of an RTOS, which usually lacks
dynamic loading functionality. One major challenge to be solved in the Cortex-M
space is how to share code between independently linked XIP applications that
are tied to a certain memory address range to be executable and have absolute
function and global data memory addresses. In this case, the code is not
relocatable, and in most cases, there is no loader functionality in the system
that can perform code relocation. Also, the lack of an MMU makes the address
space flat, constant and not reconfigurable at runtime by privileged code.

One other difficulty is that the bootloader and the runtime use the same RAM
area during execution. The runtime firmware is executed strictly after the
bootloader, so normally it can reuse the whole secure RAM area, as if it were
the exclusive user. No attention needs to be paid as to where global data is
placed by the linker. The bootloader does not need to retain its state. The low
level startup of the runtime firmware can freely overwrite the RAM with its data
without corrupting bootloader functionality. However, with code sharing between
bootloader and runtime firmware, these statements are no longer true. Global
variables used by the shared code must either retain their value or must be
reinitialised during low level startup of the runtime firmware. The startup code
is not allowed to overwrite the shared global variables with arbitrary data. The
following design proposal provides a solution to these challenges.

**************
Design concept
**************
The bootloader is sometimes implemented as ROM code (BL1) or stored in a region
of the flash which is lockable, to prevent tampering. In a secure system, the
bootloader is immutable code and thus implements a part of the Root of Trust
anchor in the device, which is trusted implicitly. The shared code is primarily
part of the bootloader, and is reused by the runtime SPE firmware at a later
stage. Not all of the bootloader code is reused by the runtime SPE, only some
cryptographic functions.

Simplified steps of building with code sharing enabled:

- Complete the bootloader build process to have a final image that contains
  the absolute addresses of the shared functions, and the global variables
  used by these functions.
- Extract the addresses of the functions and related global variables that are
  intended to be shared from the bootloader executable.
- When building runtime firmware, provide the absolute addresses of the shared
  symbols to the linker, so that it can pick them up, instead of instantiating
  them again.

The execution flow looks like this:

.. code-block:: bash

    SPE      MCUboot func1()    MCUboot func2()      MCUboot func3()
     |
     |   Hash()
     |------------->|
                    |----------------->|
                                       |
                                Return |
             Return |<-----------------|
     |<-------------|
     |
     |
     |----------------------------------------------------->|
                                                            |
            Function pointer in shared global data()        |
     |<-----------------------------------------------------|
     |
     | Return
     |----------------------------------------------------->|
                                                            |
                                                     Return |
     |<-----------------------------------------------------|
     |
     |

The execution flow usually returns from a shared function back to the SPE with
an ordinary function return. So usually, once a shared function is called in the
call path, all further functions in the call chain will be shared as well.
However, this is not always the case, as it is possible for a shared function to
call a non-shared function in SPE code through a global function pointer.

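The callback case can be illustrated with a minimal C sketch. All names here
(``shared_calloc_hook``, ``shared_alloc_buffer``, ``spe_calloc`` and the helper
functions) are hypothetical and chosen only for illustration. On a real target
the hook would live in the shared symbol section and ``shared_alloc_buffer()``
would reside in the bootloader image.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared global function pointer. In a real build it would be
 * placed in the shared symbol section so both binaries see the same address. */
void *(*shared_calloc_hook)(size_t, size_t) = NULL;

/* Stand-in for a shared function that is located in the bootloader. It
 * reaches non-shared code only through the global function pointer. */
void *shared_alloc_buffer(size_t len)
{
    if (shared_calloc_hook == NULL) {
        return NULL; /* No allocator registered by the caller's binary. */
    }
    return shared_calloc_hook(1, len);
}

static unsigned int spe_calloc_calls;

/* Non-shared function that exists only in the SPE binary. */
static void *spe_calloc(size_t n, size_t size)
{
    spe_calloc_calls++;
    return calloc(n, size);
}

/* The SPE points the shared hook at its own allocator during init. */
void spe_register_allocator(void)
{
    shared_calloc_hook = spe_calloc;
}

unsigned int spe_calloc_call_count(void)
{
    return spe_calloc_calls;
}
```

After ``spe_register_allocator()`` has run, a call to ``shared_alloc_buffer()``
leaves the shared code and enters ``spe_calloc()``, which corresponds to the
"Function pointer in shared global data()" arrow in the diagram.
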
For shared global variables, a dedicated data section must be allocated in the
linker configuration file. This area must have the same memory address in both
MCUboot's and the SPE's linker files, to ensure the integrity of the variables.
For simplicity's sake, this section is placed at the very beginning of the RAM
area. Also, the RAM wiping functionality at the end of the secure boot flow
(that is intended to remove any possible secrets from the RAM) must not clear
this area. Furthermore, it must be ensured that the linker places shared globals
into this data section. There are two ways to achieve this:

- Put a filter pattern in the section body that matches the shared global
  variables.
- Mark the global variables in the source code with the special attribute
  `__attribute__((section(<NAME_OF_SHARED_SYMBOL_SECTION>)))`

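The second option can be sketched as follows. The section name
``.tfm_shared_symbols`` and the variable and function names are assumptions
made for this example; the real name must match whatever section is defined in
both linker configuration files. The attribute syntax is the GCC/Clang form.

```c
#include <stdint.h>

/* Assumed section name, for illustration only. It must be identical to the
 * dedicated data section declared in MCUboot's and the SPE's linker files. */
#define SHARED_SYMBOL_SECTION ".tfm_shared_symbols"

/* Hypothetical global used by shared code. The attribute asks the compiler
 * to emit it into the dedicated section instead of the default .data/.bss. */
__attribute__((section(SHARED_SYMBOL_SECTION)))
uint32_t shared_call_count = 0;

/* Hypothetical shared function that relies on the global above. */
uint32_t shared_function(void)
{
    shared_call_count++;
    return shared_call_count;
}
```

The attribute only selects the output section. Placing that section at the
same address in both images is still the job of the linker files.
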
RAM memory layout in MCUboot with code sharing enabled:

.. code-block:: bash

    +------------------+
    | Shared symbols   |
    +------------------+
    | Shared boot data |
    +------------------+
    | Data             |
    +------------------+
    | Stack (MSP)      |
    +------------------+
    | Heap             |
    +------------------+

RAM memory layout in SPE with code sharing enabled:

.. code-block:: bash

    +-------------------+
    | Shared symbols    |
    +-------------------+
    | Shared boot data  |
    +-------------------+
    | Stack (MSP)       |
    +-------------------+
    | Stack (PSP)       |
    +-------------------+
    | Partition X Data  |
    +-------------------+
    | Partition X Stack |
    +-------------------+
              .
              .
              .
    +-------------------+
    | Partition Z Data  |
    +-------------------+
    | Partition Z Stack |
    +-------------------+
    | PRoT Data         |
    +-------------------+
    | Heap              |
    +-------------------+

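The requirement that RAM wiping must not clear the shared symbol area can be
sketched in C as shown here. This is only an illustration under the assumption
that the shared area sits at the very beginning of the RAM region, as in the
layouts above. The function name and parameters are hypothetical, and real
boot code may perform the wipe differently (for example in assembly).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clear the RAM region [ram, ram + ram_len) but leave the first keep_len
 * bytes untouched, where the shared symbols are assumed to be placed. */
void wipe_ram_keep_shared(uint8_t *ram, size_t ram_len, size_t keep_len)
{
    if (keep_len >= ram_len) {
        return; /* Nothing outside the preserved area to clear. */
    }
    memset(ram + keep_len, 0, ram_len - keep_len);
}
```
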
Patching mbedTLS
================
In order to share some global function pointers from mbed-crypto that are
related to dynamic memory allocation, their scope must be extended from private
to global. This is needed because some compiler toolchains only extract the
addresses of public functions and global variables, and extraction of addresses
is a requirement to share them among binaries. Therefore, a short patch was
created for the mbed-crypto library, which "globalises" these function pointers:

`lib/ext/mbedcrypto/0005-Enable-crypto-code-sharing-between-independent-binar.patch`

The patch must be applied manually in the mbedtls repo if code sharing is
enabled. The patch has no effect on the functional behaviour of the
cryptographic library; it only extends the scope of some variables.

*************
Tools support
*************
All the currently supported compilers provide a way to achieve the above
objectives. However, there is no standard way, which means that the code sharing
functionality must be implemented on a per compiler basis. The following steps
are needed:

- Extraction of the addresses of all global symbols.
- The filtering out of the addresses of symbols that aren't shared. The goal is
  to not need to list all the shared symbols by name. Only a simple pattern
  has to be provided, which matches the beginning of the symbol's name.
  Matching symbols will be shared. Examples are in
  `bl2/src/shared_symbol_template.txt`.
- Provision of the addresses of shared symbols to the linker during the SPE
  build process.
- The resolution of symbol collisions during SPE linking. Because mbed-crypto
  is linked to both firmware components as a static library, the external
  shared symbols will conflict with the same symbols found within it. In order
  to prioritize the external symbol, the symbol with the same name in
  mbed-crypto must be marked as weak in the symbol table.

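The filtering rule in the list above (a symbol is shared when its name begins
with one of the given patterns) can be expressed as a short C sketch. This is
not how the build system implements it; it only illustrates the matching
logic. The function name and the example prefixes are assumptions and are not
taken from `bl2/src/shared_symbol_template.txt`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'name' starts with any of the 'n' prefix patterns. */
bool is_shared_symbol(const char *name,
                      const char *const *prefixes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(prefixes[i]);

        /* Empty patterns are ignored so they cannot match every symbol. */
        if (len > 0 && strncmp(name, prefixes[i], len) == 0) {
            return true;
        }
    }
    return false;
}
```

With the illustrative prefixes ``mbedtls_sha256`` and ``mbedtls_mpi``, a symbol
such as ``mbedtls_sha256_update`` would be kept, while a bootloader-only symbol
with a different prefix would be filtered out.
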
The above functionality is implemented in the toolchain specific CMake files:

- `toolchain_ARMCLANG.cmake`
- `toolchain_GNUARM.cmake`

It is provided by the following two functions:

- `compiler_create_shared_code()`: Extract and filter shared symbol addresses
  from MCUboot.
- `compiler_link_shared_code()`: Link shared code to the SPE and resolve symbol
  conflict issues.

ARMCLANG
========
The toolchain specific steps are:

- Extract all symbols from MCUboot: add `--symdefs` to the linker command line
- Filter shared symbols: call CMake script `FilterSharedSymbols.cmake`
- Weaken duplicated (shared) symbols in the mbed-crypto static library that are
  linked to the SPE: `arm-none-eabi-objcopy`
- Link shared code to SPE: Add the filtered output of `--symdefs` to the SPE
  source file list.

GNUARM
======
The toolchain specific steps are:

- Extract all symbols from MCUboot: `arm-none-eabi-nm`
- Filter shared symbols: call CMake script `FilterSharedSymbols.cmake`
- Strip unshared code from MCUboot: `arm-none-eabi-strip`
- Weaken duplicated (shared) symbols in the mbed-crypto static library that are
  linked to the SPE: `arm-none-eabi-objcopy`
- Link shared code to SPE: Add `-Wl,-R,<SHARED_STRIPPED_CODE.axf>` to the
  compiler command line

IAR
===
Code sharing is currently not implemented for IAR, although the toolchain
supports it.

**************************
Memory footprint reduction
**************************
- Build type: MinSizeRel
- Platform: mps2/an521
- Version: TF-Mv1.2.0 + code sharing patches
- MCUboot image encryption support is disabled.

+------------------+-------------------+-------------------+-------------------+
|                  | ConfigDefault     | ConfigProfile-M   | ConfigProfile-S   |
+------------------+----------+--------+----------+--------+----------+--------+
|                  | ARMCLANG | GNUARM | ARMCLANG | GNUARM | ARMCLANG | GNUARM |
+------------------+----------+--------+----------+--------+----------+--------+
| CODE_SHARING=OFF | 122268   | 124572 | 75936    | 75996  | 50336    | 50224  |
+------------------+----------+--------+----------+--------+----------+--------+
| CODE_SHARING=ON  | 113264   | 115500 | 70400    | 70336  | 48840    | 48988  |
+------------------+----------+--------+----------+--------+----------+--------+
| Difference       | 9004     | 9072   | 5536     | 5660   | 1496     | 1236   |
+------------------+----------+--------+----------+--------+----------+--------+

If MCUboot image encryption support is enabled, then the saving could be up to
approximately 13-15 KB.

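The "Difference" row is simply the OFF value minus the ON value. The sketch
below reproduces that arithmetic and also expresses the saving relative to the
original size, assuming the table values are code sizes in bytes. The function
names are hypothetical.

```c
#include <stdint.h>

/* Absolute saving: footprint without code sharing minus footprint with it. */
uint32_t saved_bytes(uint32_t size_off, uint32_t size_on)
{
    return size_off - size_on;
}

/* Relative saving in hundredths of a percent (rounded down), computed with
 * integer arithmetic so no floating point support is needed. */
uint32_t saved_hundredths_of_percent(uint32_t size_off, uint32_t size_on)
{
    uint64_t diff = (uint64_t)(size_off - size_on);

    return (uint32_t)((diff * 10000u) / size_off);
}
```

For ConfigDefault with ARMCLANG this gives 122268 - 113264 = 9004, which is
roughly 7.36% of the original size. For ConfigProfile-S with GNUARM the saving
of 1236 is roughly 2.46%.
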
.. Note::

    Code sharing on Musca-B1 was tested only with software-only crypto, so
    crypto hardware acceleration must be turned off:
    -DCRYPTO_HW_ACCELERATOR=OFF


************************
Usability considerations
************************
Functions that only use local variables can be shared easily. However, functions
that rely on global variables are a bit tricky. They can still be shared, but
all global variables must be placed in the shared symbol section, to prevent
overwriting and to enable the retention of their values.

Some global variables might need to be reinitialised to their original values by
runtime firmware, if they have been used by the bootloader, but need to have
their original value when runtime firmware starts to use them. If so, the
reinitialising functionality must be implemented explicitly, because the low
level startup code in the SPE does not initialise the shared variables, which
means they retain their value after MCUboot stops running.

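A minimal sketch of such explicit reinitialisation is given below. The
variable ``shared_op_counter`` and both functions are hypothetical. In a real
build the variable would be placed in the shared symbol section, which the SPE
startup code leaves untouched.

```c
#include <stdint.h>

/* Value the variable is expected to have when the runtime starts using it. */
#define SHARED_OP_COUNTER_INIT 0u

/* Hypothetical global used by shared code. On a real target it would live
 * in the shared symbol section and keep whatever value the bootloader left. */
uint32_t shared_op_counter = SHARED_OP_COUNTER_INIT;

/* Hypothetical shared function that modifies the global in both the
 * bootloader and the runtime firmware. */
uint32_t shared_next_op(void)
{
    shared_op_counter++;
    return shared_op_counter;
}

/* Called explicitly by the SPE early during its initialisation, because the
 * low level startup code does not reset variables in the shared section. */
void spe_reinit_shared_globals(void)
{
    shared_op_counter = SHARED_OP_COUNTER_INIT;
}
```
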
If a bug is discovered in the shared code, it cannot be fixed with a firmware
upgrade, if the bootloader code is immutable. If this is the case, disabling
code sharing might be a solution, as the new runtime firmware could contain the
fixed code instead of relying on the unfixed shared code. However, this would
increase code footprint.

API backward compatibility can also be an issue. If the API has changed in a
newer version of the shared code, then new code cannot rely on the shared
version. The changed code, and all other shared code that references it, must
be excluded from sharing, and the updated versions of the functions must be
compiled into the SPE binary. The mbedTLS library has been API compatible with
its current version (``v2.24.0``) since the ``mbedtls-2.7.0`` release
(2018-02-03).

To minimise the risk of incompatibility, use the same compiler flags to build
both firmware components.

The artifacts of the shared code extraction steps must be preserved so as to
remain available if new SPE firmware (that relies on shared code) is built and
released. Those files are necessary to know the address of shared symbols when
linking the SPE.

************************
How to use code sharing?
************************
Considering the above, code sharing is an optional feature, which is disabled
by default. It can be enabled from the command line with a compile time switch:

- `TFM_CODE_SHARING`: Set to `ON` to enable code sharing.

With the default settings, only the common part of the mbed-crypto library is
shared between MCUboot and the SPE. However, there might be other device
specific code (e.g. device drivers) that could be shared. The shared
cryptography code consists mainly of the SHA-256 algorithm, the `bignum` library
and some RSA related functions. If image encryption support is enabled in
MCUboot, then AES algorithms can be shared as well.

Sharing code between the SPE and an external project is possible, even if
MCUboot isn't used as the bootloader. For example, a custom bootloader can also
be built in such a way as to create the necessary artifacts to share some of its
code with the SPE. The same artifacts must be created as in the case of MCUboot:

- `shared_symbols_name.txt`: Contains the name of the shared symbols. Used by
  the script that prevents symbol collision.
- `shared_symbols_address.txt`: Contains the type, address and name of shared
  symbols. Used by the linker when linking runtime SPE.
- `shared_code.axf`: GNUARM specific. The stripped version of the firmware
  component, which only contains the shared code. It is used by the linker when
  linking the SPE.

.. Note::

    The artifacts of the shared code extraction steps must be preserved to be
    able to link them to any future SPE version.

When an external project is sharing code with the SPE, the `SHARED_CODE_PATH`
compile time switch must be set to the path of the artifacts mentioned above.

********************
Further improvements
********************
This design focuses only on sharing the cryptography code. However, other code
could be shared as well. Some possibilities:

- Flash driver
- Serial driver
- Image metadata parsing code
- etc.

--------------

*Copyright (c) 2020, Arm Limited. All rights reserved.*