################################
Profiler tool and TF-M Profiling
################################

The profiler is a tool for profiling and benchmarking programs. Developers can
use it to collect the runtime data they are interested in.

Initially, the profiler supports only count logging. You can add "checkpoints"
to the program. The timer count or CPU cycle count at each checkpoint is
saved at runtime and can be analysed later.

*********************************
TF-M Profiling Build Instructions
*********************************

TF-M has integrated some built-in profiling cases. There are two configurations
for profiling:

* ``CONFIG_TFM_ENABLE_PROFILING``: Enable the profiling build in TF-M SPE and NSPE.
  It cannot be enabled together with any regression test configs, for example ``TEST_NS``.
* ``TFM_TOOLS_PATH``: Path of the tf-m-tools repo. The default value is ``DOWNLOAD``,
  which fetches the remote source.

The section `TF-M Profiling Cases`_ introduces the profiling cases in TF-M.
To enable the built-in profiling cases in TF-M, run:

.. code-block:: console

    cd <path to tf-m-tools>/profiling/profiling_cases/tfm_profiling
    mkdir build

    # Build SPE: configure with explicit options.
    # The partition path list is quoted so the shell does not treat ';' as a command separator.
    cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \
          -DCONFIG_TFM_ENABLE_PROFILING=ON -DCMAKE_BUILD_TYPE=Release \
          -DTFM_EXTRA_PARTITION_PATHS="${PWD}/../prof_psa_client_api/partitions/prof_server_partition;${PWD}/../prof_psa_client_api/partitions/prof_client_partition" \
          -DTFM_EXTRA_MANIFEST_LIST_FILES=${PWD}/../prof_psa_client_api/partitions/prof_psa_client_api_manifest_list.yaml \
          -DTFM_PARTITION_LOG_LEVEL=TFM_PARTITION_LOG_LEVEL_INFO

    # Another simple way to configure SPE:
    cmake -S <path to tf-m> -B build/spe -DTFM_PLATFORM=arm/mps2/an521 \
          -DTFM_EXTRA_CONFIG_PATH=${PWD}/../prof_psa_client_api/partitions/config_spe.cmake

    # Build and install SPE (after either configure command above)
    cmake --build build/spe -- install -j

    # Build NSPE
    cmake -S . -B build/nspe -DCONFIG_SPE_PATH=build/spe/api_ns \
          -DTFM_TOOLCHAIN_FILE=build/spe/api_ns/cmake/toolchain_ns_GNUARM.cmake
    cmake --build build/nspe -- -j

******************************
Profiler Integration Reference
******************************

`profiler/profiler.c` is the main source file to be compiled with the target program.

Initialization
==============

``PROFILING_INIT()``, defined in `profiling/export/prof_intf_s.h`, shall be called
on the secure side before calling any other API of the profiler. It initializes the
HAL and the backend database, which can be customized by users.

Implement the HAL
-----------------

`export/prof_hal.h` defines the HAL that should be implemented by the platform.

* ``prof_hal_init()``: Initialize the counter hardware.

* ``prof_hal_get_count()``: Get the current counter value.

Users shall implement platform-specific hardware support in ``prof_hal_init()``
and ``prof_hal_get_count()`` under `export/platform`.

For example, `export/platform/tfm_hal_dwt_prof.c` uses the Data Watchpoint
and Trace (DWT) unit to count CPU cycles, which can serve as a reference for
performance.

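As a rough illustration of the two HAL hooks, the sketch below is a host-side stand-in driven by a fake software counter. The ``demo_`` names, the signatures and the fixed step of 100 are assumptions made for this example only. The real prototypes are those in `export/prof_hal.h`, and a real port would read a hardware timer or the DWT cycle counter instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software counter standing in for a hardware timer or the DWT. */
static uint32_t demo_counter;
static int demo_hal_ready;

/* Stand-in for prof_hal_init(): reset the fake counter and mark it usable. */
void demo_prof_hal_init(void)
{
    demo_counter = 0u;
    demo_hal_ready = 1;
}

/* Stand-in for prof_hal_get_count(): return the current count.
 * Every read advances the fake counter by a fixed step, so the sketch
 * behaves deterministically on a host machine. */
uint32_t demo_prof_hal_get_count(void)
{
    assert(demo_hal_ready); /* init must have been called first */
    demo_counter += 100u;
    return demo_counter;
}
```

Calling ``demo_prof_hal_init()`` once and then sampling ``demo_prof_hal_get_count()`` mirrors the order required by the profiler: initialize first, then read counts.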
Setup Database
--------------

The size of the database is determined by ``PROF_DB_MAX``, defined in
`export/prof_common.h`.

The developer can override the size by redefining ``PROF_DB_MAX``.

Add Checkpoints
===============

The developer should identify the places in the source code where checkpoints
are needed. At each checkpoint, the timer count or CPU cycle count is saved into
the database. The interface APIs for the secure side are defined in
`export/prof_intf_s.h`.

Checkpoints can also be added on the non-secure side.
Add `export/ns/prof_intf_ns.c` to the source file list of the non-secure side.
The interface APIs for the non-secure side are defined in `export/ns/prof_intf_ns.h`.

The counter logging APIs are defined as macros to keep the interface
consistent between the secure and non-secure sides.

Users can call the macro ``PROF_TIMING_LOG()`` to log the counter value.

.. code-block:: c

    PROF_TIMING_LOG(topic_id, cp_id);

+------------+--------------------------------------------------------------+
| Parameters | Description                                                  |
+============+==============================================================+
| topic_id   | Topic is used to gather a group of checkpoints.              |
|            | It's useful when you have many checkpoints for different     |
|            | purposes. Topic can help to organize them and filter the     |
|            | related information out. It's an 8-bit unsigned value.       |
+------------+--------------------------------------------------------------+
| cp_id      | Checkpoint ID. Different topics can have the same cp_id.     |
|            | It's a 16-bit unsigned value.                                |
+------------+--------------------------------------------------------------+

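To make the idea of a checkpoint concrete, here is a minimal host-side sketch of what logging a checkpoint amounts to: sample the counter and store it with its topic and checkpoint ID. It is not the profiler's implementation. ``DEMO_DB_MAX``, ``struct demo_record``, the fake counter and the overflow return code are all hypothetical names and choices for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; plays the role that PROF_DB_MAX has in the profiler. */
#define DEMO_DB_MAX 4u

/* One logged checkpoint: topic, checkpoint ID and the sampled counter value. */
struct demo_record {
    uint8_t  topic_id; /* 8-bit topic */
    uint16_t cp_id;    /* 16-bit checkpoint ID */
    uint32_t count;    /* counter value sampled at the checkpoint */
};

static struct demo_record demo_db[DEMO_DB_MAX];
static size_t demo_db_used;
static uint32_t demo_ticks;

/* Fake counter: advances by 10 on every read so the sketch is deterministic. */
static uint32_t demo_get_count(void)
{
    demo_ticks += 10u;
    return demo_ticks;
}

/* Record a checkpoint. Returns 0 on success and -1 when the database is full.
 * (How the real profiler reacts to a full database is not assumed here.) */
int demo_timing_log(uint8_t topic_id, uint16_t cp_id)
{
    if (demo_db_used >= DEMO_DB_MAX) {
        return -1;
    }
    demo_db[demo_db_used].topic_id = topic_id;
    demo_db[demo_db_used].cp_id = cp_id;
    demo_db[demo_db_used].count = demo_get_count();
    demo_db_used++;
    return 0;
}

/* Counter delta between two logged records, addressed by index. */
uint32_t demo_delta(size_t first, size_t second)
{
    assert(first < demo_db_used && second < demo_db_used);
    return demo_db[second].count - demo_db[first].count;
}
```

Placing one ``demo_timing_log()`` call before and one after the code of interest, then taking ``demo_delta()`` of the two records, gives the raw cost of that code in counter units.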
Collect Data
============

After the program has run successfully, the data should be saved in the database.
The developer can dump the data through the interfaces defined in the header
files mentioned above.

For the same consistency reason as counter logging, the same macros are defined as
the interfaces for both the secure and non-secure sides.

The data fetching interfaces work as a stream. ``PROF_FETCH_DATA_START`` and
``PROF_FETCH_DATA_BY_TOPIC_START`` search for data that matches the given pattern
from the beginning of the database. ``PROF_FETCH_DATA_CONTINUE`` and
``PROF_FETCH_DATA_BY_TOPIC_CONTINUE`` search from the data set that follows the
previous result.

.. Note::

    All the APIs advance the internal search index, so be careful when mixing
    them for different checkpoints and topics at the same time.

The match condition of a search is controlled by the tag mask: a record matches
when ``(tag_value & tag_mask) == tag_pattern``. To enumerate the whole database,
set ``tag_mask`` and ``tag_pattern`` both to ``0``.

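The match condition can be tried out in isolation with a small sketch. The tag layout used here (topic in bits 23..16, checkpoint ID in bits 15..0) and all ``demo_``/``DEMO_`` names are assumptions for illustration. The profiler's actual tag encoding is defined in its headers and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag layout for this sketch only:
 * topic in bits 23..16, checkpoint ID in bits 15..0. */
#define DEMO_TOPIC_MASK 0x00FF0000u

uint32_t demo_make_tag(uint8_t topic_id, uint16_t cp_id)
{
    return ((uint32_t)topic_id << 16) | cp_id;
}

bool demo_tag_matches(uint32_t tag, uint32_t tag_mask, uint32_t tag_pattern)
{
    /* The parentheses are required: in C, == binds tighter than &. */
    return (tag & tag_mask) == tag_pattern;
}

/* Count how many tags in an array satisfy the match condition. */
size_t demo_count_matches(const uint32_t *tags, size_t n,
                          uint32_t tag_mask, uint32_t tag_pattern)
{
    size_t hits = 0;

    assert(tags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (demo_tag_matches(tags[i], tag_mask, tag_pattern)) {
            hits++;
        }
    }
    return hits;
}
```

With a mask and pattern of ``0``, every tag matches, which is why that combination enumerates the whole database. Masking only the topic bits selects every checkpoint of one topic.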
* ``PROF_FETCH_DATA_XXX``: The generic interface for getting data.
* ``PROF_FETCH_DATA_BY_TOPIC_XXX``: Get data for a specific ``topic``.

The APIs return ``false`` if no matching data is found before the end of the database.

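The start/continue behaviour and the shared search index can be modelled with a toy database. This is a sketch under assumptions, not the profiler's code: the ``demo_`` functions, their signatures and the tag values are hypothetical and only mimic the streaming behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy database of tags. Hypothetical layout: topic in bits 23..16,
 * checkpoint ID in bits 15..0. */
static const uint32_t demo_tags[] = {
    0x00010000u, /* topic 1, cp 0 */
    0x00020000u, /* topic 2, cp 0 */
    0x00010001u, /* topic 1, cp 1 */
    0x00020001u, /* topic 2, cp 1 */
};
#define DEMO_TAG_COUNT (sizeof(demo_tags) / sizeof(demo_tags[0]))

/* A single search index shared by every fetch call. */
static size_t demo_index;

static bool demo_search(uint32_t mask, uint32_t pattern, uint32_t *out)
{
    assert(out != NULL);
    while (demo_index < DEMO_TAG_COUNT) {
        uint32_t tag = demo_tags[demo_index++];
        if ((tag & mask) == pattern) {
            *out = tag;
            return true;
        }
    }
    return false; /* reached the end of the database without a match */
}

/* Search from the beginning of the database. */
bool demo_fetch_start(uint32_t mask, uint32_t pattern, uint32_t *out)
{
    demo_index = 0;
    return demo_search(mask, pattern, out);
}

/* Search from the entry after the previous result. */
bool demo_fetch_continue(uint32_t mask, uint32_t pattern, uint32_t *out)
{
    return demo_search(mask, pattern, out);
}
```

Because the index is shared, continuing with a different pattern resumes from wherever the previous search stopped. In this toy database, fetching both topic 1 entries and then continuing with the topic 2 pattern skips the first topic 2 entry, which is the hazard the note above warns about.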
Calibration
===========

The profiler itself has a tick or cycle cost. To get more accurate data, an
optional calibration system is provided.

The counter logging APIs can be called from the secure or non-secure side, and the
cost of calling functions differs between these two worlds. Therefore, the secure
and non-secure sides have different calibration data.

The system performance might fluctuate during initialization, for example when
the CPU frequency changes or the cache is enabled. It is therefore recommended
to calibrate just before the first checkpoint.

* ``PROF_DO_CALIBRATE``: Call this macro to get the calibration value. The more
  ``rounds``, the more accurate the value.
* ``PROF_GET_CALI_VALUE_FROM_TAG``: Get the calibration value from the tag.
  The calibrated counter is ``current_counter - previous_counter - current_cali_value``.
  Here ``current_cali_value`` equals ``PROF_GET_CALI_VALUE_FROM_TAG(current_tag)``.

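The calibration arithmetic can be sketched on a host as follows. This assumes 32-bit counters and a fake counter in which every read costs 7 ticks. The ``demo_`` functions are hypothetical and only illustrate the formula above, not how ``PROF_DO_CALIBRATE`` is implemented.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t demo_ticks;

/* Fake counter: every read costs 7 ticks. */
static uint32_t demo_get_count(void)
{
    demo_ticks += 7u;
    return demo_ticks;
}

/* Average cost of one back-to-back pair of counter reads over `rounds` rounds. */
uint32_t demo_do_calibrate(uint32_t rounds)
{
    uint64_t total = 0;

    assert(rounds > 0u);
    for (uint32_t i = 0; i < rounds; i++) {
        uint32_t first = demo_get_count();
        uint32_t second = demo_get_count();
        total += (uint32_t)(second - first);
    }
    return (uint32_t)(total / rounds);
}

/* current_counter - previous_counter - current_cali_value, in unsigned
 * 32-bit arithmetic so that a single counter wrap-around still gives
 * the elapsed count. */
uint32_t demo_calibrated_diff(uint32_t current, uint32_t previous, uint32_t cali)
{
    return current - previous - cali;
}
```

On real hardware the per-round cost varies, which is why averaging over more rounds gives a steadier calibration value. The sketch does not guard against a calibration value larger than the raw difference.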
Data Analysis
=============

The data analysis interfaces can be used for basic analysis. The data they
return is already calibrated.

``PROF_DATA_DIFF``: Get the counter value difference between two tags. A return
value of ``0`` indicates an error.

If the checkpoints are logged multiple times, you can get the following counter
value differences between two tags:

* ``PROF_DATA_DIFF_MIN``: Get the minimum counter value difference between the two tags.
  A return value of ``UINT32_MAX`` indicates an error.
* ``PROF_DATA_DIFF_MAX``: Get the maximum counter value difference between the two tags.
  A return value of ``0`` indicates an error.
* ``PROF_DATA_DIFF_AVG``: Get the average counter value difference between the two tags.
  A return value of ``0`` indicates an error.

Custom software or tools can be used to generate an analysis report based
on the data.

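For readers who post-process dumped counter values themselves, the minimum, maximum and average differences can be computed along these lines. This is a sketch, not the profiler's implementation. ``struct demo_stats`` and ``demo_diff_stats()`` are hypothetical, and the empty-input results simply mirror the error values listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_stats {
    uint32_t min;
    uint32_t max;
    uint32_t avg;
};

/* Compute min/max/avg of (end[i] - begin[i] - cali) over n checkpoint pairs.
 * With n == 0 the result keeps the sentinel values: UINT32_MAX for min,
 * 0 for max and avg. */
struct demo_stats demo_diff_stats(const uint32_t *begin, const uint32_t *end,
                                  size_t n, uint32_t cali)
{
    struct demo_stats s = { UINT32_MAX, 0u, 0u };
    uint64_t total = 0;

    assert((begin != NULL && end != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t diff = end[i] - begin[i] - cali;
        if (diff < s.min) {
            s.min = diff;
        }
        if (diff > s.max) {
            s.max = diff;
        }
        total += diff;
    }
    if (n > 0) {
        s.avg = (uint32_t)(total / n);
    }
    return s;
}
```

The running total is kept in 64 bits so that summing many 32-bit differences cannot overflow before the division.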
Profiler Self-test
==================

`profiler_self_test` is a quick test for all the interfaces above. To build and run
it on Linux:

.. code-block:: console

    cd profiler_self_test
    mkdir build && cd build
    cmake .. && make
    ./prof_self_test

********************
TF-M Profiling Cases
********************

The profiler tool has already been integrated into TF-M to analyze program
performance with the built-in profiling cases. Users can also add a new
profiling case to get a specific profiling report. TF-M profiling provides
example profiling cases in `profiling_cases`.

PSA Client API Profiling
========================

This profiling case analyzes the performance of PSA Client APIs called from SPE
and NSPE, including ``psa_connect()``, ``psa_call()``, ``psa_close()`` and stateless ``psa_call()``.
The main structure is:

::

    prof_psa_client_api/
    ├── cases
    │   ├── non_secure
    │   └── secure
    └── partitions
        ├── prof_server_partition
        └── prof_client_partition

* The `cases` folder contains the basic SPE and NSPE profiling log and analysis code.
* NSPE can use the `prof_log` library to print the analysis result.
* `prof_server_partition` is a dummy secure partition. It returns immediately
  once it receives a PSA client call from a client.
* `prof_client_partition` is the SPE profiling entry that triggers the secure profiling.

To make this profiling report more accurate, it is recommended to disable other
partitions and all irrelevant tests.

Adding New TF-M Profiling Case
==============================

Users can add a source folder `<prof_example>` under the path `profiling_cases` to
customize performance analysis of target processes, such as the APIs of secure
partitions, the functions in the SPM, or the user's interfaces. The
integration requires these steps:

1. Confirm the target process block to create profiling cases.
2. Enable or create the server partition if necessary. Note that other
   irrelevant partitions shall be disabled.
3. Find ways to output profiling data.
4. Trigger profiling cases in SPE or NSPE.

   a. For SPE, a secure client partition can be created to trigger the secure profiling.
   b. For NSPE, the profiling case entry can be added to the 'tfm_ns' target under the `tfm_profiling` folder.

.. Note::

    If the profiling case requires an extra out-of-tree secure partition build, the
    paths of the extra partitions and manifest list file shall be appended to
    ``TFM_EXTRA_PARTITION_PATHS`` and ``TFM_EXTRA_MANIFEST_LIST_FILES``. Refer to
    `Adding Secure Partition`_.

.. _Adding Secure Partition: https://git.trustedfirmware.org/TF-M/trusted-firmware-m.git/tree/docs/integration_guide/services/tfm_secure_partition_addition.rst

--------------

*Copyright (c) 2022-2023, Arm Limited. All rights reserved.*