scripts/gen_ldelf_hex.py: relax rules for PT_LOAD segments

The latest Clang [1] generates an ldelf.elf with the following layout:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x00000000 0x00000000 0x04834 0x04834 R E 0x1000
  LOAD           0x005838 0x00004838 0x00004838 0x01620 0x01620 R   0x1000
  LOAD           0x007000 0x00006000 0x00006000 0x0006c 0x0006c RW  0x1000
  LOAD           0x00706c 0x0000606c 0x0000606c 0x00068 0x00078 RW  0x1000
  DYNAMIC        0x007000 0x00006000 0x00006000 0x00060 0x00060 RW  0x4
  GNU_RELRO      0x007000 0x00006000 0x00006000 0x0006c 0x01000 R   0x1
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
  EXIDX          0x006800 0x00005800 0x00005800 0x002b8 0x002b8 R   0x4

There is nothing wrong with that from a strict ELF compliance point of
view, but it does not meet the requirements of our current
gen_ldelf_hex.py script, so the build fails:

 $ scripts/gen_ldelf_hex.py --input out/arm-plat-vexpress/ldelf/ldelf.elf \
                            --output out/arm-plat-vexpress/core/ldelf_hex.c
 Expected load segment to be read/write

I think our script is a bit too strict. What really matters is that
OP-TEE creates two memory mappings for the PT_LOAD segments of ldelf:
one is RX and the other is RW. We can therefore concatenate segments as
long as we have one or more non-writable segments followed by one or
more writable ones.

This commit relaxes the requirements in gen_ldelf_hex.py and implements
the above conditions instead.
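
As an illustration only, the relaxed rule can be sketched on the four
PT_LOAD segments listed above. This is a simplified stand-in, not the
script itself: the segments are plain (p_vaddr, p_filesz, p_flags)
tuples rather than pyelftools objects, and layout_sizes() and SEGMENTS
are hypothetical names. The gap between consecutive segments is counted
as padding, except after the last non-writable segment.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # standard ELF p_flags bits

# (p_vaddr, p_filesz, p_flags) copied from the program headers above
SEGMENTS = [
    (0x0000, 0x4834, PF_R | PF_X),
    (0x4838, 0x1620, PF_R),
    (0x6000, 0x006c, PF_R | PF_W),
    (0x606c, 0x0068, PF_R | PF_W),
]


def layout_sizes(segments):
    """Return (code_size, data_size); raise ValueError on a bad layout."""
    code_size = 0
    data_size = 0
    w_found = False
    for idx, (vaddr, filesz, flags) in enumerate(segments):
        writable = bool(flags & PF_W)
        if idx == 0 and writable:
            raise ValueError('Expected RO load segment(s) first')
        if writable:
            w_found = True
        elif w_found:
            raise ValueError(f'RO load segment found after RW one(s) ({idx})')
        # Pad up to the next segment, but not after the last RO segment
        # and not after the very last segment.
        pad = 0
        if idx + 1 < len(segments):
            next_vaddr, _, next_flags = segments[idx + 1]
            if writable == bool(next_flags & PF_W):
                pad = next_vaddr - (vaddr + filesz)
        if writable:
            data_size += filesz + pad
        else:
            code_size += filesz + pad
    return code_size, data_size


code_size, data_size = layout_sizes(SEGMENTS)
print(hex(code_size), hex(data_size))  # 0x5e58 0xd4
```

With this layout, the two non-writable segments are merged into one
code area (4 bytes of padding between them) and the two writable ones
into one data area.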

[1] clang version 11.0.0 (https://github.com/llvm/llvm-project.git
    6b3168f8cdb46656330929877b0b4daab35d30de)

Signed-off-by: Jerome Forissier <jerome@forissier.org>
Tested-by: Jerome Forissier <jerome@forissier.org> (QEMU, GCC 8.3/Clang 10/Clang pre-11)
Tested-by: Jerome Forissier <jerome@forissier.org> (QEMUv8, GCC 8.3/Clang 10)
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
diff --git a/scripts/gen_ldelf_hex.py b/scripts/gen_ldelf_hex.py
index dfc9e06..79f96b2 100755
--- a/scripts/gen_ldelf_hex.py
+++ b/scripts/gen_ldelf_hex.py
@@ -37,46 +37,79 @@
 
 def emit_load_segments(elffile, outf):
     load_size = 0
+    code_size = 0
     data_size = 0
-    next_rwseg_va = 0
+    load_segments = [s for s in elffile.iter_segments()
+                     if s['p_type'] == 'PT_LOAD']
+    prev_segment = None
+    pad = 0
+    pad_size = []
+    w_found = False
     n = 0
-    for segment in elffile.iter_segments():
-        if segment['p_type'] == 'PT_LOAD':
-            if n == 0:
-                if segment['p_flags'] != (P_FLAGS.PF_R | P_FLAGS.PF_X):
-                    print('Expected first load segment to be read/execute')
-                    sys.exit(1)
-                code_size = segment['p_filesz']
-            else:
-                if segment['p_flags'] != (P_FLAGS.PF_R | P_FLAGS.PF_W):
-                    print('Expected load segment to be read/write')
-                    sys.exit(1)
-                if next_rwseg_va and segment['p_vaddr'] != next_rwseg_va:
-                    print('Expected contiguous read/write segments')
-                    print(segment['p_vaddr'])
-                    print(next_rwseg_va)
-                    sys.exit(1)
-                data_size += segment['p_filesz']
-                next_rwseg_va = segment['p_vaddr'] + segment['p_filesz']
-            load_size += segment['p_filesz']
-            n = n + 1
-
+    # Check that load segments ordered by VA have the expected layout:
+    # read only first, then read-write. Compute padding at end of each segment,
+    # 0 if none is required.
+    for segment in load_segments:
+        if prev_segment:
+            pad = segment['p_vaddr'] - (prev_segment['p_vaddr'] +
+                                        prev_segment['p_filesz'])
+        else:
+            if segment['p_flags'] & P_FLAGS.PF_W:
+                print('Expected RO load segment(s) first')
+                sys.exit(1)
+        if segment['p_flags'] & P_FLAGS.PF_W:
+            if not w_found:
+                # End of RO segments, discard padding for the last one (it
+                # would just take up space in the generated C file)
+                pad = 0
+                w_found = True
+        else:
+            if w_found:
+                print(f'RO load segment found after RW one(s) (n={n})')
+                sys.exit(1)
+        if prev_segment:
+            if pad > 31:
+                # We expect segments to be tightly packed together for memory
+                # efficiency. 31 is an arbitrary, "sounds reasonable" value
+                # which might need to be adjusted -- who knows what the
+                # compiler/linker can do.
+                print(f'Warning: suspiciously large padding ({pad}) after '
+                      f'load segment {n-1}, please check')
+            pad_size.append(pad)
+        prev_segment = segment
+        n = n + 1
+    pad_size.append(0)
+    n = 0
+    # Compute code_size, data_size and load_size
+    for segment in load_segments:
+        sz = segment['p_filesz'] + pad_size[n]
+        if segment['p_flags'] & P_FLAGS.PF_W:
+            data_size += sz
+        else:
+            code_size += sz
+        load_size += sz
+        n = n + 1
+    n = 0
+    i = 0
+    # Output data to C file
     outf.write(b'const uint8_t ldelf_data[%d]' % round_up(load_size, 4096))
     outf.write(b' __aligned(4096) = {\n')
-    i = 0
-    for segment in elffile.iter_segments():
-        if segment['p_type'] == 'PT_LOAD':
-            data = segment.data()
-            for n in range(segment['p_filesz']):
-                if i % 8 == 0:
-                    outf.write(b'\t')
-                outf.write(b'0x' + '{:02x}'.format(data[n]).encode('utf-8')
-                           + b',')
-                i = i + 1
-                if i % 8 == 0 or i == load_size:
-                    outf.write(b'\n')
-                else:
-                    outf.write(b' ')
+    for segment in load_segments:
+        data = segment.data()
+        if pad_size[n]:
+            # Pad with zeros if needed
+            data += bytearray(pad_size[n])
+        for j in range(len(data)):
+            if i % 8 == 0:
+                outf.write(b'\t')
+            outf.write(b'0x' + '{:02x}'.format(data[j]).encode('utf-8')
+                       + b',')
+            i = i + 1
+            if i % 8 == 0 or i == load_size:
+                outf.write(b'\n')
+            else:
+                outf.write(b' ')
+        n = n + 1
     outf.write(b'};\n')
 
     outf.write(b'const unsigned int ldelf_code_size = %d;\n' % code_size)