Reduced the input / output buffer overhead by 200+ bytes and covered a
corner case

The actual input / output buffer overhead is only 301 bytes instead of
512. This requires a proper check on padding_idx to prevent out-of-bounds
reads.
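A minimal sketch of what a buffer of this size implies for any read, assuming the 301 bytes split as 13 bytes of counter and record header, 32 bytes of MAC and 256 bytes of maximum padding (the 13 and 32 match the figures below; the split itself is an assumption). All macro and function names here are hypothetical, not PolarSSL's.

```c
#include <stddef.h>

/* Hypothetical names; the breakdown of the 301-byte overhead is assumed. */
#define MAX_CONTENT_LEN 16384   /* 2^14, the TLS maximum plaintext length */
#define HDR_OVERHEAD    13      /* assumed: 8-byte counter + 5-byte record header */
#define MAC_OVERHEAD    32      /* 32 bytes of MAC, as in the message */
#define PAD_OVERHEAD    256     /* maximum CBC padding */
#define BUFFER_LEN ((size_t)(MAX_CONTENT_LEN + HDR_OVERHEAD + \
                             MAC_OVERHEAD + PAD_OVERHEAD))

/*
 * Returns 1 if reading `len` bytes starting at offset `start` stays inside
 * a buffer of BUFFER_LEN bytes, 0 otherwise. The check is written as
 * `len <= BUFFER_LEN - start` so that `start + len` can never overflow.
 */
static int read_in_bounds(size_t start, size_t len)
{
    return start <= BUFFER_LEN && len <= BUFFER_LEN - start;
}
```

With only 256 bytes reserved for padding, a 256-byte padding scan is safe only if it starts no later than `BUFFER_LEN - 256`; a single byte later already runs off the end, which is why padding_idx has to be validated before the scan.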

Previously, a remote party could potentially trigger an access error, and
thus stop the application, by sending a malicious packet with
MAX_CONTENT_LEN bytes of data, 32 bytes of MAC and a decrypted padlen of .
This would result in reading 256 bytes (including the fake padding check)
starting at in_ctr + 13 + 32 + MAX_CONTENT_LEN - 1 - 1, i.e. 13 + 32 bytes
past the end of the buffer.

We now reset padding_idx to 0 if it is clear that it can never point to
valid padding (padlen > msg_len || msg_len + padlen + 256 > buffer_len).
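The reset in the patch is done without an explicit branch: both conditions fold into the 0/1 `correct` flag, and padding_idx is multiplied by it. A standalone sketch of that pattern, assuming a hypothetical helper `safe_padding_idx()` and a local `MAX_CONTENT_LEN` in place of `SSL_MAX_CONTENT_LEN`:

```c
#include <stddef.h>

#define MAX_CONTENT_LEN 16384   /* stand-in for SSL_MAX_CONTENT_LEN */

/*
 * Hypothetical helper mirroring the patch: compute the padding index,
 * clear *correct if the padding can never be valid, and multiply the
 * index by the flag so an impossible padlen yields 0 instead of an
 * out-of-buffer offset.
 */
static size_t safe_padding_idx(size_t msglen, size_t padlen, int *correct)
{
    /* May wrap around (size_t is unsigned) when padlen is bogus. */
    size_t idx = msglen - padlen - 1;

    /* Case 1: padlen - 1 > msglen, the padding is longer than the message. */
    *correct &= ( msglen >= padlen - 1 );
    /* Case 2: the message plus padding exceeds the reserved buffer space. */
    *correct &= ( msglen + padlen <= MAX_CONTENT_LEN + 256 );

    /* No branch on the flag: a wrapped or oversized index becomes 0. */
    idx *= (size_t) *correct;
    return idx;
}
```

For example, `safe_padding_idx(100, 16, &ok)` with `ok = 1` returns 83 and leaves `ok` at 1, while `safe_padding_idx(10, 200, &ok)` returns 0 and clears `ok`. Because the flag is only ever ANDed, a failure detected earlier in the record processing also forces the index to 0.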
diff --git a/library/ssl_tls.c b/library/ssl_tls.c
index 75ba907..6ea2821 100644
--- a/library/ssl_tls.c
+++ b/library/ssl_tls.c
@@ -1610,6 +1610,21 @@
             size_t pad_count = 0, real_count = 1;
             size_t padding_idx = ssl->in_msglen - padlen - 1;
 
+            /*
+             * Padding is guaranteed to be incorrect if:
+             *   1. padlen - 1 > ssl->in_msglen
+             *
+             *   2. ssl->in_msglen + padlen >
+             *        SSL_MAX_CONTENT_LEN + 256 (max padding)
+             *
+             * In both cases we reset padding_idx to a safe value (0) to
+             * prevent out-of-buffer reads.
+             */
+            correct &= ( ssl->in_msglen >= padlen - 1 );
+            correct &= ( ssl->in_msglen + padlen <= SSL_MAX_CONTENT_LEN + 256 );
+
+            padding_idx *= correct;
+
             for( i = 1; i <= 256; i++ )
             {
                 real_count &= ( i <= padlen );