Update doc about minimum max_ops value

Ok, so the original plan was to make mpi_inv_mod() the smallest block that
could not be divided. The updated plan is that the smallest block will be
either:
- ecp_normalize_jac_many() (one mpi_inv_mod() + a number of mpi_mul_mpi()
  calls)
- or the second loop in ecp_precompute_comb()

With default settings, the minimum non-restartable sequence is:
- for P-256: 222M
- for P-384: 341M

This is within a factor of 2-3 of the originally planned value of 120M.
However, that value can be approached, at the cost of some performance, by
setting ECP_WINDOW_SIZE (w below) lower than the default of 6. For example:
- w=4 -> 166M for any curve (perf. impact < 10%)
- w=2 -> 130M for any curve (perf. impact ~ 30%)
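
As a sketch of the w=4 setting, assuming the build-time option is spelled
MBEDTLS_ECP_WINDOW_SIZE in config.h (the text above abbreviates it as
ECP_WINDOW_SIZE), the override might look like this:

```c
/* config.h (or a user config override), assumed option name.
 * Smaller window: lower minimum non-restartable sequence,
 * at some cost in scalar multiplication speed. */
#define MBEDTLS_ECP_WINDOW_SIZE 4
```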

My opinion is that the current state with w=4 is a good compromise, and the
code complexity needed to attain 120M is not warranted by the 1.4 factor
between that and the current minimum with w=4 (which is close to optimal
perf).
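
The factors quoted above can be checked with a few lines of arithmetic. This
is only a sketch: PLANNED_MIN_OPS and ratio_percent() are hypothetical names
introduced for illustration, and the inputs are the figures listed in this
message, not values read from the library.

```c
/* Originally planned minimum, in M, as stated in this message. */
#define PLANNED_MIN_OPS 120u

/* Hypothetical helper: ratio of a measured minimum to the planned one,
 * expressed as a percentage and rounded down (185 means 1.85x). */
static unsigned ratio_percent(unsigned min_ops)
{
    return (min_ops * 100u) / PLANNED_MIN_OPS;
}
```

With the figures above, ratio_percent(222) is 185 and ratio_percent(341) is
284 (the "factor of 2-3" range), while ratio_percent(166) is 138 (the "1.4
factor" for w=4) and ratio_percent(130) is 108 for w=2.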
diff --git a/library/ecp.c b/library/ecp.c
index a1f019d..b3bddbf 100644
--- a/library/ecp.c
+++ b/library/ecp.c
@@ -1397,7 +1397,7 @@
     for( i = 1; i < T_len; i <<= 1 )
         TT[j++] = T + i;
 
-    ECP_BUDGET( ECP_OPS_INV + 6 * j - 2 ); // XXX: split next function?
+    ECP_BUDGET( ECP_OPS_INV + 6 * j - 2 );
 
     MBEDTLS_MPI_CHK( ecp_normalize_jac_many( grp, TT, j ) );
 
@@ -1414,7 +1414,7 @@
 add:
 #endif
 
-    ECP_BUDGET( ( T_len - 1 ) * ECP_OPS_ADD ); // XXX: split loop?
+    ECP_BUDGET( ( T_len - 1 ) * ECP_OPS_ADD );
 
     for( i = 1; i < T_len; i <<= 1 )
     {
@@ -1440,7 +1440,7 @@
     for( j = 0; j + 1 < T_len; j++ )
         TT[j] = T + j + 1;
 
-    ECP_BUDGET( ECP_OPS_INV + 6 * j - 2 ); // XXX: split next function?
+    ECP_BUDGET( ECP_OPS_INV + 6 * j - 2 );
 
     MBEDTLS_MPI_CHK( ecp_normalize_jac_many( grp, TT, j ) );