Task definition

Computation of premium and reserve values for endowment insurance policies. The basis version is stripped down version of a computation module as used by a insurance company. All computations are done with double precision and double-operations as defined by IEEE-754.

The optimized versions has to exactly reproduce the results of the original program. Discrepancies because of optimizations / reordering of floating point operations or due to usage of higher-precision intermediate results are therefore not tolerated.

Team

Bernhard Urban, lewurm@gmail.com
Josef Eisl, josef.eisl@student.tuwien.ac.at
Stefan Neubauer, sneubauer@gmx.at

Presentation

You can find the presentation here.

Running things

Location on g0: /nfs/unsafe/httpd/ftp/pub/anton/lvas/effizienz-abgaben/2011w/bjs
Compiling and running a version, e.g. v5:
> cd src
> cd v5
> make
> ./v5

Basis version - v1

For each of the three present value formulas (survival, death, annuity) there is one function with formulas implemented in straightforward way. Thus the program does match the technical tariff description, is comprehensible and easy to maintain.

With optimization switch -O3 for gcc a drastical performance boost is reached. The compiler inlines all functions except the math library function 'pow'.

The test cases for the performance measurements are generated using the stdlib random number generator with fixed seed value:

v2

Methods for calculation of present values put together into one method. Loop fusion for present values of death and annuity.

v3

Incremential computation of probability of survival over the years for present values of death and annuity. The final probability of survival is used for computation of present values for survival.

v4

Precomputation of often used values vn(...).

v5

Move conditional out of method 'qx' into 'barwert' where cases with x>=100 are handled separately and knowledge about q/p is used to simplify that case.

v6

Specialication for cases for people without extra hazards (addrisk=0).

v7

Precomputation of barwert for all common cases.

v8

v9

Precomputation of normalized results values for addrisk=0

v10

Caching of mostly used vn-values.

v11

v12

v13

Optimization overview

v1 v2 v3 v3.1 v3.2 v4 v4.1 v4.2 v5 v6 v7 v7.x v8 v9 v10 v11 v11.1 v11.2 v12 v13 v14
Loop Rule 1: Code motion out of loops xxx x x x x x x x x x x
Loop Rule 2: Combining Tests
Loop Rule 3: Loop Unrolling x
Loop Rule 4: Transfer-Driven Unrolling / Modulo Variable Renaming x
Loop Rule 5: Unconditional Branch Removal xxxxxxxxxxx x xxxx xx xx xx xx xx xx xx
Loop Rule 6: Loop Fusion xxxxxxxxxx x x x x x x x x x x
Loop Peeling x x
Software Pipelining x
Logic Rule 1: Exploit Algebraic Identities xxx x x x x x x x xx x x
Logic Rule 2: Short-Circuiting Monotone Functions x x x x x x
Logic Rule 3: Reordering Tests
Logic Rule 4: Precompute logical functions
Logic rule 5: Boolean/State Variable Elimination x x x x x x x x x
Long-circuiting
Arithmetic with flags
Alternate flag representation
Space-For-Time 1: Data Structure Augmentation x x x xx xx xx xx xx xx
Space-For-Time 2: Precompute Functions xxxxxxxxxxxxx xx xx xx xx xx xx xx
Space-For-Time 3: Exploit Common Cases - Memoization (Caching)
Space-For-Time 4: Lazy Evaluation
Time-For-Space 1: Packing xx x
Time-For-Space 2: Interpreters, Factoring
Space-For-Time 1 opposite: Data Structure Reduction x
Space-For-Time 2 opposite: Recompute Results
Space-For-Time 3 opposite: Uncaching
Space-For-Time 4 opposite: Eager Evaluation
Time-For-Space 1 opposite: Unpacking
Time-For-Space 2 opposite: Compilation to code x
Procedure Rule 1: Collapsing Procedure Hierarchies - Inlining ccccccccccc c c c c c c c c c c
Procedure Rule 1: Collapsing Procedure Hierarchies - Specialization xx x x x x x x x x x x
Procedure Rule 2: Exploit Common Cases - Precomputed tables/code sequences x x xx xx xx xx xx xx xx
Procedure Rule 3: Coroutines
Procedure Rule 4: Transformation on Recursive Procedures
Procedure Rule 5: Parallelism - prefetching, instruction scheduling ccccccccccc c c c c c c c c c c
Recursion or strip mining for cache-blocking
Expression Rule 1: Compile-Time Initialization x xx xx xx xx xx xx xx
Expression Rule 2: Strength Reduction/Incremental Algorithms/Differentiation xxxxxxxxx xxxxxxxxxxx- xx x xx xx xx
Expression Rule 3: Common Subexpression Elimination x x x x x x xx xx xx
Expression Rule 4: Pairing Computation xxxxxxxxxx x x x x x x x x x x
Expression Rule 5: Exploit word parallelism / SIMD ccccccccccc c c c c c c cx c c c

Results

  took [ms] total cycles (tsc) loc binsize branch mispred L2 access u+s instr user cycles system cycles
v1 -O040.590107.811.766.30019114.610252.992.7004.971.190133.925.626.549107.750.516.05861.302.306
v1 -O317.04045.321.190.74019114.563167.488.975957.97766.647.366.78145.302.381.32818.828.864
v2 14.82039.445.528.51017713.57195.634.6182.350.90754.201.707.20139.427.922.79517.624.268
v3 12.43033.079.188.56016513.37937.303.026120.55340.739.516.82633.064.169.83815.037.324
v3.1 12.72033.821.425.20033917.23372.105.9171.513.05340.924.678.03433.802.838.32518.614.920
v3.2 12.27032.663.476.68016013.21942.012.284902.27540.567.699.94132.650.397.44813.091.556
v4 5801.541.626.24017213.3864.098.58255.7402.593.109.5891.539.988.6911.638.380
v4.11.0102.690.269.53017212.5544.132.31881.0652.645.301.0072.688.077.0822.195.196
v5 5301.422.501.06018113.4183.842.25450.6452.233.418.4941.420.768.0571.734.528
v6 5201.389.941.13025814.2183.616.45449.0632.155.970.1461.386.996.2152.946.660
v7 300817.998.80032015.0572.058.8312.859.7871.275.602.609815.891.1792.109.510
v7.1340917.032.10034315.3131.953.6712.798.8851.341.472.808914.662.7262.370.438
v7.2330895.924.07039616.0491.944.5632.817.1801.352.886.151885.217.34710.707.768
v7.35701.517.942.4009.626166.4012.979.24881.797.8141.386.377.6181.514.630.1163.313.492
v7.41.3503.605.461.3901.391.81131.745.6552.429.86995.398.0042.217.719.0653.564.199.60241.279.628
v7.5350931.469.85058.639800.5802.060.6285.329.4641.261.986.622927.600.0513.872.154
v7.6340915.661.6701.80039.5672.147.9473.085.2771.314.510.004912.532.9523.129.372
v7.74701.255.015.1508.499162.5961.253.87841.884.1801.253.091.0701.252.388.4082.627.942
v8290785.230.01036916.1191.765.8792.868.7681.203.447.326783.038.1862.192.938
v8.1300821.937.63036717.6942.076.7302.936.6261.215.385.324819.759.2002.180.086
v8.2270725.439.71034917.7341.742.2883.120.0481.130.889.549722.526.4432.914.142
v8.3270725.644.620340657.1111.798.7643.065.3481.130.497.499723.775.1451.870.684
v8.4260718.872.8003447.021.5261.815.043468.9361.070.422.006713.808.2305.070.574
v8.5260730.616.09035017.7511.815.683455.6321.071.800.789714.974.23015.644.196
v8.6260712.085.0103513.416.0541.814.148454.3041.069.933.817708.925.3303.162.056
v8.7260717.114.95035715.8631.808.583431.7831.071.039.959707.310.8859.804.884
v9260712.035.0703513.416.0541.813.777455.2381.069.944.597708.755.9313.280.980
v10120320.681.0603583.416.9201.581.238359.985573.303.002317.670.1763.013.888
v11100289.020.0102713.415.9001.555.709359.794552.358.602285.803.2763.218.148
v11.1100296.199.8602733.415.9961.554.152360.598536.571.561293.265.0482.936.862
v11.26801.834.271.5903013.416.1561.548.280373.646736.784.3351.830.587.0913.687.406
v12100283.449.2802843.416.1971.614.741360.126553.853.065280.217.2873.235.192
v13100276.029.2402893.416.2611.504.346360.429537.868.968273.025.6203.005.446
v14100263.174.8902903.416.2931.502.030362.193545.477.489259.987.5653.188.864

Dropped Optimizations