Bringing Back Moore's Law

Harshad Deo

4 minutes read

The SimianQuant library provides a Fluent API on top of its symbolic engine to implement functions with feedback. Technical indicators are a classic example of the usage of this pattern. This article presents the results of benchmarking the generated implementations (in C++) of common indicators.

To summarise:

  1. The runtime performance on commodity hardware is in the single/double digit nanosecond range, and comparable with custom hardware
  2. The runtime cost of evaluating multiple indicators together is less than the total cost of evaluating them separately. This is partially because of graph compression.
  3. The gains of efficiency scale with increasing complexity, i.e. the more complex the strategy/indicator, the more the library is able to do to optimise it.


To illustrate the capabilities of the library, three examples are considered:

  1. Simple Moving Average, with a period ranging from 3 ticks to 20 ticks
  2. Multi Simple Moving Average, in which the all the all previous simple moving averages are computed.
  3. Exponentially Weighted Moving Average, in which multiple exponentially weighted moving averages are computed, with the decay factors specified at runtime.

The next section covers how the measurements were made and how the results were interpreted. While interesting in their own right, the details are not relevant.

Simple Moving Average

The figure below presents the results for the evaluation of the Simple Moving Average, from 3 ticks to 20 ticks. The time taken to evaluate scales approximately linearly, however the time taken to evaluate the 10 tick moving average is not twice that taken to evaluate the 5 tick moving average.

Multi Simple Moving Average

The figure below presents the results for the evaluation of multiple Simple Moving Averages. The period axis is the period upto which the SMAs are computed, i.e. the implementation with period 5 evaluates the 3, 4 and 5 period SMA. The period 20 Multiple Simple Moving Average evaluates 18 SMAs, and evaluating them together is 5 times faster than evaluating them separately. This is partially because evaluating them separately scales as O(n2) while evaluating them together scales as O(n), because of graph compression.

Exponentially Weighted Moving Average

The figure below presents the result of evaluating multiple Exponentially Weighted Moving Averages. The Count axis is the number of EWMAs computed, i.e. the implementation with count 5 evaluates 5 EWMAs. The sawtooth pattern is because certain configurations are amenable to being implemented as data parallel operations using Advanced Vector Extensions.


Given the problem specification using a StatefulTransformer, i.e. the Fluent API that to model functions with feedback, C++ implementations for all of the datastructures were generated. For each implementation, the time taken to step through an input vector, i.e. evaluate the indicator for each element of the vector and assign the result to a non-elidable variable, was measured using Google Benchmark. This is illustrated by the following code snippet:

static void BM_movingaverage_3(benchmark::State &state) {
  for (auto _ : state) {
    std::vector<double> &vec = // fetch the vector
    double res;
    Sma3 sma3(0, 0); // initialize the calculator
    for (double &x : vec) {
      benchmark::DoNotOptimize(res = sma3.getSma());

The time taken to fetch the input vector and initialize the indicator datastructure was ignored. In practice it would alter the presented benchmarks by less than less than 0.1%. To ensure consistency of the benchmarks,

  1. The size of the input vector was varied over 10 orders of magnitude, from 210 to 220 elements
  2. Each measurement was repeated 10 times
  3. The entire suite of benchmarks was run on two machines, a Google Compute Engine instance running at a clock speed of 2 GHz and an Amazon Web Services EC2 instance running at 3 GHz.

The operating system (Ubuntu 18.04), compiler (g++ 7.3.0), microarchitecture (Skylake) and compiler configuration -O3 -march=native were held constant across all runs.

To simplify the presentation and interpretation of the benchmarks, this article adopted a heuristic to obtain a single number that represents the time taken to evaluate an indicator: given a log-log plot of the median runtime against the length of the argument vector, a least squares straight line was fit and the y-intercept was interpreted as the log of the time taken to evaluate the function once. In principle, the slope of the line should be 1, in practice, it deviated from 1 by less then 0.5%.


You don't win friends with salad

Recent posts

See more