Measuring the performance of lustre-mt

This tutorial will illustrate:

  • the use of recursive nodes
  • the use of the lustre-mt tool to parallelize computations into tasks
  • the measurement of the resulting speed-up

The objective is to write a Lustre program that shows the benefit of using tasks to speed up execution, and to measure that speed-up.

To achieve that, we want to write a Lustre program that computes several tasks in parallel, in a parametric manner:

  • n: the size of the matrices, which defines the complexity of each task
  • m: the number of tasks

To do that, we will generate m matrices of size n x n, and compute their determinants.

To compute the determinant, we need to be able to transpose a matrix (this may not be the most efficient way to do it, but the point here is precisely to perform heavy computations):

function transpose<<const n:int; const m:int; type t>>(x:t^m^n) returns (y:t^n^m);
%expand:call%
let
  y =
  -- with n = 0 then []
  -- No empty array in Lustre! Sigh.
  with n = 1 then
  -- since scal_to_array is expanded, this forces us to use the --expand-iterators option,
  -- which is the only way to expand map (and other iterators)
     map<<scal_to_array<<t>>, m >>(x[0])
  else
     add_vect_to_matrix <<n-1,m,t>>(x[0], transpose<<n-1,m,t>>(x[1..n-1]));
tel

function scal_to_array<<type t>>(x:t) returns (res:t^1);
%expand:call%
let
  res = [x];
tel

function add_vect_to_matrix<<const n:int; const m:int; type t>>(x:t^m; y:t^n^m)
returns (res:t^(n+1)^m);
%expand:call%
let
  res =
  with m = 1 then [ [ x[0]] | y[0] ]
  else            [ [ x[0]] | y[0] ]
                  | add_vect_to_matrix<<n,m-1,t>> (x[1..m-1], y[1..m-1]);
tel

function test_transpose(x:int^4^2) returns (res:int^2^4);
%expand:call%
let
  res = transpose<<2,4,int>>(x);
tel

function transpose_is_involutive(x:bool^4^2) returns (res:bool);
var
  y: bool^2^4;
  z: bool^4^2;
let
  y = transpose<<2,4,bool>>(x);
  z = transpose<<4,2,bool>>(y);
  res = ( x = z );
tel
-- The following works
--   lv6 transpose.lus -n transpose_is_involutive -ec -o proove_me.ec &&
--   lesar proove_me.ec transpose_is_involutive
-- so I guess transpose is correct...

Now we can define a node that computes the determinant:

-- Compute the determinant of a matrix
function determinant<<const n:int>>(m : int^n^n) returns (res : int);
let
  res = with n = 2 then m[0][0] * m[1][1] - m[0][1] * m[1][0]
                   else iter_line<<n,0>>(1,m);

tel
function iter_line<<const n:int; const j:int>>(sign:int; m:int^n^n) returns (s : int);
%expand:call%
let
  s = with j = n then 0
      else sign * m[0][j] * determinant<<n-1>>(sub_matrix<<0,j,n>> (m)) + iter_line<<n,j+1>>(-sign, m);

tel

-- returns m without its i_th line and j_th column
--   maybe using transpose is not the most efficient way to do that,
--   but here we are interested in heavy computations :)
function sub_matrix<<const i:int; const j:int; const n:int>>(m:int^n^n)
returns (res : int^(n-1)^(n-1));
%expand:call%
var
  t1: int ^ n ^ (n-1);
  t2: int ^ (n-1) ^ n;
  t3: int ^ (n-1) ^ (n-1);
let
  t1 = with i=0   then m[1..n-1] else
       with i=n-1 then m[0..n-2] else
                       m[0..i-1] | m[i+1..n-1];
  t2 = transpose<<n-1,n,int>>(t1);
  t3 = with j=0   then t2[1..n-1] else
       with j=n-1 then t2[0..n-2] else
                       t2[0..j-1] | t2[j+1..n-1];
  res = transpose<<n-1,n-1,int>>(t3);
tel
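
For instance, sub_matrix<<0,1,3>> removes the first line and the second column: applied to the matrix [[1,2,3], [4,5,6], [7,8,9]], it returns [[4,6], [7,9]].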

function test_determinant(m:int^3^3) returns (res:int);
let
  res = determinant<<3>>(m);
tel
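
For n = 3, determinant thus unfolds into the usual cofactor expansion along the first line. As a sanity check, one could compare it against the explicit formula and simulate the result on a few inputs; the helper below is only a hypothetical sketch, not part of the tutorial code:

function check_determinant_3(m:int^3^3) returns (ok:bool);
let
  -- compare the recursive determinant with the explicit 3x3 formula
  ok = ( determinant<<3>>(m) =
           m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
         - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
         + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]) );
tel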

To generate random matrices, we need a Pseudo-Random Number Generator (PRNG):

-- PRNG
-- A linear congruential generator
-- https://en.wikipedia.org/wiki/Linear_congruential_generator
-- X(n+1) = (a X(n) + c) mod m
-- m = 2^16+1, a = 75, c = 74

function lcg<<const m: int; const a: int; const c: int>>(X_n:int) returns (X_np1:int);
let
  X_np1 =  (a * X_n + c) mod m;
tel

-- a variant, with a memory
node lcg2<<const m: int; const a: int; const c: int>>(init:int) returns (X:int);
let
  X = init -> (a * pre X + c) mod m;
tel

node zx81 = lcg<<65537, 75, 74>>;
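
As a quick sanity check on the arithmetic: with the zx81 parameters and a seed of 1, the generator produces (75*1 + 74) mod 65537 = 149, then (75*149 + 74) mod 65537 = 11249, then 57305, and so on.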

So now we can generate random 3D matrices:

-- a stupid function to fill n x m x p matrices with some numbers
function random_3D<<const n:int; const m:int; const p:int>>(init : int) returns (x : int^n^m^p);
var accout : int;
let
  accout, x = fill<< fill<< fill<<random_acc; n>>; m >>; p>>(init);
tel
function random_acc(accin : int) returns (accout, val : int);
%expand:call%
let
  accout = val;
  val = zx81(accin);
tel
function test_random_3D(i:int) returns (res:int^3^3^3);
let
  res = random_3D<<3,3,3>>(i);
tel
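
As a reminder, fill<<f; k>>(acc0) applies f k times, threading the accumulator from one call to the next, and returns the final accumulator together with the array of the k values produced; nesting three fills therefore fills the whole 3D array with successive values of the generator.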

Let’s write a node that computes the determinants of m n x n matrices, in m different tasks, and sums them:

-- computes the sum of the determinants of t[i] for i = 0 to m-1
-- Each determinant will be computed in its own task
node sum_determinant<<const m:int; const n:int; const i:int>>(t:int^n^n^m) returns (res:int);
%expand:call%
var
  x : int;
let
  x =  %MT:task% determinant<<n>>(t[i-1]);
  res = -- 0 -> -- XXX so that the tasks get initialized!!!
  -- indeed, for the time being, memory-less nodes are not
  -- initialized, which is a problem when they are put inside tasks: XXX TODO
        with i = 1
        then x
        else x + sum_determinant<<m,n,i-1>>(t) ;
tel
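
Note that sum_determinant<<m,n,i>> computes determinant<<n>>(t[i-1]) + ... + determinant<<n>>(t[0]); instantiated with i = m (as done below), it thus sums the determinants of all m matrices, and the %MT:task% pragma marks each of these determinant calls as a task for lustre-mt.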

Now we have everything in hand to perform our experiment:

  • generate M random square matrices of size N x N
  • compute the sum of their determinants, in M tasks:
-- Illustrate the use of --2c-multi-task
node multitask_gen<<
        const N:int; -- we work with matrices of size N x N
        const M: int -- controls the number of tasks
>> (x:int) returns (res:int);
var
  big_3D_array : int ^ N ^ N ^ M;
let
  big_3D_array = random_3D<<N,N,M>>(x);
  res = sum_determinant<<M,N,M>>(big_3D_array);
tel

Note the use of the %expand:call% pragmas in most of the recursive nodes. Indeed, to be able to put the computation of the determinants in different tasks, we do not want the lv6 compiler to expand nodes with static arguments, which it does by default. We will therefore use the --do-not-expand-static-nodes option.

On the other hand, if we do not expand any of those nodes, the required memory would be huge for big values of n (XXX explain that). This is actually the reason why such parametric nodes are expanded by default.

Hence, since we want to expand all generic nodes except determinant, the only way is to use the --do-not-expand-static-nodes option together with carefully placed %expand:call% pragmas.

As you may have noticed, those pragmas need to be placed in node definitions. Iterators (such as the map used in transpose) are predefined, but they can be expanded using the --expand-iterators option.

Note that using this option is actually mandatory here, because map is applied over an expanded node (scal_to_array); this is arguably an lv6 bug.

Let’s define a few instances of this generic node:

node multitask = multitask_gen<<4,5>>;
node small = multitask_gen<<3,3>>;
node big = multitask_gen<<10,20>>;
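
For instance, small computes the determinants of 3 random 3 x 3 matrices in 3 tasks, while big computes the determinants of 20 random 10 x 10 matrices in 20 tasks.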

To execute, for instance, the small node:

lv6 multitask.lus -n small --expand-iterators --do-not-expand-static-nodes --to-c-execute

or, using the short versions of the CLI options:

lv6 multitask.lus -n small -ei -desn -2c-exec

In order to take advantage of task parallelism, we can use the --2c-multi-task option (-2cmt for short):

lv6 multitask.lus -n small -ei -desn -2c-exec -2cmt
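
To actually measure the speed-up, a crude approach is to wrap both commands with the standard time utility. This is only a sketch: it also measures C code generation and compilation, and depending on how the generated main loop reads the input x, you may have to feed it input values:

time lv6 multitask.lus -n big -ei -desn -2c-exec
time lv6 multitask.lus -n big -ei -desn -2c-exec -2cmt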

Here are the results for several values of N and M: