Measuring the performance of lustre-mt
This tutorial will illustrate:
- the use of recursive nodes
- the use of the lustre-mt tool to parallelize computations into tasks
The objective is to write a Lustre program that shows the benefit of using tasks to speed up execution, and to measure that speed-up.
To achieve that, we write a program that computes several tasks in parallel, in a parametric manner:
- n: defines the number of tasks
- m: defines the complexity of each task
To do that, we will generate n matrices of size m x m, and compute their determinants.
To compute the determinant, we need to be able to transpose a matrix (it may not be the most efficient way to do it, but the point is precisely to perform heavy computations).
function transpose<<const n:int; const m:int; type t>>(x:t^m^n) returns (y:t^n^m);
%expand:call%
let
y =
-- with n = 0 then []
-- No empty array in Lustre! Sigh.
with n = 1 then
-- since scal_to_array is expanded, this forces to use the --expand-iterators option
-- which is the only way to expand map (and other iterators)
map<<scal_to_array<<t>>, m >>(x[0])
else
add_vect_to_matrix <<n-1,m,t>>(x[0], transpose<<n-1,m,t>>(x[1..n-1]));
tel
function scal_to_array<<type t>>(x:t) returns (res:t^1);
%expand:call%
let
res = [x];
tel
function add_vect_to_matrix<<const n:int; const m:int; type t>>(x:t^m; y:t^n^m)
returns (res:t^(n+1)^m);
%expand:call%
let
res =
with m = 1 then [ [ x[0]] | y[0] ]
else [ [ x[0]] | y[0] ]
| add_vect_to_matrix<<n,m-1,t>> (x[1..m-1], y[1..m-1]);
tel
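The recursion works line by line: a 1 x m matrix is transposed by turning each scalar of its single line into a one-element line (the map over scal_to_array), and for n > 1 the elements of the first line x[0] are prepended, one per line, to the transpose of the remaining n-1 lines, which is exactly what add_vect_to_matrix does.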
function test_transpose(x:int^4^2) returns (res:int^2^4);
%expand:call%
let
res = transpose<<2,4,int>>(x);
tel
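For instance, applied to the int^4^2 value [[1,2,3,4],[5,6,7,8]] (2 lines of 4 elements), test_transpose should return [[1,5],[2,6],[3,7],[4,8]].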
function transpose_is_involutive(x:bool^4^2) returns (res:bool);
var
y: bool^2^4;
z: bool^4^2;
let
y = transpose<<2,4,bool>>(x);
z = transpose<<4,2,bool>>(y);
res = ( x = z );
tel
-- The following works
-- lv6 transpose.lus -n transpose_is_involutive -ec -o proove_me.ec &&
-- lesar proove_me.ec transpose_is_involutive
-- so I guess transpose is correct...
Now we can define a node that computes the determinant:
-- Compute the determinant of a matrix
function determinant<<const n:int>>(m : int^n^n) returns (res : int);
let
res = with n = 2 then m[0][0] * m[1][1] - m[0][1] * m[1][0]
else iter_line<<n,0>>(1,m);
tel
function iter_line<<const n:int; const j:int>>(sign:int; m:int^n^n) returns (s : int);
%expand:call%
let
s = with j = n then 0
else sign * m[0][j] * determinant<<n-1>>(sub_matrix<<0,j,n>> (m)) + iter_line<<n,j+1>>(-sign, m);
tel
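Together, determinant and iter_line implement the Laplace expansion along the first line of the matrix: iter_line<<n,j>> sums sign * m[0][j] * determinant(minor) over the columns j to n-1, flipping the sign at each step, and stops once j reaches n.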
-- returns m without its i_th line and j_th column
-- maybe using transpose is not the most efficient way to do that,
-- but here we are interested in heavy computations :)
function sub_matrix<<const i:int; const j:int; const n:int>>(m:int^n^n)
returns (res : int^(n-1)^(n-1));
%expand:call%
var
t1: int ^ n ^ (n-1);
t2: int ^ (n-1) ^ n;
t3: int ^ (n-1) ^ (n-1);
let
t1 = with i=0 then m[1..n-1] else
with i=n-1 then m[0..n-2] else
m[0..i-1] | m[i+1..n-1];
t2 = transpose<<n-1,n,int>>(t1);
t3 = with j=0 then t2[1..n-1] else
with j=n-1 then t2[0..n-2] else
t2[0..j-1] | t2[j+1..n-1];
res = transpose<<n-1,n-1,int>>(t3);
tel
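For instance, sub_matrix<<0,1,3>> applied to [[1,2,3],[4,5,6],[7,8,10]] removes line 0 and column 1, and should return [[4,6],[7,10]].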
function test_determinant(m:int^3^3) returns (res:int);
let
res = determinant<<3>>(m);
tel
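As a sanity check, test_determinant applied to [[1,2,3],[4,5,6],[7,8,10]] should return 1*(5*10 - 6*8) - 2*(4*10 - 6*7) + 3*(4*8 - 5*7) = 2 + 4 - 9 = -3, i.e. the Laplace expansion along the first line.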
To generate random matrices, we need a Pseudo-Random Number Generator (PRNG):
-- PRNG
-- A linear congruential generator
-- https://en.wikipedia.org/wiki/Linear_congruential_generator
-- X(n+1) = (a X(n) + c) mod m
-- m = 2^16+1, a = 75, c = 74
function lcg<<const m: int; const a: int; const c: int>>(X_n:int) returns (X_np1:int);
let
X_np1 = (a * X_n + c) mod m;
tel
-- a variant, with a memory
node lcg2<<const m: int; const a: int; const c: int>>(init:int) returns (X:int);
let
X = init -> (a * pre X + c) mod m;
tel
node zx81 = lcg<<65537, 75, 74>>
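For instance, zx81(1) = (75*1 + 74) mod 65537 = 149, zx81(149) = 11249, and zx81(11249) = 57305; chaining each output back into the next input (as random_acc does below) thus yields a pseudo-random sequence.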
So now we can generate random 3D matrices:
-- a stupid function to fill n x m x p matrices with some numbers
function random_3D<<const n:int; const m:int; const p:int>>(init : int) returns (x : int^n^m^p);
var accout : int;
let
accout, x = fill<< fill<< fill<<random_acc; n>>; m >>; p>>(init);
tel
function random_acc(accin : int) returns (accout, val : int);
%expand:call%
let
accout = val;
val = zx81(accin);
tel
function test_random_3D(i:int) returns (res:int^3^3^3);
let
res = random_3D<<3,3,3>>(i);
tel
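As a reminder, fill<<f; n>>(a0) threads an accumulator through n successive calls of f and returns the final accumulator together with the array of the n produced values; nesting three fill applications is what lets random_3D turn a single seed into an n x m x p array of pseudo-random numbers.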
Let’s write a node that computes the determinants of m n x n matrices, in m different tasks, and computes their sum:
-- computes the sum of the determinants of t[i] for i=0 to n-1
-- Each determinant will be computed in its own task
node sum_determinant<<const m:int; const n:int; const i:int>>(t:int^n^n^m) returns (res:int);
%expand:call%
var
x : int;
let
x = %MT:task% determinant<<n>>(t[i-1]);
res = -- 0 -> -- XXX so that the tasks get initialized !!!
-- indeed, for now, memoryless nodes are not initialized,
-- which is a problem when they are put inside tasks: XXX TODO
with i = 1
then x
else x + sum_determinant<<m,n,i-1>>(t) ;
tel
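Note the %MT:task% pragma on the determinant call: it marks this node call to be run in its own task when the code is generated with the --2c-multi-task option introduced below. Since i is instantiated with m and decreases down to 1, the m determinant computations end up in m different tasks.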
Now we have everything in hand to perform our experiment:
- generate m random square matrices of size n
- compute the sum of their determinants, in m tasks:
-- Illustrate the use of --2c-multi-task
node multitask_gen<<
const N:int; -- we work with matrices of size N x N
const M: int -- controls the tasks number
>> (x:int) returns (res:int);
var
big_3D_array : int ^ N ^ N ^ M;
let
big_3D_array = random_3D<<N,N,M>>(x);
res = sum_determinant<<M,N,M>>(big_3D_array);
tel
Note the use of the %expand:call% pragmas in most of the recursive nodes. Indeed, to be able to put the computation of the determinants in different tasks, we do not want the lv6 compiler to expand nodes with static arguments, which it does by default. We will therefore use the --do-not-expand-static-nodes option.
But on the other hand, if we don't expand any of those nodes, the necessary memory would be huge for big values of n (XXX explain that). This is actually the reason why such parametric nodes are expanded by default.
Hence, since we want to expand all generic nodes except determinant, the only way is to use the --do-not-expand-static-nodes option, plus carefully located %expand:call% pragmas.
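Concretely, in the code above, the auxiliary nodes (transpose, add_vect_to_matrix, sub_matrix, iter_line, sum_determinant, ...) carry the %expand:call% pragma, whereas determinant does not: each determinant<<n>> call therefore survives as a genuine node call, which is what allows it to be placed in its own task.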
As you may have noticed, those pragmas need to be placed in node definitions. Iterators (such as the map used in transpose) are predefined, but they can be expanded using the --expand-iterators option.
XXX fixit? It is actually mandatory here to use this option, because there is a map over an expanded node (scal_to_array); kind of a lv6 bug.
Let’s define a few instances of this generic node:
node multitask = multitask_gen<<4,5>>
node small = multitask_gen<<3,3>>
node big = multitask_gen<<10,20>>
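For instance, small works on 3 matrices of size 3 x 3 (27 pseudo-random integers) in 3 tasks, while big works on 20 matrices of size 10 x 10 (2000 integers) in 20 tasks; with the naive Laplace expansion used here, a 10 x 10 determinant requires on the order of 10! ≈ 3.6 million products, so each task is genuinely heavy.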
To execute, for instance, the small node:
lv6 multitask.lus -n small --expand-iterators --do-not-expand-static-nodes --to-c-execute
or, using the short versions of the CLI options:
lv6 multitask.lus -n small -ei -desn -2c-exec
In order to take advantage of task parallelism, we can use the --2c-multi-task option, or -2cmt for short:
lv6 multitask.lus -n small -ei -desn -2c-exec -2cmt
Here are the results obtained with several values of n and m: