Verimag

Cholesky Factorization

 Introduction

Certain applications require the execution of a system of numerical equations on complex numbers of very large data-sets. These tasks, computed sequentially, consume a large computation time thus, necessitating the exploitation of parallelization strategies. Presented here is a parallelized model of Cholesky Decomposition of square symmetric matrices into its Lower Triangular matrix. We have used a right-looking version of the block Cholesky algorithm. The package is a DOL functional simulation which uses the DOL API routines for process computation and communication.

The given package gives user the freedom to define the custom dimensions of the matrix and more importantly the number of its divisions. Depending on the number of divisions, the process source code will be generated each pertaining to one single sub-matrix and each computing the Cholesky factorization of its own sub-matrix. The given package also generates an XML file which defines the process network, consisting of processes, ports, FIFO software channels, and their connections. These are then compiled into a SystemC application and executed.

 Downloads

  • DOL : The DOL Package should be downloaded and installed from the DOL website here.
  • Model Generator : Cholesky Model Generator can be downloaded from here. Once downloaded, extract the contents of the package in dol/examples/cholesky/.

 Run

To run the model do the following steps.

  • STEP 1: Change directory
    ~$ cd dol/examples/cholesky
  • STEP 2: Execute
    ~/dol/examples/cholesky$ ./runmodel.sh 1 2 8 1

It does two things:

  • 2.a: Generates the process network defined in XML and generates the process C source code files. The output should look like the following:
    ~/dol/examples/cholesky$ ./runmodel.sh 1 2 8 1

    [RUNMODEL] Create directory model1.


    [RUNMODEL] Generate the XML


    [RUNMODEL] check compliance for model1_flattened.xml.

    model1.xml is valid.

    [RUNMODEL] Create flattened XML model1_flattened.xml.

    ....................................................................................................................................
    [RUNMODEL] Generate C source files.


    [RUNMODEL] Run dol.

    Read process network from XML file
    -- full filename: file:~/dol/examples/cholesky/model1/model1_flattened.xml
    -- Process network model from XML [Finished]

    Generating ProcessNetwork in Dotty format:
    -- Generation [Finished]

    Generating HdS package:
    -- Generation [Finished]


    [RUNMODEL] Make SystemC application.
  • 2.b: Next it executes the generated SystemC application. The output should look like the following:
           
    [RUNMODEL] Run SystemC application.


                SystemC 2.2.0 --- Feb 21 2011 18:02:10
           Copyright (c) 1996-2006 by all Contributors
                       ALL RIGHTS RESERVED
    Generator: Lower Triangular Input:
    4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    7.00 6.00 0.00 0.00 0.00 0.00 0.00 0.00
    5.00 6.00 9.00 0.00 0.00 0.00 0.00 0.00
    4.00 2.00 8.00 8.00 0.00 0.00 0.00 0.00
    9.00 4.00 9.00 5.00 7.00 0.00 0.00 0.00
    7.00 6.00 7.00 3.00 10.00 1.00 0.00 0.00
    8.00 3.00 3.00 4.00 9.00 9.00 1.00 0.00
    10.00 9.00 8.00 6.00 6.00 7.00 5.00 6.00


    Generator Multiply:
    16.00 28.00 20.00 16.00 36.00 28.00 32.00 40.00
    28.00 85.00 71.00 40.00 87.00 85.00 74.00 124.00
    20.00 71.00 142.00 104.00 150.00 134.00 85.00 176.00
    16.00 40.00 104.00 148.00 156.00 120.00 94.00 170.00
    36.00 87.00 150.00 156.00 252.00 235.00 194.00 270.00
    28.00 85.00 134.00 120.00 235.00 244.00 206.00 265.00
    32.00 74.00 85.00 94.00 194.00 206.00 261.00 277.00
    40.00 124.00 176.00 170.00 270.00 265.00 277.00 427.00



    Output:
    4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    7.00 6.00 0.00 0.00 0.00 0.00 0.00 0.00
    5.00 6.00 9.00 0.00 0.00 0.00 0.00 0.00
    4.00 2.00 8.00 8.00 0.00 0.00 0.00 0.00
    9.00 4.00 9.00 5.00 7.00 0.00 0.00 0.00
    7.00 6.00 7.00 3.00 10.00 1.00 0.00 0.00
    8.00 3.00 3.00 4.00 9.00 9.00 1.00 0.00
    10.00 9.00 8.00 6.00 6.00 7.00 5.00 6.00

    [RUNMODEL] Exited normally.
    ~/dol/examples/cholesky$

runmodel.sh has the following usage in general:

        ./runmodel.sh MODEL_NUM NCUTS DIM LENGTH

where:

  • MODEL_NUM : is a unique ID which you can provide to your cholesky model. If an existing number is used then the tool will overwrite the previous generated model.
  • NCUTS : Number of matrix divisions. For e.g. if the matrix is divided into n*n parts then n is the value of NCUTS and n(n+1)/2 will be the number of processes generated - pertaining to the lower triangle.
  • DIM : Dimension of the square symmetric matrix which you want to factorize. IMPORTANT: NCUTS must fully divide the parameter DIM.
  • LENGTH : Number of matrices you want to factorize.
    PNG - 9.3 kb
    Figure 1. Process Network

Figure 1 shows the example process network consisting of 3 processes in addition to a generator and a consumer. It is described in the file model1.xml. The behavior of different processes are described in their source files p_0_0.c, p_1_0.c and p_1_1.c.

 Description

The code generators are written in Java and included in cholesky/lib/ directory along with their binaries. Also included are the C source codes for generator and consumer. The generator generates a random low triangle matrix and its square symmetric matrix and sends the sub-matrices to their respective processes. The consumer on the other hand congregates the factorized sub-matrices from individual processes and prints the result on the standard I/O. This result can be verified with the low-triangle matrix generated by the generator previously.

PNG - 6 kb
Figure 2. Process Network

Figure 2 shows the process network for a matrix divided into 4*4 parts and hence the 10 sub-matrices. For simplicity, we have not shown the FIFO software channels, the generator and the consumer processes. The similar colored communication paths pertain to the parallel communication. Similarly, different colored communication paths pertain to sequential communication. Hence, we see, as the number of processes is increased so will be the parallel computation vertically, sequential data dependency horizontally and the overall cost of interprocess communication. Also, if we generate more than one matrix we will benefit from pipelining. Further, an intelligent mapping and scheduling scheme can help in lowering the overall computation time.


Attached documents

2 May 2011
Document : GZ
2.2 Mb

2 May 2011
Document : XML
5.2 kb

2 May 2011
Document : C source
754 bytes

2 May 2011
Document : C source
703 bytes

2 May 2011
Document : C source
776 bytes

Contact | Site Map | Site created with SPIP 2.1.26 + AHUNTSIC [CC License]

Logged in visitors: 19 ; visits: 472192