Exo 1
======

Q1
---

Path B0-B1-B2-B5 :

1. Path predicate:

	y0=x0+1 and y0<=5 and x0<=0 and y1=y0*2

2. Condition To trigger the vulnerability
	(y0=x0+1 and y0<=5 and x0<=0 and y1=y0*2) and (y1 > 9 or y1<0)

3. solution ? 
   y1 <= 2, e.g,  y1=-2, thus y0=-1 and x0=-2

Q2
---

Path B0-B1-B2-B4-B1-B2-B5
	
1. Condition to follow this path and trigger the vuln.: 

  y0=x0+1 and y0<=5 and x0>0 and y1=y0*3 and x1=x0+2 and y1<=5 and x1<=0 and y2=y1*2
		and
	(y2 > 9 or y2<0)

solutions ?
	x0>0 and x1=x0+2 et x1<0 and ...
=> no solution in Z

  Solution in signed integers (with wrap-around) : 
	x0 = MAX_INT and y0=-1 and y1=-3 and x1=-2 and y2=-6 
(assuming MAX_INT + 1 = -1)


Q3
---

1. Code with assertion:  

int x, y ;
int T[10] ;

read(x) ;
assert (x<MAX_INT-1)	(A0)
y = x+1 ;

while (y <= 5) {
   if (x > 0) {
	  assert (y<MAXINT/3)   (A1)
      y = y*3 ;
	  assert (x<MAXINT-2)   (A2)
      x = x+2 ;
   } else { 
	  assert (y<MAXINT/2)   (A3)
	  y = y*2 ;
      assert (y>=0 && y<10)  (A4)
      T[y] = x ;
   } ;
} ;


2. Abstract Interpretation with intervals

iteration 1

out(B0) = 
	x in [-inf, +inf]
	y in [-inf, +inf]

in(B2) = 
	x in [-inf, +inf]
	y in [-inf, 5]

in(B4) = 
	x in [1, +inf]
	y in [-inf, 5]


in(B5) = 
	x in [-inf, O]
	y in [-inf, 5]

A0 is not discharged in B0
A1 and A2 are discharged in B4
A3 is discharged in B5
A4 is not discharged in B5

These results won't change in further iterations ...

% =======================================================

DOWSER

Q1 :
    Dowser objectives = 
		- find buffer overflows in C/C++ pgms
        - based on a combination of static and dymanic analysis 
			- static analysis to detect "hot spot", i.e., pgm locations where BoF are more likely to occure
			- dynamic analysis (actually concolic execution) to attempt to confirm the BoF, with two 
               characteristics:
                     - (pointer) value coverage, to increase the chance of getting inavlid memory access
					- taint analysis, to reduce the size of the symbolic inputs 
		-> try to "stress" as much as possible the pointers used in loops to catch BoFs

	Workflow = 
		1. Static analysis on the source code to get loop ranks
		2. Dynamic taint analysis to indentify the pgm inputs influencing the BoF prone loops
		3. Symbolic execution to try to trigger the BoF 
		4. Use AdSan as an oracle, to detect BoF occurrences ...


Q2:
	1. input = source-level C/C++ pgm, translated in LLVM 
       output = scores associated to pgm loops, indicating there "probability" to lead to a BoF
	2. it's a static analysis

	3.  metric to select interesting loops = "complex array accesses", depending on the complexity 
        of control and data-flow involved in the array index computations, such that 
			- the pointer value is modified within the loop 
			- complex "pointer arithmetic" (casts, non-constant operations, non local operations, etc.)
			- non trivial guards on pointer operations (i.e.. <> from comparusons with a constant) 
       is modified in the loop 

	4. There exists at least trivial BoF which are not caught, e.g., 

			int T[10] ;
			int x = 100 ;
			while (....)
				T[x] = ...   /* x never modified in the loop */
			
Q3:           
	1. input = a pgm + ranked loops with a "vulnerable instruction group" + test inputs able to exercise 
	    these instrutions
		output = set of pgm inputs bytes influencing the execution of each vulnerable instruction group
	2. dynamic analysis 
    3. track so-called "strict control dependencies", i.e., direct comparisons between taint variables and constants
		e.g : 
				// x is tainted 
	            if (x == 'a') 
					 y = ... // y considered as tainted
             
	
				// x is tainted, z is not tainted 
	            if (x == z) 
					 y = ... // y not considered as tainted

	4. false dependencies = nested if-then-else in an arbitary order 
        (i.e, no real dependencies between the cases)
		partially solved byu shifting the order of unrelated input fields
		need some knowledge about he pgm input ...

	            if (x == 'a') 
					 y = ... // y considered as tainted
  

Q4:
	1. input = a pgm + ranked loops with a "vulnerable instruction group" + set of input bytes able
                  to influence the execution of these instructions
		output = BoF detected by AdSan

	2. dynamic analysis (concolic execution)

	3. Since symbolic execution can only execute *finite* paths, this technique cannot fully handle loops
       (the number of iterations has to be bounded), making the approach unsound in general (real bugs 
         can be missed).
	
		Dowser proceeds in 2 steps:
			- learning, where paths traversing interesting loops are ranked according to their potential 
              influence on pointer dereferences. This allows to prune "irrelevant" branches (the ones 
   	          not leading to any new dereference addresses), and to flag branches according to their ability 
              to exercise "unique" pointer access patterns.  In this learning phase only a subset of the 
              symbolic input is considered (for performance reasons). 	
           - symbolic execution, guided by the results from the learning phase, to cover at most as possible 
             the various pointer access patterns in the target loop.
	    However, Dowser can still miss interesting paths, the approach is therefore not sound.
        Conversly, even if false positives may happen, they do not lead to incorrect BoF reports 
        (they only penalize the excution time).

Q5: 
	  1. Dowser cannot report false positives, but it can miss bugs (false negatives).

	  2. The same general approach (static analysis + guided symbolic execution) can be used for 
         detecting other vulnerabilities, provided a suitable static analysis technique to identify 
         the "hot spots" (+ a good guiding strategy).

	   3. In principle Dowser could work directly from the binary code, but its efficiency would certainly be lower 
          (disassembling may lower a lot the accuracy of the  static analysis step). 

		4. The vulnerability of Exercise 1 could be detected by Dowser (the weight of the loop is non-zero).


Q5: