Room 206 (2nd floor, badged access)
18 juin 2024 - 09h30
Verified Extraction from Coq to OCaml
par Matthieu Sozeau de Inria (Galinette), Université de Nantes
Abstract: Joint work of Yannick Forster, Matthieu Sozeau and Nicolas Tabareau.
One of the central original features of the Coq proof assistant is extraction, i.e., the ability to obtain efficient programs in industrial programming languages such as OCaml, Haskell, or Scheme from programs written in Coq's expressive dependent type theory. Extraction is of great practical usefulness, used crucially e.g., in the CompCert project. However, for such executables obtained by extraction, the extraction process is part of the trusted code base (TCB), as are Coq's kernel and the compiler used to compile the extracted code. The extraction process contains intricate semantic transformation of programs that rely on subtle operational features of both the source and target language. Its code has also evolved since the last theoretical exposition in the seminal PhD thesis of Pierre Letouzey. Furthermore, while the exact correctness statements for the execution of extracted code are described clearly in academic literature, the interoperability with unverified code has never been investigated formally, and yet is used in virtually every project relying on extraction.
We will describe the development of a novel extraction pipeline from Coq to OCaml, implemented and verified in Coq itself, with a clear correctness theorem and guarantees for safe interoperability. We build our work on the MetaCoq project, which aims at decreasing the TCB of Coq's kernel by reimplementing it in Coq itself and proving it correct w.r.t. a formal specification of Coq's type theory in Coq. Since OCaml does not have a formal specification, we make use of the Malfunction project specifying the semantics of the first, untyped intermediate language of the OCaml compiler. Our work fills some gaps in the literature and highlights important differences between the operational semantics of Coq programs and their extracted variants. In particular, we focus on the guarantees that can be provided for interoperability with unverified code, identify guarantees that are infeasible to provide, and raise interesting open question regarding semantic guarantees that could be provided.
Our central result is a formal proof that extracted programs of first-order data type are always correct (they terminate producing the same value as the original program) and can safely interoperate with arbitrary OCaml code, whereas for higher-order programs already simple interoperation can lead to incorrect behaviour and even outright crashes.