designing large projects in OCaml [closed]

I am going to answer for a medium-sized project under the conditions that I am familiar with, that is, between 100K and 1M lines of source code and up to 10 developers. This is what we are using now, for a project started two months ago in August 2013.

Build system and code organization:

  • one source-able shell script defines PATH and other variables for our project
  • one .ocamlinit file at the root of our project loads a bunch of libraries when starting a toplevel session
  • omake, which is fast (with -j option for parallel builds); but we avoid making crazy custom omake plugins
  • one root Makefile contains all the essential targets (setup, build, test, clean, and more)
  • one level of subdirectories, not two
  • most subdirectories build into an OCaml library
  • some subdirectories contain other things (setup, scripts, etc.)
  • OCAMLPATH contains the root of the project; each library subdirectory produces a META file, making all OCaml parts of the project accessible from the toplevel using #require.
  • only one OCaml executable is built for the whole project (saves a lot of linking time; still not sure why)
  • libraries are installed via a setup script using opam
  • local opam packages are made for software that is not in the official opam repository
  • we use an opam switch which is an alias named after our project, avoiding conflicts with other projects on the same machine

Source-code editing:

  • emacs with opam packages ocp-indent and ocp-index

Source control and management:

  • we use git and github
  • all new code is peer-reviewed via github pull requests
  • tarballs for non-opam non-github libraries are stored in a separate git repository (that can be blown away if history gets too big)
  • bleeding-edge libraries existing on github are forked into our github account and installed via our own local opam package

Use of OCaml:

  • OCaml will not compensate for bad programming practices; teaching good taste is beyond the scope of this answer. http://ocaml.org/learn/tutorials/guidelines.html is a good starting point.
  • OCaml 4.01.0 makes it much easier than before to reuse record field labels and variant constructors (e.g. type t1 = {x:int} type t2 = {x:int;y:int} let t1_of_t2 ({x}:t2) : t1 = {x} now works)
  • we try to not use camlp4 syntax extensions in our own code
  • we do not use classes and objects unless mandated by some external library
  • in theory since OCaml 4.01.0 we should prefer classic variants over polymorphic variants
  • we use exceptions to indicate errors and let them go through happily until our main server loop catches them and interprets them as “internal error” (default), “bad request”, or something else
  • exceptions such as Exit or Not_found can be used locally when it makes sense, but in module interfaces we prefer to use options.
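
The last two points can be sketched as follows. This is only an illustration using the standard library: the names Bad_request, find_user, find_user_opt, handle and serve, the user table, and the numeric status codes are all assumptions made for the example, not code from our project.

```ocaml
(* Hypothetical names throughout; a sketch, not the project's actual code. *)

(* Raised by request handlers when the client made a mistake. *)
exception Bad_request of string

let users = [ (1, "alice"); (2, "bob") ]

(* Local helper: letting Not_found escape is fine inside the module. *)
let find_user id = List.assoc id users

(* What the module interface would expose: an option instead of Not_found. *)
let find_user_opt id =
  try Some (find_user id) with Not_found -> None

(* A handler raises freely and does not try to recover. *)
let handle request =
  let id =
    try int_of_string request
    with Failure _ -> raise (Bad_request ("not a user id: " ^ request))
  in
  match find_user_opt id with
  | Some name -> "hello " ^ name
  | None -> raise (Bad_request ("unknown user " ^ string_of_int id))

(* The single place where exceptions are caught and interpreted.
   Anything that is not explicitly recognized is an internal error. *)
let serve handler request =
  try (200, handler request) with
  | Bad_request msg -> (400, msg)
  | _ -> (500, "internal error")
```

With these definitions, serve handle "abc" yields a 400 pair carrying the handler's message, while any unexpected exception (for instance a stray Not_found) falls through to the 500 case.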

Libraries, protocols, frameworks:

  • we use Batteries for all commodity functions that are missing from OCaml’s standard library; for the rest we have a “util” library
  • we use Lwt for asynchronous programming, without the syntax extensions, and the bind operator (>>=) is the only operator that we use (if you have to know, we do reluctantly use camlp4 preprocessing for better exception tracking on bind points).
  • we use HTTP and JSON to communicate with 3rd-party software and we expect every modern service to provide such APIs
  • for serving HTTP, we run our own SCGI server (ocaml-scgi) behind nginx
  • as an HTTP client we use Cohttp
  • for JSON serialization we use atdgen

“Cloud” services:

  • we use quite a lot of them as they are usually cheap, easy to interact with, and solve scalability and maintenance problems for us.

Testing:

  • we have one make/omake target for fast tests and one for slow tests
  • fast tests are unit tests; each module may provide a “test” function; a test.ml file runs the list of tests
  • slow tests are those that involve running multiple services; these are crafted specifically for our project, but they cover as much as possible of what a production service does. Everything runs locally on either Linux or MacOS, except for cloud services, for which we find ways to not interfere with production.
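
A minimal sketch of the fast-test layout, assuming each module exposes test : unit -> bool. The modules Util_string and Util_list and the functions run and main are hypothetical, and they are inlined into one file here only so that the example is self-contained; in a real tree each module would live in its own file and test.ml would only hold the list and the runner.

```ocaml
(* Hypothetical modules standing in for real project modules. *)
module Util_string = struct
  let repeat n s = String.concat "" (Array.to_list (Array.make n s))

  (* Each module may provide a "test" function. *)
  let test () = repeat 3 "ab" = "ababab" && repeat 0 "x" = ""
end

module Util_list = struct
  let rec take n = function
    | x :: rest when n > 0 -> x :: take (n - 1) rest
    | _ -> []

  let test () = take 2 [1; 2; 3] = [1; 2] && take 5 [1] = [1]
end

(* test.ml: the list of tests to run. *)
let tests = [
  "Util_string", Util_string.test;
  "Util_list", Util_list.test;
]

(* Run every test and return the names of those that failed.
   A test that raises an exception counts as a failure. *)
let run tests =
  let failed =
    List.filter (fun (_, test) -> not (try test () with _ -> false)) tests
  in
  List.map fst failed

(* Report the outcome and return an exit code for make/omake. *)
let main () =
  match run tests with
  | [] -> print_endline "all tests passed"; 0
  | failed ->
      List.iter (fun name -> prerr_endline ("FAILED: " ^ name)) failed;
      1

let () = if main () <> 0 then exit 1
```

The non-zero exit code is what lets a make or omake target fail when any test fails.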

Setting this all up is quite a bit of work, especially for someone not familiar with OCaml. There is no framework taking care of all that yet, but at least you get the choice of the tools.
