Foreign Function Interface - The Runtime System - Real World OCaml (2013)

Real World OCaml (2013)

Part III. The Runtime System

Writing good OCaml code is only one half of the typical software engineering workflow — you also need to understand how OCaml executes this code, and how to debug and profile your production applications. Part III is thus all about understanding the compiler toolchain and runtime system in OCaml. It is a remarkably simple system in comparison to other language runtimes such as Java or the .NET CLR, so this chapter should be accessible to even the casual OCaml programmer.

We open by a guided tour through the Ctypes library for binding your OCaml code to foreign C libraries. We use a terminal interface and POSIX functions of increasing complexity to show the more advanced features of the library.

OCaml has a very predictable memory representation of values, which we explain in the next chapter by walking through the various datatypes. We then illustrate how the memory regions in OCaml are automatically managed by a garbage collector to ensure no memory leaks occur.

The part closes with two bigger chapters that explain the tools that comprise the OCaml compiler, breaking it up into two logical pieces. The first part covers the parser and type checker and highlights various tips to help you solve common problems in your source code. The second part covers code generation into bytecode and native code, and also explains how to debug and profile production binaries.

Chapter 19. Foreign Function Interface

OCaml has several options available to interact with non-OCaml code. The compiler can link with external system libraries via C code and also can produce standalone native object files that can be embedded within other non-OCaml applications.

The mechanism by which code in one programming language can invoke routines in a different programming language is called a foreign function interface. This chapter will:

§ Show how to call routines in C libraries directly from your OCaml code

§ Teach you how to build higher-level abstractions in OCaml from the low-level C bindings

§ Work through some full examples for binding a terminal interface and UNIX date/time functions

The simplest foreign function interface in OCaml doesn’t even require you to write any C code at all! The Ctypes library lets you define the C interface in pure OCaml, and the library then takes care of loading the C symbols and invoking the foreign function call.

Let’s dive straight into a realistic example to show you how the library looks. We’ll create a binding to the Ncurses terminal toolkit, as it’s widely available on most systems and doesn’t have any complex dependencies.

INSTALLING THE CTYPES LIBRARY

You’ll need to install the libffi library as a prerequisite to using Ctypes. It’s a fairly popular library and should be available in your OS package manager.

A special note for Mac users: the version of libffi installed by default in Mac OS X 10.8 is too old for some of the features that Ctypes needs. Use Homebrew to brew install libffi to get the latest version before installing the OCaml library.

Once that’s done, Ctypes is available via OPAM as usual:

Terminal

$ brew install libffi # for MacOS X users

$ opam install ctypes

$ utop

# require "ctypes.foreign" ;;

You’ll also need the Ncurses library for the first example. This comes preinstalled on many operating systems such as Mac OS X, and Debian Linux provides it as the libncurses5-dev package.

Example: A Terminal Interface

Ncurses is a library to help build terminal-independent text interfaces in a reasonably efficient way. It’s used in console mail clients like Mutt and Pine, and console web browsers such as Lynx.

The full C interface is quite large and is explained in the online documentation. We’ll just use the small excerpt, since we just want to demonstrate Ctypes in action:

C

typedefstruct _win_st WINDOW;

typedefunsignedint chtype;

WINDOW *initscr (void);

WINDOW *newwin (int, int, int, int);

void endwin (void);

void refresh (void);

void wrefresh (WINDOW *);

void addstr (constchar *);

int mvwaddch (WINDOW *, int, int, const chtype);

void mvwaddstr (WINDOW *, int, int, char *);

void box (WINDOW *, chtype, chtype);

int cbreak (void);

The Ncurses functions either operate on the current pseudoterminal or on a window that has been created via newwin. The WINDOW structure holds the internal library state and is considered abstract outside of Ncurses. Ncurses clients just need to store the pointer somewhere and pass it back to Ncurses library calls, which in turn dereference its contents.

Note that there are over 200 library calls in Ncurses, so we’re only binding a select few for this example. The initscr and newwin create WINDOW pointers for the global and subwindows, respectively. The mvwaddrstr takes a window, x/y offsets, and a string and writes to the screen at that location. The terminal is only updated after refresh or wrefresh are called.

Ctypes provides an OCaml interface that lets you map these C functions to equivalent OCaml functions. The library takes care of converting OCaml function calls and arguments into the C calling convention, invoking the foreign call within the C library and finally returning the result as an OCaml value.

Let’s begin by defining the basic values we need, starting with the WINDOW state pointer:

OCaml

openCtypes

type window = unit ptr

let window : window typ = ptr void

We don’t know the internal representation of the window pointer, so we treat it as a C void pointer. We’ll improve on this later on in the chapter, but it’s good enough for now. The second statement defines an OCaml value that represents the WINDOW C pointer. This value is used later in the Ctypes function definitions:

OCaml (part 1)

openForeign

let initscr =

foreign "initscr" (void @-> returning window)

That’s all we need to invoke our first function call to initscr to initialize the terminal. The foreign function accepts two parameters:

§ The C function call name, which is looked up using the dlsym POSIX function.

§ A value that defines the complete set of C function arguments and its return type. The @-> operator adds an argument to the C parameter list, and returning terminates the parameter list with the return type.

The remainder of the Ncurses binding simply expands on these definitions:

OCaml (part 2)

let newwin =

foreign "newwin"

(int @-> int @-> int @-> int @-> returning window)

let endwin =

foreign "endwin" (void @-> returning void)

let refresh =

foreign "refresh" (void @-> returning void)

let wrefresh =

foreign "wrefresh" (window @-> returning void)

let addstr =

foreign "addstr" (string @-> returning void)

let mvwaddch =

foreign "mvwaddch"

(window @-> int @-> int @-> char @-> returning void)

let mvwaddstr =

foreign "mvwaddstr"

(window @-> int @-> int @-> string @-> returning void)

let box =

foreign "box" (window @-> char @-> char @-> returning void)

let cbreak =

foreign "cbreak" (void @-> returning int)

These definitions are all straightforward mappings from the C declarations in the Ncurses header file. Note that the string and int values here are nothing to do with OCaml type declarations; instead, they are values that come from opening the Ctypes module at the top of the file.

Most of the parameters in the Ncurses example represent fairly simple scalar C types, except for window (a pointer to the library state) and string, which maps from OCaml strings that have a specific length onto C character buffers whose length is defined by a terminating null character that immediately follows the string data.

The module signature for ncurses.mli looks much like a normal OCaml signature. You can infer it directly from the ncurses.ml by running a special build target:

Terminal

$ corebuild -pkg ctypes.foreign ncurses.inferred.mli

$ cp _build/ncurses.inferred.mli .

The inferred.mli target instructs the compiler to generate the default signature for a module file and places it in the _build directory as a normal output. You should normally copy it out into your source directory and customize it to improve its safety for external callers by making some of its internals more abstract.

Here’s the customized interface that we can safely use from other libraries:

OCaml

type window

val window : window Ctypes.typ

val initscr : unit -> window

val endwin : unit -> unit

val refresh : unit -> unit

val wrefresh : window -> unit

val newwin : int -> int -> int -> int -> window

val mvwaddch : window -> int -> int -> char -> unit

val addstr : string -> unit

val mvwaddstr : window -> int -> int -> string -> unit

val box : window -> char -> char -> unit

val cbreak : unit -> int

The window type is left abstract in the signature to ensure that window pointers can only be constructed via the Ncurses.initscr function. This prevents void pointers obtained from other sources from being mistakenly passed to an Ncurses library call.

Now compile a “hello world” terminal drawing program to tie this all together:

OCaml

openNcurses

let () =

let main_window = initscr () in

ignore(cbreak ());

let small_window = newwin 10 10 5 5 in

mvwaddstr main_window 1 2 "Hello";

mvwaddstr small_window 2 2 "World";

box small_window '\000' '\000';

refresh ();

Unix.sleep 1;

wrefresh small_window;

Unix.sleep 5;

endwin ()

The hello executable is compiled by linking with the ctypes.foreign OCamlfind package:

Terminal

$ corebuild -pkg ctypes.foreign -lflags -cclib,-lncurses hello.native

Running ./hello.native should now display a Hello World in your terminal!

ON BUILD DIRECTIVES FOR CTYPES

The preceding command line includes some important extra link directives. The -lflags instructs ocamlbuild to pass the next comma-separated set of arguments through to the ocaml command when linking a binary. OCaml in turn uses -cclib to pass directives through to the system compiler (normally gcc or clang). We first need to link to the ncurses C library to make the symbols available to Ctypes, and -cclib,-lncurses does that.

On some distributions such as Ubuntu 11.10 upwards, you’ll also need to add -cclib,-Xlinker,-cclib, and --no-as-needed to the -lflags directive. -Xlinker is interpreted by the compiler as a directive for the system linker ld, to which it passes --no-as-needed. Several modern OS distributions (such as Ubuntu 11.10 onwards) configure the system linker to only link in libraries that directly contain symbols used by the program. However, when we use Ctypes, those symbols are not referenced until runtime, which results an exception due to the library not being available.

The --no-as-needed flag disables this behavior and ensures all the specified libraries are linked despite not being directly used. The flag unfortunately doesn’t work everywhere (notably, Mac OS X should not have this passed to it).

Ctypes wouldn’t be very useful if it were limited to only defining simple C types, of course. It provides full support for C pointer arithmetic, pointer conversions, and reading and writing through pointers, using OCaml functions as function pointers to C code, as well as struct and union definitions.

We’ll go over some of these features in more detail for the remainder of the chapter by using some POSIX date functions as running examples.

Basic Scalar C Types

First, let’s look at how to define basic scalar C types. Every C type is represented by an OCaml equivalent via the single type definition:

OCaml

type'a typ

Ctypes.typ is the type of values that represents C types to OCaml. There are two types associated with each instance of typ:

§ The C type used to store and pass values to the foreign library.

§ The corresponding OCaml type. The 'a type parameter contains the OCaml type such that a value of type t typ is used to read and write OCaml values of type t.

There are various other uses of typ values within Ctypes, such as:

§ Constructing function types for binding native functions

§ Constructing pointers for reading and writing locations in C-managed storage

§ Describing component fields of structures, unions, and arrays

Here are the definitions for most of the standard C99 scalar types, including some platform-dependent ones:

OCaml (part 1)

val void : unit typ

valchar : char typ

val schar : int typ

val short : int typ

valint : int typ

val long : long typ

val llong : llong typ

val nativeint : nativeint typ

val int8_t : int typ

val int16_t : int typ

val int32_t : int32 typ

val int64_t : int64 typ

val uchar : uchar typ

val uchar : uchar typ

val uint8_t : uint8 typ

val uint16_t : uint16 typ

val uint32_t : uint32 typ

val uint64_t : uint64 typ

val size_t : size_t typ

val ushort : ushort typ

val uint : uint typ

val ulong : ulong typ

val ullong : ullong typ

valfloat : float typ

val double : float typ

val complex32 : Complex.t typ

val complex64 : Complex.t typ

These values are all of type 'a typ, where the value name (e.g., void) tells you the C type and the 'a component (e.g., unit) is the OCaml representation of that C type. Most of the mappings are straightforward, but some of them need a bit more explanation:

§ Void values appear in OCaml as the unit type. Using void in an argument or result type specification produces an OCaml function that accepts or returns unit. Dereferencing a pointer to void is an error, as in C, and will raise the IncompleteType exception.

§ The C size_t type is an alias for one of the unsigned integer types. The actual size and alignment requirements for size_t varies between platforms. Ctypes provides an OCaml size_t type that is aliased to the appropriate integer type.

§ OCaml only supports double-precision floating-point numbers, and so the C float and double types both map onto the OCaml float type, and the C float complex and double complex types both map onto the OCaml double-precision Complex.t type.

Pointers and Arrays

Pointers are at the heart of C, so they are necessarily part of Ctypes, which provides support for pointer arithmetic, pointer conversions, reading and writing through pointers, and passing and returning pointers to and from functions.

We’ve already seen a simple use of pointers in the Ncurses example. Let’s start a new example by binding the following POSIX functions:

C

time_t time(time_t *);

double difftime(time_t, time_t);

char *ctime(consttime_t *timep);

The time function returns the current calendar time and is a simple start. The first step is to open some of the Ctypes modules:

Ctypes

The Ctypes module provides functions for describing C types in OCaml.

PosixTypes

The PosixTypes module includes some extra POSIX-specific types (such as time_t).

Foreign

The Foreign module exposes the foreign function that makes it possible to invoke C functions.

We can now create a binding to time directly from the toplevel.

OCaml utop

# #require"ctypes.foreign" ;;

# #require"ctypes.top" ;;

# openCtypes ;;

# openPosixTypes ;;

# openForeign ;;

# lettime = foreign"time" (ptrtime_t @-> returningtime_t) ;;

val time : time_t ptr -> time_t = <fun>

The foreign function is the main link between OCaml and C. It takes two arguments: the name of the C function to bind, and a value describing the type of the bound function. In the time binding, the function type specifies one argument of type ptr time_t and a return type of time_t.

We can now call time immediately in the same toplevel. The argument is actually optional, so we’ll just pass a null pointer that has been coerced into becoming a null pointer to time_t:

OCaml utop (part 1)

# letcur_time = time (from_voidptime_tnull) ;;

val cur_time : time_t = 1376834134

Since we’re going to call time a few times, let’s create a wrapper function that passes the null pointer through:

OCaml utop (part 2)

# lettime'() = time (from_voidptime_tnull) ;;

val time' : unit -> time_t = <fun>

Since time_t is an abstract type, we can’t actually do anything useful with it directly. We need to bind a second function to do anything useful with the return values from time. We’ll move on to difftime; the second C function in our prototype list:

OCaml utop (part 3)

# letdifftime =

foreign"difftime" (time_t @-> time_t @-> returningdouble) ;;

val difftime : time_t -> time_t -> float = <fun>

# lett1 =

time'()in

Unix.sleep2;

lett2 = time'()in

difftimet2t1 ;;

- : float = 2.

The binding to difftime above is sufficient to compare two time_t values.

Allocating Typed Memory for Pointers

Let’s look at a slightly less trivial example where we pass a nonnull pointer to a function. Continuing with the theme from earlier, we’ll bind to the ctime function, which converts a time_t value to a human-readable string:

OCaml utop (part 4)

# letctime = foreign"ctime" (ptrtime_t @-> returningstring) ;;

val ctime : time_t ptr -> string = <fun>

The binding is continued in the toplevel to add to our growing collection. However, we can’t just pass the result of time to ctime:

OCaml utop (part 5)

# ctime (time'()) ;;

Characters 7-15:

Error: This expression has type time_t but an expression was expected of type

time_t ptr

This is because ctime needs a pointer to the time_t rather than passing it by value. We thus need to allocate some memory for the time_t and obtain its memory address:

OCaml utop (part 6)

# lett_ptr = allocatetime_t (time'()) ;;

val t_ptr : time_t ptr = (int64_t*) 0x238ac30

The allocate function takes the type of the memory to be allocated and the initial value and it returns a suitably typed pointer. We can now call ctime passing the pointer as an argument:

OCaml utop (part 7)

# ctimet_ptr ;;

- : string = "Sun Aug 18 14:55:36 2013\n"

Using Views to Map Complex Values

While scalar types typically have a 1:1 representation, other C types require extra work to convert them into OCaml. Views create new C type descriptions that have special behavior when used to read or write C values.

We’ve already used one view in the definition of ctime earlier. The string view wraps the C type char * (written in OCaml as ptr char) and converts between the C and OCaml string representations each time the value is written or read.

Here is the type signature of the Ctypes.view function:

OCaml (part 2)

val view :

read:('a -> 'b) ->

write:('b -> 'a) ->

'a typ -> 'b typ

Ctypes has some internal low-level conversion functions that map between an OCaml string and a C character buffer by copying the contents into the respective data structure. They have the following type signature:

OCaml (part 3)

val string_of_char_ptr : char ptr -> string

val char_ptr_of_string : string -> char ptr

Given these functions, the definition of the Ctypes.string value that uses views is quite simple:

OCaml

letstring =

view (char ptr)

~read:string_of_char_ptr

~write:char_ptr_of_string

The type of this string function is a normal typ with no external sign of the use of the view function:

OCaml (part 4)

valstring : string.typ

OCAML STRINGS VERSUS C CHARACTER BUFFERS

Although OCaml strings may look like C character buffers from an interface perspective, they’re very different in terms of their memory representations.

OCaml strings are stored in the OCaml heap with a header that explicitly defines their length. C buffers are also fixed-length, but by convention, a C string is terminated by a null (a \0 byte) character. The C string functions calculate their length by scanning the buffer until the first null character is encountered.

This means that you need to be careful that OCaml strings that you pass to C functions don’t contain any null values, since the first occurrence of a null character will be treated as the end of the C string. Ctypes also defaults to a copying interface for strings, which means that you shouldn’t use them when you want the library to mutate the buffer in-place. In that situation, use the CtypesBigarray support to pass memory by reference instead.

Structs and Unions

The C constructs struct and union make it possible to build new types from existing types. Ctypes contains counterparts that work similarly.

Defining a Structure

Let’s improve the timer function that we wrote earlier. The POSIX function gettimeofday retrieves the time with microsecond resolution. The signature of gettimeofday is as follows, including the structure definitions:

C

struct timeval {

long tv_sec;

long tv_usec;

};

int gettimeofday(struct timeval *, struct timezone *tv);

Using Ctypes, we can describe this type as follows in our toplevel, continuing on from the previous definitions:

OCaml utop (part 8)

# typetimeval ;;

type timeval

# lettimeval : timevalstructuretyp = structure"timeval" ;;

val timeval : timeval structure typ = struct timeval

The first command defines a new OCaml type timeval that we’ll use to instantiate the OCaml version of the struct. This is a phantom type that exists only to distinguish the underlying C type from other pointer types. The particular timeval structure now has a distinct type from other structures we define elsewhere, which helps to avoid getting them mixed up.

The second command calls structure to create a fresh structure type. At this point, the structure type is incomplete: we can add fields but cannot yet use it in foreign calls or use it to create values.

Adding Fields to Structures

The timeval structure definition still doesn’t have any fields, so we need to add those next:

OCaml utop (part 9)

# lettv_sec = fieldtimeval"tv_sec"long ;;

val tv_sec : (Signed.long, (timeval, [ `Struct ]) structured) field = <abstr>

# lettv_usec = fieldtimeval"tv_usec"long ;;

val tv_usec : (Signed.long, (timeval, [ `Struct ]) structured) field =

<abstr>

# sealtimeval ;;

- : unit = ()

The field function appends a field to the structure, as shown with tv_sec and tv_usec. Structure fields are typed accessors that are associated with a particular structure, and they correspond to the labels in C.

Every field addition mutates the structure variable and records a new size (the exact value of which depends on the type of the field that was just added). Once we seal the structure, we will be able to create values using it, but adding fields to a sealed structure is an error.

Incomplete Structure Definitions

Since gettimeofday needs a struct timezone pointer for its second argument, we also need to define a second structure type:

OCaml utop (part 10)

# typetimezone ;;

type timezone

# lettimezone : timezonestructuretyp = structure"timezone" ;;

val timezone : timezone structure typ = struct timezone

We don’t ever need to create struct timezone values, so we can leave this struct as incomplete without adding any fields or sealing it. If you ever try to use it in a situation where its concrete size needs to be known, the library will raise an IncompleteType exception.

We’re finally ready to bind to gettimeofday now:

OCaml utop (part 11)

# letgettimeofday = foreign"gettimeofday"

(ptrtimeval @-> ptrtimezone @-> returning_checking_errnoint) ;;

val gettimeofday : timeval structure ptr -> timezone structure ptr -> int =

<fun>

There’s one other new feature here: the returning_checking_errno function behaves like returning, except that it checks whether the bound C function modifies the C error flag. Changes to errno are mapped into OCaml exceptions and raise a Unix.Unix_error exception just as the standard library functions do.

As before, we can create a wrapper to make gettimeofday easier to use. The functions make, addr, and getf create a structure value, retrieve the address of a structure value, and retrieve the value of a field from a structure:

OCaml utop (part 12)

# letgettimeofday'() =

lettv = maketimevalin

ignore(gettimeofday (addrtv) (from_voidptimezonenull));

letsecs = Signed.Long.(to_int (getftvtv_sec)) in

letusecs = Signed.Long.(to_int (getftvtv_usec)) in

Pervasives.(floatsecs +. floatusecs /. 1000000.0) ;;

val gettimeofday' : unit -> float = <fun>

# gettimeofday'() ;;

- : float = 1376834137.14

You need to be a little careful not to get all the open modules mixed up here. Both Pervasives and Ctypes define different float functions. The Ctypes module we opened up earlier overrides the Pervasives definition. As seen previously though, you just need to locally open Pervasives again to bring the usual float function back in scope.

Recap: A time-printing command

We built up a lot of bindings in the previous section, so let’s recap them with a complete example that ties it together with a command-line frontend:

OCaml

openCore.Std

openCtypes

openPosixTypes

openForeign

let time = foreign "time" (ptr time_t @-> returning time_t)

let difftime = foreign "difftime" (time_t @-> time_t @-> returning double)

let ctime = foreign "ctime" (ptr time_t @-> returning string)

type timeval

let timeval : timeval structure typ = structure "timeval"

let tv_sec = field timeval "tv_sec" long

let tv_usec = field timeval "tv_usec" long

let () = seal timeval

type timezone

let timezone : timezone structure typ = structure "timezone"

let gettimeofday = foreign "gettimeofday"

(ptr timeval @-> ptr timezone @-> returning_checking_errno int)

let time' () = time (from_voidp time_t null)

let gettimeofday' () =

let tv = make timeval in

ignore(gettimeofday (addr tv) (from_voidp timezone null));

let secs = Signed.Long.(to_int (getf tv tv_sec)) in

let usecs = Signed.Long.(to_int (getf tv tv_usec)) in

Pervasives.(float secs +. float usecs /. 1_000_000.)

let float_time () = printf "%f%!\n" (gettimeofday' ())

let ascii_time () =

let t_ptr = allocate time_t (time' ()) in

printf "%s%!" (ctime t_ptr)

let () =

letopenCommandin

basic ~summary:"Display the current time in various formats"

Spec.(empty +> flag "-a" no_arg ~doc:" Human-readable output format")

(fun human -> if human then ascii_time else float_time)

|> Command.run

This can be compiled and run in the usual way:

Terminal

$ corebuild -pkg ctypes.foreign datetime.native

$ ./datetime.native

1376833554.984496

$ ./datetime.native -a

Sun Aug 18 14:45:55 2013

WHY DO WE NEED TO USE RETURNING?

The alert reader may be curious about why all these function definitions have to be terminated by returning:

OCaml

(* correct types *)

val time: ptr time_t @-> returning time_t

val difftime: time_t @-> time_t @-> returning double

The returning function may appear superfluous here. Why couldn’t we simply give the types as follows?

OCaml (part 1)

(* incorrect types *)

val time: ptr time_t @-> time_t

val difftime: time_t @-> time_t @-> double

The reason involves higher types and two differences between the way that functions are treated in OCaml and C. Functions are first-class values in OCaml, but not in C. For example, in C it is possible to return a function pointer from a function, but not to return an actual function.

Secondly, OCaml functions are typically defined in a curried style. The signature of a two-argument function is written as follows:

OCaml (part 2)

val curried : int -> int -> int

but this really means:

OCaml (part 3)

val curried : int -> (int -> int)

and the arguments can be supplied one at a time to create a closure. In contrast, C functions receive their arguments all at once. The equivalent C function type is the following:

C

int uncurried_C(int, int);

and the arguments must always be supplied together:

C

uncurried_C(3, 4);

A C function that’s written in curried style looks very different:

C

/* A function that accepts an int, and returns a function

pointer that accepts a second int and returns an int. */

typedef int (function_t)(int);

function_t *curried_C(int);

/* supply both arguments */

curried_C(3)(4);

/* supply one argument at a time */

function_t *f = curried_C(3); f(4);

The OCaml type of uncurried_C when bound by Ctypes is int -> int -> int: a two-argument function. The OCaml type of curried_C when bound by ctypes is int -> (int -> int): a one-argument function that returns a one-argument function.

In OCaml, of course, these types are absolutely equivalent. Since the OCaml types are the same but the C semantics are quite different, we need some kind of marker to distinguish the cases. This is the purpose of returning in function definitions.

Defining Arrays

Arrays in C are contiguous blocks of the same type of value. Any of the basic types defined previously can be allocated as blocks via the Array module:

OCaml (part 5)

moduleArray : sig

type'a t = 'a array

val get : 'a t -> int -> 'a

val set : 'a t -> int -> 'a -> unit

val of_list : 'a typ -> 'a list -> 'a t

val to_list : 'a t -> 'a list

val length : 'a t -> int

val start : 'a t -> 'a ptr

val from_ptr : 'a ptr -> int -> 'a t

val make : 'a typ -> ?initial:'a -> int -> 'a t

end

The array functions are similar to those in the standard library Array module except that they operate on arrays stored using the flat C representation rather than the OCaml representation described in Chapter 20.

As with standard OCaml arrays, the conversion between arrays and lists requires copying the values, which can be expensive for large data structures. Notice that you can also convert an array into a ptr pointer to the head of the underlying buffer, which can be useful if you need to pass the pointer and size arguments separately to a C function.

Unions in C are named structures that can be mapped onto the same underlying memory. They are also fully supported in Ctypes, but we won’t go into more detail here.

POINTER OPERATORS FOR DEREFERENCING AND ARITHMETIC

Ctypes defines a number of operators that let you manipulate pointers and arrays just as you would in C. The Ctypes equivalents do have the benefit of being more strongly typed, of course (see Table 19-1).

Table 19-1. Operators for manipulating pointers and arrays

Operator

Purpose

!@ p

Dereference the pointer p.

p <-@ v

Write the value v to the address p.

p +@ n

If p points to an array element, then compute the address of the nth next element.

p -@ n

If p points to an array element, then compute the address of the nth previous element.

There are also other useful nonoperator functions available (see the Ctypes documentation), such as pointer differencing and comparison.

Passing Functions to C

It’s also straightforward to pass OCaml function values to C. The C standard library function qsort sorts arrays of elements using a comparison function passed in as a function pointer. The signature for qsort is:

C

void qsort(void *base, size_t nmemb, size_t size,

int(*compar)(constvoid *, constvoid *));

C programmers often use typedef to make type definitions involving function pointers easier to read. Using a typedef, the type of qsort looks a little more palatable:

C:

typedef int(compare_t)(constvoid *, constvoid *);

void qsort(void *base, size_t nmemb, size_t size, compare_t *);

This also happens to be a close mapping to the corresponding Ctypes definition. Since type descriptions are regular values, we can just use let in place of typedef and end up with working OCaml bindings to qsort:

OCaml utop

# #require"ctypes.foreign" ;;

# openCtypes ;;

# openPosixTypes ;;

# openForeign ;;

# letcompare_t = ptrvoid @-> ptrvoid @-> returningint ;;

val compare_t : (unit ptr -> unit ptr -> int) fn = <abstr>

# letqsort = foreign"qsort"

(ptrvoid @-> size_t @-> size_t @->

funptrcompare_t @-> returningvoid) ;;

val qsort :

unit ptr -> size_t -> size_t -> (unit ptr -> unit ptr -> int) -> unit =

<fun>

We only use compare_t once (in the qsort definition), so you can choose to inline it in the OCaml code if you prefer. As the type shows, the resulting qsort value is a higher-order function, since the fourth argument is itself a function. As before, let’s define a wrapper function to make qsorteasier to use. The second and third arguments to qsort specify the length (number of elements) of the array and the element size.

Arrays created using Ctypes have a richer runtime structure than C arrays, so we don’t need to pass size information around. Furthermore, we can use OCaml polymorphism in place of the unsafe void ptr type.

Example: A Command-Line Quicksort

The following is a command-line tool that uses the qsort binding to sort all of the integers supplied on the standard input:

OCaml

openCore.Std

openCtypes

openPosixTypes

openForeign

let compare_t = ptr void @-> ptr void @-> returning int

let qsort = foreign "qsort"

(ptr void @-> size_t @-> size_t @-> funptr compare_t @->

returning void)

let qsort' cmp arr =

letopenUnsigned.Size_tin

let ty = Array.element_type arr in

let len = of_int (Array.length arr) in

let elsize = of_int (sizeof ty) in

let start = to_voidp (Array.start arr) in

let compare l r = cmp (!@ (from_voidp ty l)) (!@ (from_voidp ty r)) in

qsort start len elsize compare;

arr

let sort_stdin () =

In_channel.input_lines stdin

|> List.map ~f:int_of_string

|> Array.of_list int

|> qsort' Int.compare

|> Array.to_list

|> List.iter ~f:(fun a -> printf "%d\n" a)

let () =

Command.basic ~summary:"Sort integers on standard input"

Command.Spec.empty sort_stdin

|> Command.run

Compile it in the usual way with corebuild and test it against some input data, and also build the inferred interface so we can examine it more closely:

Terminal

$ corebuild -pkg ctypes.foreign qsort.native

$ cat input.txt

5

3

2

1

4

$ ./qsort.native < input.txt

1

2

3

4

5

$ corebuild -pkg ctypes.foreign qsort.inferred.mli

$ cp _build/qsort.inferred.mli qsort.mli

The inferred interface shows us the types of the raw qsort binding and also the qsort' wrapper function:

OCaml

val compare_t : (unitCtypes.ptr -> unitCtypes.ptr -> int) Ctypes.fn

val qsort :

unitCtypes.ptr ->

PosixTypes.size_t ->

PosixTypes.size_t -> (unitCtypes.ptr -> unitCtypes.ptr -> int) -> unit

val qsort' : ('a -> 'a -> int) -> 'a Ctypes.array -> 'a Ctypes.array

val sort_stdin : unit -> unit

The qsort' wrapper function has a much more canonical OCaml interface than the raw binding. It accepts a comparator function and a Ctypes array, and returns the same Ctypes array. It’s not strictly required that it returns the array, since it modifies it in-place, but it makes it easier to chain the function using the |> operator (as sort_stdin does in the example).

Using qsort' to sort arrays is straightforward. Our example code reads the standard input as a list, converts it to a C array, passes it through qsort, and outputs the result to the standard output. Again, remember to not confuse the Ctypes.Array module with the Core.Std.Array module: the former is in scope since we opened Ctypes at the start of the file.

LIFETIME OF ALLOCATED CTYPES

Values allocated via Ctypes (i.e., using allocate, Array.make, and so on) will not be garbage-collected as long as they are reachable from OCaml values. The system memory they occupy is freed when they do become unreachable, via a finalizer function registered with the garbage collector (GC).

The definition of reachability for Ctypes values is a little different from conventional OCaml values, though. The allocation functions return an OCaml-managed pointer to the value, and as long as some derivative pointer is still reachable by the GC, the value won’t be collected.

“Derivative” means a pointer that’s computed from the original pointer via arithmetic, so a reachable reference to an array element or a structure field protects the whole object from collection.

A corollary of the preceding rule is that pointers written into the C heap don’t have any effect on reachability. For example, if you have a C-managed array of pointers to structs, then you’ll need some additional way of keeping the structs themselves around to protect them from collection. You could achieve this via a global array of values on the OCaml side that would keep them live until they’re no longer needed.

Functions passed to C have similar considerations regarding lifetime. On the OCaml side, functions created at runtime may be collected when they become unreachable. As we’ve seen, OCaml functions passed to C are converted to function pointers, and function pointers written into the C heap have no effect on the reachability of the OCaml functions they reference. With qsort things are straightforward, since the comparison function is only used during the call to qsort itself. However, other C libraries may store function pointers in global variables or elsewhere, in which case you’ll need to take care that the OCaml functions you pass to them aren’t prematurely garbage-collected.

Learning More About C Bindings

The Ctypes distribution contains a number of larger-scale examples, including:

§ Bindings to the POSIX fts API, which demonstrates C callbacks more comprehensively

§ A more complete Ncurses binding than the example we opened the chapter with

§ A comprehensive test suite that covers the complete library, and can provide useful snippets for your own bindings

This chapter hasn’t really needed you to understand the innards of OCaml at all. Ctypes does its best to make function bindings easy, but the rest of this part will also fill you in about interactions with OCaml memory layout in Chapter 20 and automatic memory management in Chapter 21.

Ctypes gives OCaml programs access to the C representation of values, shielding you from the details of the OCaml value representation, and introduces an abstraction layer that hides the details of foreign calls. While this covers a wide variety of situations, it’s sometimes necessary to look behind the abstraction to obtain finer control over the details of the interaction between the two languages.

You can find more information about the C interface in several places:

§ The standard OCaml foreign function interface allows you to glue OCaml and C together from the other side of the boundary, by writing C functions that operate on the OCaml representation of values. You can find details of the standard interface in the OCaml manual and in the bookDeveloping Applications with Objective Caml.

§ Florent Monnier maintains an excellent online OCaml that provides examples of how to call OCaml functions from C. This covers a wide variety of OCaml data types and also more complex callbacks between C and OCaml.

§ SWIG is a tool that connects programs written in C/C++ to a variety of higher-level programming languages, including OCaml. The SWIG manual has examples of converting library specifications into OCaml bindings.

Struct Memory Layout

The C language gives implementations a certain amount of freedom in choosing how to lay out structs in memory. There may be padding between members and at the end of the struct, in order to satisfy the memory alignment requirements of the host platform. Ctypes uses platform-appropriate size and alignment information to replicate the struct layout process. OCaml and C will have consistent views about the layout of the struct as long as you declare the fields of a struct in the same order and with the same types as the C library you’re binding to.

However, this approach can lead to difficulties when the fields of a struct aren’t fully specified in the interface of a library. The interface may list the fields of a structure without specifying their order, or make certain fields available only on certain platforms, or insert undocumented fields into struct definitions for performance reasons. For example, the struct timeval definition used in this chapter accurately describes the layout of the struct on common platforms, but implementations on some more unusual architectures include additional padding members that will lead to strange behavior in the examples.

The Cstubs subpackage of Ctypes addresses this issue. Rather than simply assuming that struct definitions given by the user accurately reflect the actual definitions of structs used in C libraries, Cstubs generates code that uses the C library headers to discover the layout of the struct. The good news is that the code that you write doesn’t need to change much. Cstubs provides alternative implementations of the field and seal functions that you’ve already used to describe struct timeval; instead of computing member offsets and sizes appropriate for the platform, these implementations obtain them directly from C.

The details of using Cstubs are available in the online documentation, along with instructions on integration with autoconf platform portability instructions.