How do I use a C library in a Rust library compiled to WebAssembly?

TL;DR: Jump to “New week, new adventures” in order to get “Hello from C and Rust!”

The nice way would be creating a WASM library and passing it to the linker. rustc has an option for that (and there seem to be source-code directives too):

rustc <yourcode.rs> --target wasm32-unknown-unknown --crate-type=cdylib -C link-arg=<library.wasm>

The trick is that the library has to be a library, so it needs to contain reloc (and in practice linking) sections. Emscripten seems to have a symbol for that, RELOCATABLE:

emcc <something.c> -s WASM=1 -s SIDE_MODULE=1 -s RELOCATABLE=1 -s EMULATED_FUNCTION_POINTERS=1 -s ONLY_MY_CODE=1 -o <something.wasm>

(EMULATED_FUNCTION_POINTERS is included with RELOCATABLE, so it is not really necessary, ONLY_MY_CODE strips some extras, but it does not matter here either)

The thing is, emcc never generated a relocatable wasm file for me, at least not the version I downloaded this week, for Windows (I played this on hard difficulty, which retrospectively might have not been the best idea). So the sections are missing and rustc keeps complaining about <something.wasm> is not a relocatable wasm file.

Then comes clang, which can generate a relocatable wasm module with a very simple one-liner:

clang -c <something.c> -o <something.wasm> --target=wasm32-unknown-unknown

Then rustc says “Linking sub-section ended prematurely”. Aw, yes (by the way, my Rust setup was brand new too). Then I read that there are two clang wasm targets: wasm32-unknown-unknown-wasm and wasm32-unknown-unknown-elf, and maybe the latter one should be used here. As my also brand new llvm+clang build runs into an internal error with this target, asking me to send an error report to the developers, it might be something to test on easy or medium, like on some *nix or Mac box.

Minimal success story: sum of three numbers

At this point I just added lld to llvm and succeeded with linking a test code manually from bitcode files:

clang cadd.c --target=wasm32-unknown-unknown -emit-llvm -c
rustc rsum.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit llvm-bc
lld -flavor wasm rsum.bc cadd.bc -o msum.wasm --no-entry

Aw yes, it sums numbers, 2 in C and 1+2 in Rust:

cadd.c

int cadd(int x,int y){
  return x+y;
}

msum.rs

extern "C" {
    fn cadd(x: i32, y: i32) -> i32;
}

#[no_mangle]
pub fn rsum(x: i32, y: i32, z: i32) -> i32 {
    x + unsafe { cadd(y, z) }
}

test.html

<script>
  fetch('msum.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          _ZN4core9panicking5panic17hfbb77505dc622acdE:alert
        }
      });
    })
    .then(instance => {
      alert(instance.exports.rsum(13,14,15));
    });
</script>

That _ZN4core9panicking5panic17hfbb77505dc622acdE feels very natural (the module is compiled and instantiated in two steps in order to log the exports and imports, that is a way how such missing pieces can be found), and forecasts the demise of this attempt: the entire thing works because there is no other reference to the runtime library, and this particular method could be mocked/provided manually.

Side story: string

As alloc and its Layout thing scared me a little, I went with the vector-based approach described/used from time to time, for example here or on Hello, Rust!.
Here is an example, getting the “Hello from …” string from the outside…

rhello.rs

use std::ffi::CStr;
use std::mem;
use std::os::raw::{c_char, c_void};
use std::ptr;

extern "C" {
    fn chello() -> *mut c_char;
}

#[no_mangle]
pub fn alloc(size: usize) -> *mut c_void {
    let mut buf = Vec::with_capacity(size);
    let p = buf.as_mut_ptr();
    mem::forget(buf);
    p as *mut c_void
}

#[no_mangle]
pub fn dealloc(p: *mut c_void, size: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(p, 0, size);
    }
}

#[no_mangle]
pub fn hello() -> *mut c_char {
    let phello = unsafe { chello() };
    let c_msg = unsafe { CStr::from_ptr(phello) };
    let message = format!("{} and Rust!", c_msg.to_str().unwrap());
    dealloc(phello as *mut c_void, c_msg.to_bytes().len() + 1);
    let bytes = message.as_bytes();
    let len = message.len();
    let p = alloc(len + 1) as *mut u8;
    unsafe {
        for i in 0..len as isize {
            ptr::write(p.offset(i), bytes[i as usize]);
        }
        ptr::write(p.offset(len as isize), 0);
    }
    p as *mut c_char
}

Built as rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib

… and actually working with JavaScript:

jhello.html

<script>
  var e;
  fetch('rhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          chello:function(){
            var s="Hello from JavaScript";
            var p=e.alloc(s.length+1);
            var m=new Uint8Array(e.memory.buffer);
            for(var i=0;i<s.length;i++)
              m[p+i]=s.charCodeAt(i);
            m[s.length]=0;
            return p;
          }
        }
      });
    })
    .then(instance => {
      /*var*/ e=instance.exports;
      var ptr=e.hello();
      var optr=ptr;
      var m=new Uint8Array(e.memory.buffer);
      var s="";
      while(m[ptr]!=0)
        s+=String.fromCharCode(m[ptr++]);
      e.dealloc(optr,s.length+1);
      console.log(s);
    });
</script>

It is not particularly beautiful (actually I have no clue about Rust), but it does something what I expect from it, and even that dealloc might work (at least invoking it twice throws a panic).
There was an important lesson on the way: when the module manages its memory, its size may change which results in invalidating the backing ArrayBuffer object and its views. So that is why memory.buffer is checked multiple times, and checked after calling into wasm code.

And this is where I am stuck, because this code would refer to runtime libraries, and .rlib-s. The closest I could get to a manual build is the following:

rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit obj
lld -flavor wasm rhello.o -o rhello.wasm --no-entry --allow-undefined
     liballoc-5235bf36189564a3.rlib liballoc_system-f0b9538845741d3e.rlib
     libcompiler_builtins-874d313336916306.rlib libcore-5725e7f9b84bd931.rlib
     libdlmalloc-fffd4efad67b62a4.rlib liblibc-453d825a151d7dec.rlib
     libpanic_abort-43290913ef2070ae.rlib libstd-dcc98be97614a8b6.rlib
     libunwind-8cd3b0417a81fb26.rlib

Where I had to use the lld sitting in the depths of the Rust toolchain as .rlib-s are said to be interpreted, so they are bound to the Rust toolchain

--crate-type=rlib, #[crate_type = "rlib"] – A “Rust library” file will be produced. This is used as an intermediate artifact and can be thought of as a “static Rust library”. These rlib files, unlike staticlib files, are interpreted by the Rust compiler in future linkage. This essentially means that rustc will look for metadata in rlib files like it looks for metadata in dynamic libraries. This form of output is used to produce statically linked executables as well as staticlib outputs.

Of course this lld does not eat the .wasm/.o files generated with clang or llc (“Linking sub-section ended prematurely”), perhaps the Rust-part also should be rebuilt with the custom llvm.
Also, this build seems to be missing the actual allocators, besides chello, there will be 4 more entries in the import table: __rust_alloc, __rust_alloc_zeroed, __rust_dealloc and __rust_realloc. Which in fact could be provided from JavaScript after all, just defeats the idea of letting Rust handle its own memory, plus an allocator was present in the single-pass rustc build… Oh, yes, this is where I gave up for this week (Aug 11, 2018, at 21:56)

New week, new adventures, with Binaryen, wasm-dis/merge

The idea was to modify the ready-made Rust code (having allocators and everything in place). And this one works. As long as your C code has no data.

Proof of concept code:

chello.c

void *alloc(int len); // allocator comes from Rust

char *chello(){
  char *hell=alloc(13);
  hell[0]='H';
  hell[1]='e';
  hell[2]='l';
  hell[3]='l';
  hell[4]='o';
  hell[5]=' ';
  hell[6]='f';
  hell[7]='r';
  hell[8]='o';
  hell[9]='m';
  hell[10]=' ';
  hell[11]='C';
  hell[12]=0;
  return hell;
}

Not extremely usual, but it is C code.

rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
wasm-dis rhello.wasm -o rhello.wast
clang chello.c --target=wasm32-unknown-unknown -nostdlib -Wl,--no-entry,--export=chello,--allow-undefined
wasm-dis a.out -o chello.wast
wasm-merge rhello.wast chello.wast -o mhello.wasm -O

(rhello.rs is the same one presented in “Side story: string”)
And the result works as

mhello.html

<script>
  fetch('mhello.wasm')
    .then(response => response.arrayBuffer())
    .then(bytes => WebAssembly.compile(bytes))
    .then(module => {
      console.log(WebAssembly.Module.exports(module));
      console.log(WebAssembly.Module.imports(module));
      return WebAssembly.instantiate(module, {
        env:{
          memoryBase: 0,
          tableBase: 0
        }
      });
    })
    .then(instance => {
      var e=instance.exports;
      var ptr=e.hello();
      console.log(ptr);
      var optr=ptr;
      var m=new Uint8Array(e.memory.buffer);
      var s="";
      while(m[ptr]!=0)
        s+=String.fromCharCode(m[ptr++]);
      e.dealloc(optr,s.length+1);
      console.log(s);
    });
</script>

Even the allocators seem to do something (ptr readings from repeated blocks with/without dealloc show how memory does not leak/leaks accordingly).

Of course this is super-fragile and has mysterious parts too:

  • if the final merge is run with -S switch (generates source code instead of .wasm), and the result assembly file is compiled separately (using wasm-as), the result will be a couple bytes shorter (and those bytes are somewhere in the very middle of the running code, not in export/import/data sections)
  • the order of merge matters, file with “Rust-origin” has to come first. wasm-merge chello.wast rhello.wast [...] dies with an entertaining message

    [wasm-validator error in module] unexpected false: segment offset should be reasonable, on
    [i32] (i32.const 1)
    Fatal: error in validating output

  • probably my fault, but I had to build a complete chello.wasm module (so, with linking). Compiling only (clang -c [...]) resulted in the relocatable module which was missed so much at the very beginning of this story, but decompiling that one (to .wast) lost the named export (chello()):
    (export "chello" (func $chello)) disappears completely
    (func $chello ... becomes (func $0 ..., an internal function (wasm-dis loses reloc and linking sections, putting only a remark about them and their size into the assembly source)
  • related to the previous one: this way (building a complete module) data from the secondary module can not be relocated by wasm-merge: while there is a chance for catching references to the string itself (const char *HELLO="Hello from C"; becomes a constant at offset 1024 in particular, and later referred as (i32.const 1024) if it is local constant, inside a function), it does not happen. And if it is a global constant, its address becomes a global constant too, number 1024 stored at offset 1040, and the string is going to be referred as (i32.load offset=1040 [...], which starts being difficult to catch.

For laughs, this code compiles and works too…

void *alloc(int len);

int my_strlen(const char *ptr){
  int ret=0;
  while(*ptr++)ret++;
  return ret;
}

char *my_strcpy(char *dst,const char *src){
  char *ret=dst;
  while(*src)*dst++=*src++;
  *dst=0;
  return ret;
}

char *chello(){
  const char *HELLO="Hello from C";
  char *hell=alloc(my_strlen(HELLO)+1);
  return my_strcpy(hell,HELLO);
}

… just it writes “Hello from C” in the middle of Rust’s message pool, resulting in the printout

Hello from Clt::unwrap()` on an `Err`an value and Rust!

(Explanation: 0-initializers are not present in the recompiled code because of the optimization flag, -O)
And it also brings up the question about locating a libc (though defining them without my_, clang mentions strlen and strcpy as built-ins, also telling their correct singatures, it does not emit code for them and they become imports for the resulting module).

Leave a Comment

tech