TL;DR: Jump to “New week, new adventures” in order to get “Hello from C and Rust!”
The nice way would be creating a WASM library and passing it to the linker. rustc has an option for that (and there seem to be source-code directives too):
rustc <yourcode.rs> --target wasm32-unknown-unknown --crate-type=cdylib -C link-arg=<library.wasm>
The trick is that the library has to be a library, so it needs to contain reloc (and in practice linking) sections. Emscripten seems to have a symbol for that, RELOCATABLE:
emcc <something.c> -s WASM=1 -s SIDE_MODULE=1 -s RELOCATABLE=1 -s EMULATED_FUNCTION_POINTERS=1 -s ONLY_MY_CODE=1 -o <something.wasm>
(EMULATED_FUNCTION_POINTERS is included with RELOCATABLE, so it is not really necessary; ONLY_MY_CODE strips some extras, but it does not matter here either.)
The thing is, emcc never generated a relocatable wasm file for me, at least not the version I downloaded this week, for Windows (I played this on hard difficulty, which in retrospect might not have been the best idea). So the sections are missing and rustc keeps complaining that <something.wasm> is not a relocatable wasm file.
Then comes clang, which can generate a relocatable wasm module with a very simple one-liner:
clang -c <something.c> -o <something.wasm> --target=wasm32-unknown-unknown
Then rustc says “Linking sub-section ended prematurely”. Aw, yes (by the way, my Rust setup was brand new too). Then I read that there are two clang wasm targets: wasm32-unknown-unknown-wasm and wasm32-unknown-unknown-elf, and maybe the latter one should be used here. As my also brand new llvm+clang build runs into an internal error with this target, asking me to send an error report to the developers, it might be something to test on easy or medium, like on some *nix or Mac box.
Minimal success story: sum of three numbers
At this point I just added lld to llvm and succeeded in linking test code manually from bitcode files:
clang cadd.c --target=wasm32-unknown-unknown -emit-llvm -c
rustc rsum.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit llvm-bc
lld -flavor wasm rsum.bc cadd.bc -o msum.wasm --no-entry
Aw yes, it sums numbers, 2 in C and 1+2 in Rust:
cadd.c
int cadd(int x,int y){
    return x+y;
}
rsum.rs
extern "C" {
    fn cadd(x: i32, y: i32) -> i32;
}
#[no_mangle]
pub fn rsum(x: i32, y: i32, z: i32) -> i32 {
    x + unsafe { cadd(y, z) }
}
test.html
<script>
fetch('msum.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.compile(bytes))
  .then(module => {
    console.log(WebAssembly.Module.exports(module));
    console.log(WebAssembly.Module.imports(module));
    return WebAssembly.instantiate(module, {
      env:{
        _ZN4core9panicking5panic17hfbb77505dc622acdE:alert
      }
    });
  })
  .then(instance => {
    alert(instance.exports.rsum(13,14,15));
  });
</script>
That _ZN4core9panicking5panic17hfbb77505dc622acdE feels very natural (the module is compiled and instantiated in two steps in order to log the exports and imports; that is how such missing pieces can be found), and it foreshadows the demise of this attempt: the entire thing works only because there is no other reference to the runtime library, and this particular function could be mocked/provided manually.
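The "log the imports, then provide what is missing" trick can be sketched in isolation. The module below is a hand-assembled stand-in, not the msum.wasm from above: the import name env.missing and the export name run are made up for illustration. The sketch lists the import table and generates a stub for every function import, which is roughly what was done by hand for the mangled panic symbol. It uses the synchronous constructors, which is fine in Node for a tiny module; in a browser the promise-based calls shown above are the safer choice.

```javascript
// Hand-assembled module (illustrative only): imports env.missing,
// exports run(), and run() does nothing but call the import.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                   // type 0: () -> ()
  0x02, 0x0f, 0x01, 0x03, 0x65, 0x6e, 0x76,             // import section, module "env"
  0x07, 0x6d, 0x69, 0x73, 0x73, 0x69, 0x6e, 0x67,       // field "missing"
  0x00, 0x00,                                           // kind: function, type 0
  0x03, 0x02, 0x01, 0x00,                               // one defined function, type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b,       // body: call 0; end
]);

const wasmModule = new WebAssembly.Module(bytes);
const importList = WebAssembly.Module.imports(wasmModule);
const exportList = WebAssembly.Module.exports(wasmModule);
console.log(importList);
console.log(exportList);

// Provide a logging stub for every function the module asks for.
const calls = [];
const importObject = {};
for (const imp of importList) {
  if (imp.kind !== "function") continue;
  importObject[imp.module] = importObject[imp.module] || {};
  importObject[imp.module][imp.name] = () => { calls.push(imp.name); };
}

const instance = new WebAssembly.Instance(wasmModule, importObject);
instance.exports.run();
console.log(calls);
```

Stubs like these only get the module to instantiate; whether it still behaves sensibly depends on what the missing function was supposed to do.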
Side story: string
As alloc and its Layout thing scared me a little, I went with the vector-based approach described/used from time to time, for example here or on Hello, Rust!.
Here is an example, getting the “Hello from …” string from the outside…
rhello.rs
use std::ffi::CStr;
use std::mem;
use std::os::raw::{c_char, c_void};
use std::ptr;

extern "C" {
    // Provided from the outside (JavaScript here, C later).
    fn chello() -> *mut c_char;
}

// Reserve `size` bytes by leaking a Vec; the caller owns the buffer.
#[no_mangle]
pub fn alloc(size: usize) -> *mut c_void {
    let mut buf = Vec::with_capacity(size);
    let p = buf.as_mut_ptr();
    mem::forget(buf);
    p as *mut c_void
}

// Rebuild the Vec from the raw parts and let it drop, freeing the buffer.
#[no_mangle]
pub fn dealloc(p: *mut c_void, size: usize) {
    unsafe {
        let _ = Vec::from_raw_parts(p, 0, size);
    }
}

#[no_mangle]
pub fn hello() -> *mut c_char {
    let phello = unsafe { chello() };
    let c_msg = unsafe { CStr::from_ptr(phello) };
    let message = format!("{} and Rust!", c_msg.to_str().unwrap());
    // The incoming string was allocated with alloc(), so free it here.
    dealloc(phello as *mut c_void, c_msg.to_bytes().len() + 1);
    let bytes = message.as_bytes();
    let len = message.len();
    // Copy the result into a fresh, NUL-terminated buffer for the caller.
    let p = alloc(len + 1) as *mut u8;
    unsafe {
        for i in 0..len as isize {
            ptr::write(p.offset(i), bytes[i as usize]);
        }
        ptr::write(p.offset(len as isize), 0);
    }
    p as *mut c_char
}
Built as rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
… and actually working with JavaScript:
jhello.html
<script>
var e;
fetch('rhello.wasm')
  .then(response => response.arrayBuffer())
  .then(bytes => WebAssembly.compile(bytes))
  .then(module => {
    console.log(WebAssembly.Module.exports(module));
    console.log(WebAssembly.Module.imports(module));
    return WebAssembly.instantiate(module, {
      env:{
        chello:function(){
          var s="Hello from JavaScript";
          var p=e.alloc(s.length+1);
          // read memory.buffer only after alloc, it may have been replaced
          var m=new Uint8Array(e.memory.buffer);
          for(var i=0;i<s.length;i++)
            m[p+i]=s.charCodeAt(i);
          m[p+s.length]=0; // terminator goes after the string, relative to p
          return p;
        }
      }
    });
  })
  .then(instance => {
    /*var*/ e=instance.exports;
    var ptr=e.hello();
    var optr=ptr;
    // again, read memory.buffer after the call into wasm
    var m=new Uint8Array(e.memory.buffer);
    var s="";
    while(m[ptr]!=0)
      s+=String.fromCharCode(m[ptr++]);
    e.dealloc(optr,s.length+1);
    console.log(s);
  });
</script>
It is not particularly beautiful (actually I have no clue about Rust), but it does what I expect from it, and even that dealloc might work (at least invoking it twice throws a panic).
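The string passing on the JavaScript side boils down to two small helpers. This is a minimal sketch, not part of the original page: the names writeCString and readCString are made up here, the text is assumed to be plain ASCII (as in the example), and a standalone WebAssembly.Memory stands in for the module's exported memory. Note that the terminating zero has to be written relative to the pointer.

```javascript
// Write `text` as a NUL-terminated byte string at `ptr`.
// Assumes ASCII text and that ptr..ptr+text.length is reserved already.
function writeCString(memory, ptr, text) {
  const bytes = new Uint8Array(memory.buffer);
  for (let i = 0; i < text.length; i++) {
    bytes[ptr + i] = text.charCodeAt(i) & 0xff;
  }
  bytes[ptr + text.length] = 0; // terminator sits right after the text
}

// Read bytes starting at `ptr` up to (not including) the first zero byte.
function readCString(memory, ptr) {
  const bytes = new Uint8Array(memory.buffer);
  let text = "";
  while (bytes[ptr] !== 0) {
    text += String.fromCharCode(bytes[ptr++]);
  }
  return text;
}

// Stand-in for instance.exports.memory; 16 is an arbitrary test offset.
const demoMemory = new WebAssembly.Memory({ initial: 1 });
writeCString(demoMemory, 16, "Hello from JavaScript");
const roundTrip = readCString(demoMemory, 16);
console.log(roundTrip);
```

Both helpers create their view from memory.buffer on every call instead of caching it, which keeps them safe to use around calls into the module.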
There was an important lesson on the way: when the module manages its memory, its size may change, which invalidates the backing ArrayBuffer object and its views. That is why memory.buffer is read multiple times, and read again after calling into wasm code.
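The invalidation can be reproduced without any wasm code at all. In this small sketch a standalone WebAssembly.Memory is grown from JavaScript, which has the same effect on the buffer as the module growing its own memory:

```javascript
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB

const staleView = new Uint8Array(memory.buffer);
staleView[0] = 42;

memory.grow(1); // the old ArrayBuffer gets detached here

const freshView = new Uint8Array(memory.buffer);
console.log(staleView.length); // the old view is empty now
console.log(freshView.length); // the new view covers both pages
console.log(freshView[0]);     // the contents survived the move
```

Writes through the stale view are silently ignored rather than throwing, so a cached view tends to show up as mysteriously missing data and not as an error.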
And this is where I am stuck, because this code would refer to runtime libraries, and .rlib-s. The closest I could get to a manual build is the following:
rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib --emit obj
lld -flavor wasm rhello.o -o rhello.wasm --no-entry --allow-undefined
liballoc-5235bf36189564a3.rlib liballoc_system-f0b9538845741d3e.rlib
libcompiler_builtins-874d313336916306.rlib libcore-5725e7f9b84bd931.rlib
libdlmalloc-fffd4efad67b62a4.rlib liblibc-453d825a151d7dec.rlib
libpanic_abort-43290913ef2070ae.rlib libstd-dcc98be97614a8b6.rlib
libunwind-8cd3b0417a81fb26.rlib
Where I had to use the lld sitting in the depths of the Rust toolchain, as .rlib-s are said to be interpreted, so they are bound to the Rust toolchain:
--crate-type=rlib, #[crate_type = "rlib"] – A “Rust library” file will be produced. This is used as an intermediate artifact and can be thought of as a “static Rust library”. These rlib files, unlike staticlib files, are interpreted by the Rust compiler in future linkage. This essentially means that rustc will look for metadata in rlib files like it looks for metadata in dynamic libraries. This form of output is used to produce statically linked executables as well as staticlib outputs.
Of course this lld does not eat the .wasm/.o files generated with clang or llc (“Linking sub-section ended prematurely”), perhaps the Rust-part also should be rebuilt with the custom llvm.
Also, this build seems to be missing the actual allocators: besides chello, there will be 4 more entries in the import table: __rust_alloc, __rust_alloc_zeroed, __rust_dealloc and __rust_realloc. These could in fact be provided from JavaScript after all, but that defeats the idea of letting Rust handle its own memory, plus an allocator was present in the single-pass rustc build… Oh, yes, this is where I gave up for this week (Aug 11, 2018, at 21:56)
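For the record, "providing them from JavaScript" could look roughly like the bump allocator below. This is an untested-against-Rust sketch under loud assumptions: the argument orders (size, align), (ptr, size, align) and (ptr, oldSize, align, newSize) are what I believe these symbols took at the time and should be verified against the actual import signatures; makeBumpAllocator and heapStart are names made up here; and the allocator never frees anything. In the real setup the memory would be the module's exported one, which is only available after instantiation.

```javascript
// Assumption: align is a power of two >= 1, heapStart is past the module's data.
function makeBumpAllocator(memory, heapStart) {
  const PAGE = 65536;
  let next = heapStart;

  function alloc(size, align) {
    const start = Math.ceil(next / align) * align; // round up to alignment
    const end = start + size;
    const have = memory.buffer.byteLength;
    if (end > have) memory.grow(Math.ceil((end - have) / PAGE));
    next = end;
    return start;
  }

  return {
    __rust_alloc: alloc,
    __rust_alloc_zeroed(size, align) {
      const p = alloc(size, align);
      new Uint8Array(memory.buffer, p, size).fill(0);
      return p;
    },
    __rust_dealloc(ptr, size, align) {
      // a bump allocator never reuses memory, so there is nothing to do
    },
    __rust_realloc(ptr, oldSize, align, newSize) {
      const p = alloc(newSize, align);
      // view is created after alloc, since alloc may have grown the memory
      new Uint8Array(memory.buffer)
        .copyWithin(p, ptr, ptr + Math.min(oldSize, newSize));
      return p;
    },
  };
}

// Usage with a standalone memory standing in for instance.exports.memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const env = makeBumpAllocator(memory, 1024);
const p1 = env.__rust_alloc(10, 8);
const p2 = env.__rust_alloc(3, 4);
new Uint8Array(memory.buffer)[p1] = 7;
const p3 = env.__rust_realloc(p1, 10, 8, 20);
console.log(p1, p2, p3);
```

Even as a sketch it shows the cost: memory handed out this way is invisible to Rust's own bookkeeping, which is exactly the part that felt wrong about the idea.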
New week, new adventures, with Binaryen, wasm-dis/merge
The idea was to modify the ready-made Rust code (having allocators and everything in place). And this one works. As long as your C code has no data.
Proof of concept code:
chello.c
void *alloc(int len); // allocator comes from Rust
char *chello(){
    char *hell=alloc(13);
    hell[0]='H';
    hell[1]='e';
    hell[2]='l';
    hell[3]='l';
    hell[4]='o';
    hell[5]=' ';
    hell[6]='f';
    hell[7]='r';
    hell[8]='o';
    hell[9]='m';
    hell[10]=' ';
    hell[11]='C';
    hell[12]=0;
    return hell;
}
Not extremely usual, but it is C code.
rustc rhello.rs --target wasm32-unknown-unknown --crate-type=cdylib
wasm-dis rhello.wasm -o rhello.wast
clang chello.c --target=wasm32-unknown-unknown -nostdlib -Wl,--no-entry,--export=chello,--allow-undefined
wasm-dis a.out -o chello.wast
wasm-merge rhello.wast chello.wast -o mhello.wasm -O
(rhello.rs is the same one presented in “Side story: string”)
And the result works as
mhello.html
<script>
fetch('mhello.wasm')
.then(response => response.arrayBuffer())
.then(bytes => WebAssembly.compile(bytes))
.then(module => {
console.log(WebAssembly.Module.exports(module));
console.log(WebAssembly.Module.imports(module));
return WebAssembly.instantiate(module, {
env:{
memoryBase: 0,
tableBase: 0
}
});
})
.then(instance => {
var e=instance.exports;
var ptr=e.hello();
console.log(ptr);
var optr=ptr;
var m=new Uint8Array(e.memory.buffer);
var s="";
while(m[ptr]!=0)
s+=String.fromCharCode(m[ptr++]);
e.dealloc(optr,s.length+1);
console.log(s);
});
</script>
Even the allocators seem to do something (ptr readings from repeated blocks with/without dealloc show how memory does not leak/leaks accordingly).
Of course this is super-fragile and has mysterious parts too:
- if the final merge is run with the -S switch (generates source code instead of .wasm), and the resulting assembly file is compiled separately (using wasm-as), the result will be a couple of bytes shorter (and those bytes are somewhere in the very middle of the running code, not in the export/import/data sections)
- the order of merge matters, the file with “Rust-origin” has to come first. wasm-merge chello.wast rhello.wast [...] dies with an entertaining message:
  [wasm-validator error in module] unexpected false: segment offset should be reasonable, on
  [i32] (i32.const 1)
  Fatal: error in validating output
- probably my fault, but I had to build a complete chello.wasm module (so, with linking). Compiling only (clang -c [...]) resulted in the relocatable module which was missed so much at the very beginning of this story, but decompiling that one (to .wast) lost the named export (chello()):
  - (export "chello" (func $chello)) disappears completely
  - (func $chello ... becomes (func $0 ..., an internal function (wasm-dis loses the reloc and linking sections, putting only a remark about them and their size into the assembly source)
- related to the previous one: this way (building a complete module) data from the secondary module can not be relocated by wasm-merge: while there is a chance for catching references to the string itself (const char *HELLO="Hello from C"; becomes a constant at offset 1024 in particular, and is later referred to as (i32.const 1024) if it is a local constant, inside a function), it does not happen. And if it is a global constant, its address becomes a global constant too, the number 1024 stored at offset 1040, and the string is going to be referred to as (i32.load offset=1040 [...], which starts being difficult to catch.
For laughs, this code compiles and works too…
void *alloc(int len);
int my_strlen(const char *ptr){
    int ret=0;
    while(*ptr++)ret++;
    return ret;
}
char *my_strcpy(char *dst,const char *src){
    char *ret=dst;
    while(*src)*dst++=*src++;
    *dst=0;
    return ret;
}
char *chello(){
    const char *HELLO="Hello from C";
    char *hell=alloc(my_strlen(HELLO)+1);
    return my_strcpy(hell,HELLO);
}
… just it writes “Hello from C” in the middle of Rust’s message pool, resulting in the printout
Hello from Clt::unwrap()` on an `Err`an value and Rust!
(Explanation: 0-initializers are not present in the recompiled code because of the optimization flag, -O)
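The effect is easy to simulate with a plain byte array. This is only an illustration of the mechanism as I understand it, assuming that both modules put their data at the same offset (1024) and that the C string arrives without its terminating zero. The pool text below is a hypothetical stand-in for Rust's message pool, not the actual contents of the module.

```javascript
// Hypothetical stand-in for a piece of Rust's message pool.
const pool = "called `Result::unwrap()` on an `Err` value";
const DATA_OFFSET = 1024; // offset both modules are assumed to use
const mem = new Uint8Array(2048);

// Copy ASCII text to `offset`, optionally followed by a zero byte.
function writeBytes(offset, text, withTerminator) {
  for (let i = 0; i < text.length; i++) mem[offset + i] = text.charCodeAt(i);
  if (withTerminator) mem[offset + text.length] = 0;
}

// Read up to the first zero byte, like my_strlen/my_strcpy do.
function readCString(offset) {
  let text = "";
  while (mem[offset] !== 0) text += String.fromCharCode(mem[offset++]);
  return text;
}

writeBytes(DATA_OFFSET, pool, true);            // Rust's data comes first
writeBytes(DATA_OFFSET, "Hello from C", false); // C's data lands on top, no 0
const clobbered = readCString(DATA_OFFSET);
console.log(clobbered);

writeBytes(DATA_OFFSET, "Hello from C", true);  // with the terminator kept
const intact = readCString(DATA_OFFSET);
console.log(intact);
```

With the terminator in place the string reads back cleanly, though Rust's message would still be damaged, so the overlap itself remains the real problem.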
And it also brings up the question about locating a libc (though when defining them without my_, clang mentions strlen and strcpy as built-ins, also telling their correct signatures, it does not emit code for them and they become imports for the resulting module).