March 10, 2025 · 17 min read

Building Cross-Platform SDKs: From FFI to WebAssembly

Mark Phelps

This post describes our journey building portable, cross-language SDKs for Flipt, and the unexpected technical challenges that forced us to rethink our approach more than once. What started as a fairly straightforward FFI implementation evolved into a deep exploration of C standard libraries and static linking, and eventually led us to WebAssembly. We discovered that even in 2025, the dream of truly portable libraries remains surprisingly elusive, but a hybrid approach combining WASM with native code might be the key to cross-platform SDKs.

Building Cross-Platform SDKs: The Challenge

When we started building our client-side evaluation SDKs for Flipt, our goal was simple: provide a consistent, reliable way for developers to integrate feature flagging into their applications, regardless of their programming language or platform. Our core evaluation engine is written in Rust, which meant we needed a way to bridge the gap between our Rust implementation and various client languages. If you're interested in more of the why and how behind our SDKs, check out our introductory blog post from a while back.

In short, we wanted to write the evaluation logic once and be able to use it in any language to perform feature flag evaluations within end user applications. Client-side evaluation is a powerful feature that allows you to evaluate feature flags without sending any data to the Flipt server, which is especially useful for mobile applications or applications that cannot always have a network connection.

The rest of this post is a technical deep dive into the challenges we faced and the solutions we tried while building universal SDKs. It isn't really specific to Flipt or feature flags; think of it as a general guide to the state of cross-platform library development in 2025, along with the lessons we learned along the way.

FFI: The First Implementation

Our first approach was straightforward: we'd compile our Rust code into a shared library and use FFI (Foreign Function Interface) to create language-specific wrappers. In fact, this is the reason we chose Rust in the first place, as it's a language designed for building memory-safe, performant, 'low-level' libraries and applications, and it supports many different compilation targets such as macOS, Linux, Windows, iOS, and Android. As a bonus, it also has very good support for building libraries that interoperate with other languages via a C-compatible ABI.

Our original blog post linked above has more details on the implementation and approach we took so I won't go into much more detail here. All you really need to know is that we'd compile our Rust code into a shared library which exposed a C compatible ABI that we could use to create language-specific wrappers.

The Rust library extends beyond basic evaluation logic, implementing network operations for feature flag configuration retrieval, retry handling, and streaming support for flag state changes from Flipt Cloud.

FFI Architecture

This approach worked great for a while. We were able to build our SDKs in a way that was compatible with most languages that support FFI, allowing us to quickly add new languages as needed with minimal effort. That was until we started running into issues with the C standard library...

The glibc vs musl Divide

The C standard library landscape is complex. While most developers might not think twice about it, the choice between glibc and musl can make or break cross-platform compatibility. Glibc, the GNU C Library, is the default on most Linux distributions, while musl is commonly used in Alpine Linux and other minimal environments.

For those unfamiliar, the C standard library is a set of functions that are used to perform common operations like reading and writing to the file system, allocating and freeing memory, handling network requests, and more. It's a low-level library that is used to build other libraries and applications. While glibc and musl serve similar purposes, their implementations differ significantly, leading to compatibility challenges when attempting to support both simultaneously.


This became a significant issue when we started distributing our SDKs. We needed to maintain at least two versions of each SDK: one for glibc-based systems and another for musl-based systems. In fact, because we also had to support multiple operating systems and architectures (x86_64, arm64, etc.), the build matrix grew well beyond two versions. This multiplied our builds and created confusion for users who weren't familiar with these underlying differences.

The problem extended beyond just our own build process. Developers face a challenging situation when their development environment differs from their production environment. Consider this common scenario:

  1. A developer builds their application on Ubuntu (glibc-based)
  2. They deploy to a production environment using Alpine Linux containers (musl-based)
  3. Suddenly, their application fails because the SDK they're using was built against glibc

This obviously created a frustrating developer experience, both for us and for our users.

Static Libraries to the Rescue

We put up with this for a while, but it soon became too much to handle, as each new language we added to our SDKs only made it worse. We needed a better solution.

Our first attempt at solving this involved building a static library using musl. A static library, by definition, contains all of its dependencies (barring kernel-level ones) within the library itself. The musl-gcc compiler is a popular choice for building static libraries that are compatible with many different Linux distributions.

The idea was that we'd build our Rust library using musl-gcc to create a static library for Linux (both arm64 and x86_64) and then use that static library in our SDKs for the target language (via FFI as before). Because the static library contains all of its dependencies, we wouldn't need to worry about the C standard library differences between distributions.

Just as before, we'd still need to distribute the static libraries for each platform we supported, but at least we wouldn't need to worry about the C standard library differences between distributions on Linux.

We quickly ran into a roadblock, however: FFI bindings in pretty much all languages require a shared library and won't work at all with a static one. One exception is Go, which can link against either a static or a shared library... but more on that in a bit.

So, for a refresher, here's the state of things:

  • We want to build a library that can be used in any language via FFI
  • We want to build a statically linked library that uses the musl C standard library to ensure compatibility with many different Linux distributions
  • FFI only works with shared libraries

Using a Shared Library Wrapper

Then an idea came: what if we 'wrapped' our static library in a shared library, linking the static library into it at build time? This way we could still build our static library against the musl C standard library while producing a shared library that could be used in any language via FFI.

First, we renamed all the exported functions in the Rust library to have the suffix _ffi. This way we could prevent naming collisions with the original function names in our SDKs and not have to change any existing code.

// flipt-engine-ffi/src/lib.rs

// Before
#[no_mangle]
pub unsafe extern "C" fn evaluate_variant(
    engine_ptr: *mut c_void,
    evaluation_request: *const c_char,
) -> *const c_char {
    // ...
}

// After
#[no_mangle]
pub unsafe extern "C" fn evaluate_variant_ffi(
    engine_ptr: *mut c_void,
    evaluation_request: *const c_char,
) -> *const c_char {
    // ...
}

Then we created a simple wrapper in C (wrapper.c) that would link against the static library and forward calls to the appropriate functions.

// wrapper.c

#include <stdlib.h>
#include "flipt_engine.h"

// Declare the Rust functions we're wrapping
...
extern const char* evaluate_variant_ffi(void* engine, const char* request);
...

// Wrapper functions that will be exported in our .so
...

const char* evaluate_variant(void* engine, const char* request) {
    return evaluate_variant_ffi(engine, request);
}

Finally, we'd build the static library using cargo as before and then link it into a shared library using musl-gcc.

# Build the static library 
cargo build -p flipt-engine-ffi --release --target=x86_64-unknown-linux-musl

# Move the static library to a temporary directory
mv "target/x86_64-unknown-linux-musl/release/libfliptengine.a" "/tmp/ffi/libfliptengine_static.a"

# Build the shared library, wrapping the static library
musl-gcc -shared -o "target/x86_64-unknown-linux-musl/release/libfliptengine.so" -fPIC wrapper.c \
      -I"include" \
      -L"/tmp/ffi" -lfliptengine_static \
      -Wl,-Bstatic -static-libgcc -static

Lo and behold, this actually worked! We now had a self-contained, portable shared library that could be called from any language via FFI!

We increased our testing matrix to include all the different combinations of operating systems, architectures, and C standard libraries that we supported and everything worked great... well, almost everything.

Here's a screenshot of our CI pipeline (via GitHub Actions) that shows all of the successful tests for each language, operating system, and architecture that we support.

CI Pipeline Success

But, if you scroll down a bit, you can see that there are some failures, and they all happen to be with the Go SDK.

CI Pipeline Failure

CGO, Dreaded CGO

Our Go SDK was calling our shared library via CGO, which is part of the Go toolchain that allows Go programs to call C code. Here's an example of the CGO directives we were using previously to build the Go SDK:

#cgo CFLAGS: -I./ext
#cgo darwin,arm64 LDFLAGS: -L${SRCDIR}/ext/darwin_aarch64 -lfliptengine -Wl,-rpath,${SRCDIR}/ext/darwin_aarch64
#cgo darwin,amd64 LDFLAGS: -L${SRCDIR}/ext/darwin_x86_64 -lfliptengine -Wl,-rpath,${SRCDIR}/ext/darwin_x86_64
#cgo linux,arm64 LDFLAGS: -L${SRCDIR}/ext/linux_aarch64 -lfliptengine -Wl,-rpath,${SRCDIR}/ext/linux_aarch64
#cgo linux,amd64 LDFLAGS: -L${SRCDIR}/ext/linux_x86_64 -lfliptengine -Wl,-rpath,${SRCDIR}/ext/linux_x86_64
#cgo windows,amd64 LDFLAGS: -L${SRCDIR}/ext/windows_x86_64 -lfliptengine -Wl,-rpath,${SRCDIR}/ext/windows_x86_64

These directives tell the Go compiler to link against the shared library in the ext directory, selecting the correct build for the target platform.

However, one important thing to remember is that CGO is not Go, and it comes with significant limitations, the main one being that it creates dependencies on system libraries like glibc. We could provide custom linker flags to the Go compiler to use a different C standard library, but this would still require the user to have the correct C compiler toolchain installed on their system, something we could not enforce.

In fact, reading through the Go Porting Policy wiki page, it states:

All Linux first class ports are for systems using glibc only. Linux systems using other C libraries are not fully supported and are not treated as first class.

Effectively we had hit a dead end again.

There was one last thing we could try though, and that was to use WebAssembly (WASM) as an alternative to FFI for our Go SDK.

Enter WebAssembly

The complexity of managing multiple library versions and dealing with C standard library differences led us to explore WebAssembly (WASM) as an alternative for our Go SDK. WASM offered several compelling advantages:

  1. Platform-independent bytecode format
  2. No direct dependency on system libraries
  3. Sandboxed execution environment
  4. Isolated linear memory model

WebAssembly has evolved significantly since its introduction in 2017. Initially focused on browser-based execution, WASM has expanded to become a universal runtime for any environment. However, this expansion has revealed limitations in the core WASM specification:

  1. No built-in threading support
  2. No direct network access
  3. Limited system interface capabilities
  4. No direct file system access

Remember earlier when I said that our FFI library did more than just the evaluation bits, handling things like making network requests via polling or streaming in the background? Well... WASM doesn't support any of that.

However, we were able to work around these limitations by adopting a hybrid approach where the core evaluation logic takes place in WASM and the rest of the functionality is written in Go.

Go is responsible for handling all the things that WASM can't do yet, like making HTTP requests and concurrent operations in our case.

WASM Architecture

To handle this, we wrote yet another Rust library, similar to the one we had built for FFI, but this time meant to compile to WASM, with a few key differences as a result.

One major difference is that WASM only supports a few primitive types, namely numbers (integers and floating-point). Strings and more complex types like structs and arrays are not supported natively by the WebAssembly specification. You also need to be careful about how you allocate memory in WASM, as there is no garbage collector and memory is represented as a contiguous linear buffer that must be manually managed to prevent leaks and corruption.

These differences required us to change how we call into the WASM module from Go, carefully keeping track of pointers and argument lengths.

Here's a snippet of the Go code that handles the bulk of the calling into the WASM module for evaluation:

Note: I've removed the locking logic and most error handling for brevity; the actual Go code handles both.

var (
	allocFunc   = e.mod.ExportedFunction("allocate")
	deallocFunc = e.mod.ExportedFunction("deallocate")
	evalFunc    = e.mod.ExportedFunction("evaluate")
)

reqBytes, err := json.Marshal(request)
if err != nil {
	return nil, err
}

// allocate WASM memory for the request
reqPtr, err := allocFunc.Call(ctx, uint64(len(reqBytes)))
if err != nil {
	return nil, err
}

// write the request to the WASM memory
if !e.mod.Memory().Write(uint32(reqPtr[0]), reqBytes) {
	deallocFunc.Call(ctx, reqPtr[0], uint64(len(reqBytes)))
	return nil, fmt.Errorf("failed to write request to WASM memory")
}

// call the evaluate function in WASM, passing in the engine pointer, the request pointer, and the request length
res, err := evalFunc.Call(ctx, uint64(e.engine), reqPtr[0], uint64(len(reqBytes)))
if err != nil {
	deallocFunc.Call(ctx, reqPtr[0], uint64(len(reqBytes)))
	return nil, err
}

// clean up request WASM memory
deallocFunc.Call(ctx, reqPtr[0], uint64(len(reqBytes)))

// read the result from the WASM memory, decode the pointer and length
ptr, length := decodePtr(res[0])
b, ok := e.mod.Memory().Read(ptr, length)
if !ok {
	deallocFunc.Call(ctx, uint64(ptr), uint64(length))
	return nil, fmt.Errorf("failed to read result from WASM memory")
}

// make a copy of the result before deallocating
result := make([]byte, len(b))
copy(result, b)

// clean up result WASM memory
deallocFunc.Call(ctx, uint64(ptr), uint64(length))

return result, nil
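The `decodePtr` helper above deserves a quick note. Since a WASM export returns a single numeric value, a common convention is to pack a 32-bit pointer and a 32-bit length into one u64 on the Rust side and split it apart on the Go side. The exact layout below (pointer in the high 32 bits, length in the low 32 bits) is an assumption for illustration, not necessarily the layout Flipt's engine uses:

```go
package main

import "fmt"

// decodePtr unpacks a single u64 returned from WASM into a pointer and
// a length. Layout assumed: pointer in the high 32 bits, length in the
// low 32 bits.
func decodePtr(packed uint64) (ptr uint32, length uint32) {
	return uint32(packed >> 32), uint32(packed)
}

// encodePtr is the inverse, as the Rust side would produce it before
// returning from the exported function.
func encodePtr(ptr uint32, length uint32) uint64 {
	return uint64(ptr)<<32 | uint64(length)
}

func main() {
	packed := encodePtr(1024, 57)
	ptr, length := decodePtr(packed)
	fmt.Println(ptr, length) // prints "1024 57"
}
```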

Memory Management Risks

Working with WASM memory requires careful attention to prevent common issues:

  1. Memory Leaks: Failing to call deallocate after allocating memory will cause leaks. This is particularly challenging in error paths where early returns might skip cleanup.

  2. Use After Free: Reading from deallocated memory can cause undefined behavior. Always ensure memory is valid before access and copy data before deallocation if needed.

  3. Memory Management: While WASM provides runtime bounds checking for memory access, you still need to carefully manage memory allocation and deallocation. The linear memory model requires explicit tracking of allocated regions.

  4. Resource Cleanup: In concurrent scenarios, ensure proper cleanup when operations are cancelled or timeouts occur.

As I mentioned before, since WASM doesn't support complex types, we use JSON serialization/deserialization to pass data between WASM and Go. This adds a bit of overhead, but it provides a clean boundary between WASM and host code. We actually do the same in the FFI versions as well, so this isn't anything new.

Choosing a WASM Runtime

One final thing to consider when diving into WASM is the runtime you want to use. There are a few different WASM runtimes available in Go, and they all have their own trade-offs. We initially tried wasmtime, as it is mature, feature rich, has a large community, and has broad language support, Go included. However, as we dove deeper, we quickly realized it was not the best fit for our use case because... you guessed it... its Go bindings require CGO. This would have brought us right back to the same problem we had with FFI.

We ended up using wazero, which is a pure Go runtime, and doesn't require CGO. Its API was similar enough to wasmtime's that we were able to make the switch relatively easily.

After working through the kinks and limitations of WASM, we finally had a Go SDK that worked across all the platforms we supported. We were able to remove all the C standard library related complexity and build a truly universal library!

Performance Analysis and Benchmarks

Benchmarks comparing the FFI and WASM implementations revealed that the WASM version runs approximately 35-45% slower than its FFI counterpart. Two primary factors likely contribute to this performance difference:

  1. Sync/lock overhead required for thread safety between Go and WASM
  2. Additional memory management operations, including allocation, copying, and deallocation of memory buffers

We've done pretty much zero optimization of the WASM version, so we're looking forward to seeing how much we can improve it. Also, we're talking about microseconds here (per-operation differences in the 4-19μs range), so it's not a noticeable difference for most applications, especially considering the benefits of platform independence.

FFI

goos: darwin
goarch: arm64
cpu: Apple M1 Max
BenchmarkVariantEvaluation/Simple-10              230846              5014 ns/op            1496 B/op         25 allocs/op
BenchmarkVariantEvaluation/MediumContext-10       152347              7865 ns/op            2105 B/op         41 allocs/op
BenchmarkVariantEvaluation/LargeContext-10         26518             45133 ns/op           12325 B/op        221 allocs/op
BenchmarkBooleanEvaluation/Simple-10              299326              4478 ns/op            1304 B/op         22 allocs/op
BenchmarkBooleanEvaluation/MediumContext-10       162817              7115 ns/op            1912 B/op         38 allocs/op
BenchmarkBooleanEvaluation/LargeContext-10         26995             44367 ns/op           12133 B/op        218 allocs/op
BenchmarkBatchEvaluation/Simple-10                111051             10812 ns/op            2345 B/op         39 allocs/op
BenchmarkBatchEvaluation/MediumBatch-10            19398             61418 ns/op           17514 B/op        294 allocs/op
BenchmarkBatchEvaluation/LargeBatch-10               566           2125860 ns/op          588452 B/op      10376 allocs/op
BenchmarkListFlags-10                             324736              3656 ns/op             872 B/op         18 allocs/op

WASM

goos: darwin
goarch: arm64
cpu: Apple M1 Max
BenchmarkVariantEvaluation/Simple-10              171214              7004 ns/op            1536 B/op         32 allocs/op
BenchmarkVariantEvaluation/MediumContext-10       109365             10957 ns/op            2000 B/op         48 allocs/op
BenchmarkVariantEvaluation/LargeContext-10         18249             66030 ns/op           10391 B/op        228 allocs/op
BenchmarkBooleanEvaluation/Simple-10              182265              6077 ns/op            1344 B/op         29 allocs/op
BenchmarkBooleanEvaluation/MediumContext-10       120817             10098 ns/op            1808 B/op         45 allocs/op
BenchmarkBooleanEvaluation/LargeContext-10         18490             65206 ns/op           10199 B/op        225 allocs/op
BenchmarkBatchEvaluation/Simple-10                 76533             15641 ns/op            2297 B/op         46 allocs/op
BenchmarkBatchEvaluation/MediumBatch-10            14582             83378 ns/op           15318 B/op        301 allocs/op
BenchmarkBatchEvaluation/LargeBatch-10               387           3109470 ns/op          490291 B/op      10383 allocs/op
BenchmarkListFlags-10                             275409              4334 ns/op             920 B/op         22 allocs/op

Future Directions and WASI

The journey from FFI to WASM represents a significant evolution in our SDK architecture. While this post covered the core technical challenges and solutions, future articles will explore the testing, packaging, and distribution aspects of cross-platform SDK development.

We've just barely scratched the surface of what's possible with WASM, and we're excited to see what the future holds. I'm personally looking forward to the WASI standard maturing and gaining wide runtime support for things like networking, threading, and WIT (WebAssembly Interface Types). WIT is a new standard that lets you describe types in a way that can be shared across languages (no more JSON serialization/deserialization).

If you're interested in seeing all the code for yourself, it's all open-source and available in our Client SDKs monorepo. We have an Architecture document that goes into more detail about the different components and how they all work together.

Finally, we'd love it if you gave us a star on GitHub and shared the post with your friends and colleagues!
