FFI in Rust - writing bindings for libcpuid

A few days ago I needed to find my CPU name and clock frequency in a Rust program. One can parse /proc/cpuinfo or the output of lscpu, but I wanted to encapsulate this functionality in a library of sorts. Meanwhile, I stumbled upon libcpuid, a small C library that does exactly what I needed (although not by reading from the proc filesystem). That left me wondering how hard it would be to create a Rust wrapper for libcpuid. Turns out it's quite simple!

FFI basics

To quote Wikipedia:

A foreign function interface (FFI) is a mechanism by which a program written in one programming language can call routines or make use of services written in another.

What that means for us is that we can use C libraries from our Rust code. As the Rust FFI guide says, all foreign functions are assumed to be unsafe, so we're going to build safe, high-level wrappers around them. The convention is to create an ffi.rs module containing the foreign function signatures and to use this module from the safe parts of the library (lib.rs in the simplest case).
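To make that convention concrete without requiring libcpuid, here is a minimal sketch that keeps the raw declaration in an `ffi` module and exposes a safe wrapper next to it. It binds `strlen` from the C standard library as a stand-in for a libcpuid function and uses the `std::os::raw` types instead of the `libc` crate. Unlike the rest of this post, it targets current stable Rust (the `unsafe extern` spelling needs Rust 1.82 or later). The wrapper name `c_string_length` is illustrative only.

```rust
use std::ffi::CStr;

// Raw, unsafe declarations live in their own module, following the ffi.rs convention.
mod ffi {
    use std::os::raw::c_char;

    unsafe extern "C" {
        // size_t strlen(const char *s); from the C standard library,
        // which the Rust standard library already links on common platforms.
        pub fn strlen(s: *const c_char) -> usize;
    }
}

// Safe public wrapper: a &CStr is always a valid, NUL-terminated string,
// so strlen cannot read past the end of the buffer.
pub fn c_string_length(s: &CStr) -> usize {
    unsafe { ffi::strlen(s.as_ptr()) }
}

fn main() {
    let name = CStr::from_bytes_with_nul(b"libcpuid\0").unwrap();
    println!("strlen(\"libcpuid\") = {}", c_string_length(name));
}
```

Callers of `c_string_length` never write `unsafe` themselves; the type of the argument carries the guarantee that the C function relies on.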

We need to declare the foreign functions in Rust, using types from the libc crate and wrapping the declarations in an extern block. The link attribute tells the linker where to find the actual implementation of these functions (that is, which library to link with). In rust-cpuid it looks like this:

#[link(name = "cpuid")]
extern {
    // redeclare functions from libcpuid.h here
}
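The link attribute works the same way for any native library. As a runnable sketch, assuming a Unix-like system where the C math library is available as libm, the following links libm explicitly and wraps its `sqrt` function; with libcpuid installed, `name = "cpuid"` plays exactly this role. The wrapper name `c_sqrt` is hypothetical, and the code uses current Rust syntax rather than 0.12.

```rust
// Explicitly link against libm, the C math library, just as
// #[link(name = "cpuid")] links against libcpuid. Assumes a Unix-like system.
#[link(name = "m")]
unsafe extern "C" {
    // double sqrt(double x); declared in <math.h>
    fn sqrt(x: f64) -> f64;
}

// Safe wrapper: sqrt accepts every double (negative input yields NaN),
// so callers do not need to uphold any invariant.
pub fn c_sqrt(x: f64) -> f64 {
    unsafe { sqrt(x) }
}

fn main() {
    println!("sqrt(16) = {}", c_sqrt(16.0));
}
```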

Simple functions

I started by wrapping functions with the simplest signatures, such as cpuid_lib_version(). The C declaration in libcpuid.h is:

const char* cpuid_lib_version(void);

which translates rather easily to Rust:

pub fn cpuid_lib_version() -> *const c_char;

As I mentioned before, FFI calls are treated as unsafe. However, Rust gives us precise control over unsafe code with the aptly named unsafe keyword. An unsafe block is a way to tell the compiler: "this code, which you consider unsafe, is actually safe to call right now". In most cases the wrapper function extracts some value from the unsafe block and then continues with ordinary, safe Rust code. See the version() function:

pub fn version() -> String {
    let version_string = unsafe {
        let ptr = ffi::cpuid_lib_version();
        CString::new(ptr, false)
    };
    version_string.as_str().unwrap().to_string()
}

This is a typical pattern for simple C functions (apart from some string juggling).
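`CString::new(ptr, false)` belongs to the pre-1.0 standard library; in current Rust, a borrowed C string is wrapped with `std::ffi::CStr::from_ptr` instead. Here is a sketch of the same `version()` function in that style. So that it runs without libcpuid, `fake_cpuid_lib_version` is a hypothetical stand-in that, like the real `cpuid_lib_version()`, returns a pointer to a static NUL-terminated string; the version text is a placeholder.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for `const char* cpuid_lib_version(void)`, so the
// sketch runs without libcpuid. It returns a pointer to a static C string.
extern "C" fn fake_cpuid_lib_version() -> *const c_char {
    b"placeholder-version\0".as_ptr() as *const c_char
}

pub fn version() -> String {
    let ptr = fake_cpuid_lib_version();
    if ptr.is_null() {
        return String::new();
    }
    // SAFETY: ptr is non-null and points to a static NUL-terminated string,
    // so it stays valid for the whole borrow.
    let cstr = unsafe { CStr::from_ptr(ptr) };
    // Copy into an owned String; invalid UTF-8 bytes become U+FFFD.
    cstr.to_string_lossy().into_owned()
}

fn main() {
    println!("libcpuid version: {}", version());
}
```

`to_string_lossy` replaces invalid UTF-8 instead of failing; if the wrapper should report bad input instead, `CStr::to_str` returns a `Result`.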

Structs

The most important function in libcpuid is cpu_identify(), which detects the CPU based on some raw information gathered earlier by cpuid_get_raw_data(). Both of these functions take a pointer to some C struct which is then filled with data.

For example, cpuid_get_raw_data() expects a pointer to a cpu_raw_data_t value and returns a status code (zero indicating success):

struct cpu_raw_data_t {
    uint32_t basic_cpuid[MAX_CPUID_LEVEL][4]; 
    uint32_t ext_cpuid[MAX_EXT_CPUID_LEVEL][4];
    uint32_t intel_fn4[MAX_INTELFN4_LEVEL][4];
    uint32_t intel_fn11[MAX_INTELFN11_LEVEL][4];
};

int cpuid_get_raw_data(struct cpu_raw_data_t* data);

These structs also have to be redeclared in Rust, although not in the extern block. The #[repr(C)] attribute makes the compiler lay out the fields exactly as a C compiler would, and the libc types guarantee that each field has the same size as its C counterpart.

use libc::{c_int, uint32_t};

#[repr(C)]
pub struct cpu_raw_data_t {
    pub basic_cpuid: [[uint32_t, ..4u], ..MAX_CPUID_LEVEL],
    pub ext_cpuid: [[uint32_t, ..4u], ..MAX_EXT_CPUID_LEVEL],
    pub intel_fn4: [[uint32_t, ..4u], ..MAX_INTELFN4_LEVEL],
    pub intel_fn11: [[uint32_t, ..4u], ..MAX_INTELFN11_LEVEL],
}

// in the extern block:
pub fn cpuid_get_raw_data(raw: *mut cpu_raw_data_t) -> c_int;
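In current Rust the array syntax is `[T; N]` rather than `[T, ..N]`, and plain `u32` replaces `libc::uint32_t`. The sketch below declares the struct that way and checks its size, a cheap guard against layout mismatches. The `MAX_*` values are placeholders chosen for illustration; real bindings must use the values from the installed libcpuid headers.

```rust
use std::mem::{align_of, size_of};

// Placeholder array bounds: the real values come from libcpuid's headers
// and must match the installed library exactly.
const MAX_CPUID_LEVEL: usize = 32;
const MAX_EXT_CPUID_LEVEL: usize = 32;
const MAX_INTELFN4_LEVEL: usize = 4;
const MAX_INTELFN11_LEVEL: usize = 4;

// Current syntax for fixed-size arrays: [T; N] instead of [T, ..N].
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct cpu_raw_data_t {
    pub basic_cpuid: [[u32; 4]; MAX_CPUID_LEVEL],
    pub ext_cpuid: [[u32; 4]; MAX_EXT_CPUID_LEVEL],
    pub intel_fn4: [[u32; 4]; MAX_INTELFN4_LEVEL],
    pub intel_fn11: [[u32; 4]; MAX_INTELFN11_LEVEL],
}

fn main() {
    // Only u32 fields, so there is no padding: each CPUID row is 4 * 4 bytes.
    let rows = MAX_CPUID_LEVEL + MAX_EXT_CPUID_LEVEL + MAX_INTELFN4_LEVEL + MAX_INTELFN11_LEVEL;
    assert_eq!(size_of::<cpu_raw_data_t>(), rows * 4 * size_of::<u32>());
    assert_eq!(align_of::<cpu_raw_data_t>(), align_of::<u32>());
    println!("cpu_raw_data_t is {} bytes", size_of::<cpu_raw_data_t>());
}
```

Comparing that number against `sizeof(struct cpu_raw_data_t)` printed by a small C program is a quick way to confirm both sides agree.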

If we were writing our program in C, retrieving raw CPUID data would look along the lines of:

struct cpu_raw_data_t raw;
int result = cpuid_get_raw_data(&raw);
// use raw if result == 0

Notice that the raw variable isn't explicitly initialized, which is fine in C. Rust, on the other hand, requires us to initialize every field of the struct. This can be quite cumbersome (especially with larger, more complex structs), and repeating the same initialization code over and over is definitely not DRY. Fortunately, we can take advantage of Rust's traits, namely the Default trait. It has only one method, Default::default(), which should return a reasonable default value for any type that implements the trait. Let's implement it for cpu_raw_data_t.

use std::default::Default;

impl Default for cpu_raw_data_t {
    fn default() -> cpu_raw_data_t {
        cpu_raw_data_t {
            basic_cpuid: [[0, ..4u], ..MAX_CPUID_LEVEL],
            ext_cpuid: [[0, ..4u], ..MAX_EXT_CPUID_LEVEL],
            intel_fn4: [[0, ..4u], ..MAX_INTELFN4_LEVEL],
            intel_fn11: [[0, ..4u], ..MAX_INTELFN11_LEVEL],
        }
    }
}

Now anytime we need a "default" value of this type, we can just write:

let mut raw: ffi::cpu_raw_data_t = Default::default();
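For reference, here is a sketch of the same Default impl in current syntax, on a shortened, hypothetical version of the struct with placeholder bounds. It also shows an alternative this post doesn't use: `std::mem::MaybeUninit`, which mirrors the C code by leaving the storage uninitialized until the library fills it. That is only sound if the C function writes every field whenever it reports success, so the Default approach remains the safer choice. `fake_get_raw_data` is a hypothetical stand-in for `cpuid_get_raw_data`.

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

// Placeholder bounds: the real values come from libcpuid's headers.
const MAX_CPUID_LEVEL: usize = 32;
const MAX_EXT_CPUID_LEVEL: usize = 32;

// Hypothetical, shortened struct with two of the four arrays.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct cpu_raw_data_t {
    pub basic_cpuid: [[u32; 4]; MAX_CPUID_LEVEL],
    pub ext_cpuid: [[u32; 4]; MAX_EXT_CPUID_LEVEL],
}

// The same Default impl in current syntax: [0; N] replaces [0, ..N].
impl Default for cpu_raw_data_t {
    fn default() -> Self {
        cpu_raw_data_t {
            basic_cpuid: [[0; 4]; MAX_CPUID_LEVEL],
            ext_cpuid: [[0; 4]; MAX_EXT_CPUID_LEVEL],
        }
    }
}

// Hypothetical stand-in for cpuid_get_raw_data(): writes the whole struct
// through the pointer and returns 0 for success.
unsafe extern "C" fn fake_get_raw_data(raw: *mut cpu_raw_data_t) -> c_int {
    unsafe {
        raw.write(cpu_raw_data_t {
            basic_cpuid: [[1; 4]; MAX_CPUID_LEVEL],
            ext_cpuid: [[2; 4]; MAX_EXT_CPUID_LEVEL],
        });
    }
    0
}

// C-style alternative: reserve uninitialized storage and let the callee fill it.
// Sound only if the callee really writes every field whenever it reports success.
fn raw_data_uninit() -> Option<cpu_raw_data_t> {
    let mut raw = MaybeUninit::<cpu_raw_data_t>::uninit();
    let status = unsafe { fake_get_raw_data(raw.as_mut_ptr()) };
    if status != 0 {
        return None;
    }
    // SAFETY: status 0 means the callee initialized the whole struct.
    Some(unsafe { raw.assume_init() })
}

fn main() {
    let zeroed: cpu_raw_data_t = Default::default();
    let filled = raw_data_uninit().expect("raw data");
    println!("{} {}", zeroed.basic_cpuid[0][0], filled.ext_cpuid[0][0]);
}
```

A plain `#[derive(Default)]` is not a general substitute for the manual impl, because the standard library only implements Default for arrays of up to 32 elements.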

At last we can proceed to the actual CPU identification. I decided not to expose the raw data in Rust, so the cpuid::identify() function is a wrapper for both cpuid_get_raw_data and cpu_identify C functions.

pub fn identify() -> Result<CpuInfo, String> {
    let mut raw: ffi::cpu_raw_data_t = Default::default();
    let raw_result = unsafe {
        ffi::cpuid_get_raw_data(&mut raw)
    };
    if raw_result != 0 {
        return Err(error());
    }
    let mut data: ffi::cpu_id_t = Default::default();
    let identify_result = unsafe {
        ffi::cpu_identify(&mut raw, &mut data)
    };
    if identify_result != 0 {
        Err(error())
    } else {
        Ok(CpuInfo {
            num_cores: data.num_cores as int,
            num_logical_cpus: data.num_logical_cpus as int,
            // and more CpuInfo members...
        })
    }
}

In a typical Rust idiom, the function returns a Result wrapping either an error string or a successful return value. The CpuInfo struct contains fields describing processor features like number of cores, cache size, vendor string etc.
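In current Rust the same control flow can be written more compactly with the `?` operator, by first turning each C status code into a Result. The sketch below uses hypothetical stand-ins (`fake_get_raw_data`, `fake_cpu_identify` and two tiny structs) in place of the real libcpuid declarations, formats the status code instead of calling the post's error() helper, and gives CpuInfo `i32` fields in place of the pre-1.0 `int` type.

```rust
use std::os::raw::c_int;

// Hypothetical stand-ins for libcpuid's structs (the real ones are much larger).
#[repr(C)]
#[derive(Default)]
pub struct RawData {
    pub regs: [u32; 4],
}

#[repr(C)]
#[derive(Default)]
pub struct CpuIdData {
    pub num_cores: c_int,
    pub num_logical_cpus: c_int,
}

// Hypothetical stand-ins for cpuid_get_raw_data() and cpu_identify(): they fill
// their out-parameters and return 0 on success, non-zero on failure.
unsafe extern "C" fn fake_get_raw_data(raw: *mut RawData) -> c_int {
    unsafe { (*raw).regs = [4, 8, 0, 0] };
    0
}

unsafe extern "C" fn fake_cpu_identify(raw: *mut RawData, data: *mut CpuIdData) -> c_int {
    unsafe {
        (*data).num_cores = (*raw).regs[0] as c_int;
        (*data).num_logical_cpus = (*raw).regs[1] as c_int;
    }
    0
}

#[derive(Debug, PartialEq)]
pub struct CpuInfo {
    pub num_cores: i32,
    pub num_logical_cpus: i32,
}

// Convert a C status code into a Result so the caller can use `?`.
fn check(status: c_int, what: &str) -> Result<(), String> {
    if status == 0 {
        Ok(())
    } else {
        Err(format!("{} failed with status {}", what, status))
    }
}

pub fn identify() -> Result<CpuInfo, String> {
    let mut raw = RawData::default();
    // &mut raw coerces to the *mut RawData that the C-style function expects.
    check(unsafe { fake_get_raw_data(&mut raw) }, "cpuid_get_raw_data")?;
    let mut data = CpuIdData::default();
    check(unsafe { fake_cpu_identify(&mut raw, &mut data) }, "cpu_identify")?;
    Ok(CpuInfo {
        num_cores: data.num_cores as i32,
        num_logical_cpus: data.num_logical_cpus as i32,
    })
}

fn main() {
    match identify() {
        Ok(info) => println!("{} cores, {} logical CPUs", info.num_cores, info.num_logical_cpus),
        Err(err) => println!("cpuid error: {}", err),
    }
}
```

Keeping the status-code check in one helper means every additional libcpuid call costs a single line in the wrapper.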

Example

Finally, here's a real-life usage example of rust-cpuid.

extern crate cpuid;

fn main() {
    match cpuid::identify() {
        Ok(info) => {
            println!("Found: {} CPU, model: {}", info.vendor, info.codename);
            println!("The full brand string is: {}", info.brand);
            println!("Hardware AES support: {}", if info.has_feature(cpuid::AES) { "yes" } else { "no" });
        },
        Err(err) => println!("cpuid error: {}", err),
    };
    match cpuid::clock_frequency() {
        Some(frequency) => println!("CPU speed: {} MHz", frequency),
        None => println!("Couldn't get CPU speed."),
    };
}
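Unlike identify(), clock_frequency() returns an Option. The post doesn't show that wrapper, so the following is only a plausible sketch: it assumes the underlying C function reports failure with a negative return value, and `fake_cpu_clock` is a hypothetical stand-in that returns a made-up reading.

```rust
use std::os::raw::c_int;

// Hypothetical stand-in for a C function that returns the CPU clock in MHz,
// or a negative value when the clock cannot be determined (assumed convention).
extern "C" fn fake_cpu_clock() -> c_int {
    2400
}

// Map the C convention (negative means "unknown") onto Option.
pub fn clock_from_code(mhz: c_int) -> Option<u32> {
    if mhz < 0 {
        None
    } else {
        Some(mhz as u32)
    }
}

pub fn clock_frequency() -> Option<u32> {
    clock_from_code(fake_cpu_clock())
}

fn main() {
    match clock_frequency() {
        Some(frequency) => println!("CPU speed: {} MHz", frequency),
        None => println!("Couldn't get CPU speed."),
    }
}
```

An Option fits here because a missing clock reading carries no further detail worth reporting, whereas identify() has an error message to pass along.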

Summary

As it turns out, calling C code from Rust is pretty straightforward. After I figured out how to pass C structs as function arguments, the wrapper worked correctly the first time it compiled! That happens less often in Rust than in Haskell (this was my luck rather than a language "guarantee"), but it was still awesome.

The rust-cpuid source code is on GitHub if you want to see the whole thing. And if you want to use the library, great! See the documentation and the libcpuid docs for reference.


Code examples in this article were built with rustc 0.12.0-nightly.

Photo by Abby Lanes and shared under the Creative Commons Attribution 2.0 Generic License. See https://www.flickr.com/photos/abbylanes/3346280502