Armin Ronacher's Thoughts and Writings

Rust and Rest

written on Sunday, July 10, 2016

A few months back I decided to write a command line client for Sentry because manually invoking the Sentry API for some common tasks (such as dsym or sourcemap management) is just no fun. Given the choice of languages available I went with Rust. The reason for this is that I want people to be able to download a single executable that links everything in statically. The choice was between C, C++, Go and Rust. There is no denying that I really like Rust so it was already a pretty good choice for me. However what made it even easier is that Rust has quite a potent ecosystem for what I wanted. So here are my lessons learned from this.

Libraries for HTTP

To make an HTTP request you have a choice of libraries. In particular there are two in Rust you can try: hyper and rust-curl. I tried both and there are some releases with the former but I settled on rust-curl in the end. The reason for this is twofold. The first is that curl (despite some of the oddities in how it does things) is very powerful and integrates really well with the system SSL libraries. This means that when I compile the executable I get the native TLS support right away. rust-curl also (despite not being a pure Rust library) compiles really well out of the box on Windows, macOS and Linux. The second reason is that hyper is currently undergoing a major shift in how it's structured and is a bit in flux. I did not want to bet on that too much. When I started it also did not have proxy support which is not great.

For JSON parsing and serializing I went with serde. I suppose that serde will eventually be the library of choice for all things serialization but right now it's not. It depends on compiler plugins and there are two ways to make it work right now. One is to go with nightly Rust (which is what I did); the other is to use the build script support in Rust. This is similar to what you do in Go where some code generation happens as part of the build. It definitely works but it's not nearly as nice as using serde with nightly Rust.

API Design

The next question is what a good API design for a Rust HTTP library is. I struggled with this quite a bit and it took multiple iterations to end up with something that I think is a good pattern. What I ended up with is a collection of multiple types:

  • Api: I have a basic client object which I call Api internally. It manages the curl handles (right now it just caches one) and also exposes convenience methods to perform certain types of HTTP requests. On top of that it provides high level methods that send the right HTTP requests and handle the responses.
  • ApiRequest: basically your request object. It's mostly a builder for making requests and has a method to send the request and get a response object.
  • ApiResponse: contains the response from the HTTP request. This also provides various helpers to convert the response into different things.
  • ApiResult<T>: this is a result object which is returned from most methods. The error is a special API error that converts from all the APIs we call into. This means it can hold curl errors, form errors, JSON errors, IO errors and more.

To give you an idea of what this looks like I want to show you one of the high level methods that uses most of the API:

pub fn list_releases(&self, org: &str, project: &str)
    -> ApiResult<Vec<ReleaseInfo>>
{
    Ok(self.get(&format!("/projects/{}/{}/releases/",
                         PathArg(org), PathArg(project)))?.convert()?)
}

(Note that I'm using the new question mark syntax ? instead of the more familiar try! macro here)

So what is happening here?

  1. This is a method on the Api struct. We use the get() shorthand method to make an HTTP GET request. It takes one argument which is the URL to make the request to. We use standard string formatting to create the URL path here.
  2. The PathArg is a simple wrapper that customizes the formatting so that instead of just stringifying a value it also percent encodes it.
  3. The return value of the get method is an ApiResult<ApiResponse>. Once the ? has unwrapped it, the ApiResponse provides a handy convert() method which does both error handling and deserialization.

How does the JSON handling take place here? convert() takes care of it: the target type is inferred from the return type of the method, and Vec<ReleaseInfo> has an automatically implemented deserializer.
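The PathArg wrapper from step 2 is not shown in this post. As a rough sketch of the idea (not the actual sentry-cli code), it could be a newtype whose Display implementation percent encodes everything outside the RFC 3986 unreserved set. The exact set of characters left unescaped is an assumption here.

```rust
use std::fmt;

/// Illustrative wrapper: formats the inner value percent-encoded.
/// The choice of "safe" characters (RFC 3986 unreserved) is an assumption.
pub struct PathArg<D: fmt::Display>(pub D);

impl<D: fmt::Display> fmt::Display for PathArg<D> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Stringify first, then escape byte by byte so that multi-byte
        // UTF-8 sequences are encoded as several %XX groups.
        let raw = self.0.to_string();
        for byte in raw.bytes() {
            match byte {
                b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                    write!(f, "{}", byte as char)?
                }
                _ => write!(f, "%{:02X}", byte)?,
            }
        }
        Ok(())
    }
}

/// Builds the same kind of URL path as list_releases does above.
pub fn releases_path(org: &str, project: &str) -> String {
    format!("/projects/{}/{}/releases/", PathArg(org), PathArg(project))
}
```

Because the escaping lives in the Display implementation, it composes with ordinary format! calls and a slash inside an argument can never introduce an extra path segment.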

The Error Onion

The bulk of the complexity is hidden behind multiple layers of error handling. It took me quite a long time to come up with this design, which is why I'm particularly happy with finally having found one I really like. The reason error handling is so tricky with HTTP requests is that you want both the flexibility of responding to specific error conditions and automatic handling of all the ones you are not interested in.

The design I ended up with is that I have an ApiError type. All the internal errors that the library could encounter (curl errors etc.) are automatically converted into an ApiError. If you send a request the return value is therefore a Result<ApiResponse, ApiError>. However the trick here is that at this level no HTTP error (other than connection errors) is actually stored as ApiError. Instead a failed response (for instance a 404) is still stored as the actual response object.
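The automatic conversion relies on From implementations, which the ? operator invokes when the error types differ. The following is a minimal standard-library-only sketch of that mechanism. The variant names and the two helper functions are made up for illustration, and ParseIntError stands in for the curl and JSON errors the real type wraps.

```rust
use std::fmt;
use std::io;
use std::num::ParseIntError;

/// Illustrative error type; the real ApiError wraps curl, JSON and
/// other errors that are not available in the standard library.
#[derive(Debug)]
pub enum ApiError {
    Io(io::Error),
    Parse(ParseIntError),
    Http(u32, String),
}

pub type ApiResult<T> = Result<T, ApiError>;

impl From<io::Error> for ApiError {
    fn from(err: io::Error) -> ApiError {
        ApiError::Io(err)
    }
}

impl From<ParseIntError> for ApiError {
    fn from(err: ParseIntError) -> ApiError {
        ApiError::Parse(err)
    }
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Io(err) => write!(f, "io error: {}", err),
            ApiError::Parse(err) => write!(f, "parse error: {}", err),
            ApiError::Http(status, detail) => write!(f, "http error {}: {}", status, detail),
        }
    }
}

impl std::error::Error for ApiError {}

/// Hypothetical helper: `?` turns the ParseIntError into an ApiError.
pub fn parse_status(raw: &str) -> ApiResult<u32> {
    Ok(raw.trim().parse::<u32>()?)
}

/// Hypothetical helper: both io and parse failures end up as ApiError.
pub fn read_status_file(path: &str) -> ApiResult<u32> {
    let text = std::fs::read_to_string(path)?;
    parse_status(&text)
}
```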

On the response object you can check the status of the response with these methods:

pub fn status(&self) -> u32 { self.status }
pub fn failed(&self) -> bool { self.status >= 400 && self.status <= 600 }
pub fn ok(&self) -> bool { !self.failed() }

However what's nice is that most of the time you don't have to do any of this. The response object also provides a method to convert non-successful responses into errors like this:

pub fn to_result(self) -> ApiResult<ApiResponse> {
    if self.ok() {
        return Ok(self);
    }
    if let Ok(err) = self.deserialize::<ErrorInfo>() {
        if let Some(detail) = err.detail {
            return Err(ApiError::Http(self.status(), detail));
        }
    }
    Err(ApiError::Http(self.status(), "generic error".into()))
}

This method consumes the response and returns different results depending on its status. If everything was fine the response is returned unchanged. However if there was an error we first try to deserialize the body into our own ErrorInfo, which is the JSON structure our API returns for errors. If that fails we fall back to a generic error message and the status code.
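To experiment with this behavior without serde or curl, here is a self-contained model of the response type. It is only a sketch: the error_detail helper is a made-up stand-in for deserializing ErrorInfo and simply treats the body as a plain text message.

```rust
#[derive(Debug, PartialEq)]
pub enum ApiError {
    Http(u32, String),
}

pub type ApiResult<T> = Result<T, ApiError>;

#[derive(Debug, PartialEq)]
pub struct ApiResponse {
    status: u32,
    body: Option<Vec<u8>>,
}

impl ApiResponse {
    pub fn new(status: u32, body: Option<Vec<u8>>) -> ApiResponse {
        ApiResponse { status, body }
    }

    pub fn status(&self) -> u32 { self.status }
    // Same bounds as the methods shown earlier in the post.
    pub fn failed(&self) -> bool { self.status >= 400 && self.status <= 600 }
    pub fn ok(&self) -> bool { !self.failed() }

    /// Stand-in for deserialize::<ErrorInfo>(): uses the body as the
    /// error detail if it is non-empty UTF-8 text (an assumption).
    fn error_detail(&self) -> Option<String> {
        let body = self.body.as_ref()?;
        let text = std::str::from_utf8(body).ok()?.trim();
        if text.is_empty() { None } else { Some(text.to_string()) }
    }

    /// Successful responses pass through; failed ones become errors.
    pub fn to_result(self) -> ApiResult<ApiResponse> {
        if self.ok() {
            return Ok(self);
        }
        let detail = self
            .error_detail()
            .unwrap_or_else(|| "generic error".to_string());
        Err(ApiError::Http(self.status, detail))
    }
}
```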

What's deserialize? It just invokes serde for deserialization:

pub fn deserialize<T: Deserialize>(&self) -> ApiResult<T> {
    Ok(serde_json::from_reader(match self.body {
        Some(ref body) => body,
        None => &b""[..],
    })?)
}

One thing you can see here is that the body is buffered into memory entirely. I was torn on this in the beginning but it actually turns out to make the API significantly nicer because it allows you to reason about the response better. Without buffering up everything in memory it becomes much harder to do conditional things based on the body. For the cases where we cannot deal with this limitation I have extra methods to stream the incoming data.

On deserialization we match on the body. The body is an Option<Vec<u8>> here which we convert into a &[u8]. That type implements the Read trait, so we can hand it to the deserializer directly.
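The conversion from the optional buffer to a reader can be tried in isolation. In this small sketch the function names are made up and read_to_string takes the place of the serde call, but the match is the same one used in deserialize.

```rust
use std::io::{self, Read};

/// Borrows the buffered body as a byte slice; a missing body becomes
/// an empty slice so callers always get something readable.
pub fn body_as_slice(body: &Option<Vec<u8>>) -> &[u8] {
    match body {
        Some(bytes) => bytes.as_slice(),
        None => &b""[..],
    }
}

/// Consumes the slice through the Read trait, the way a deserializer
/// that accepts any reader would.
pub fn body_to_string(body: &Option<Vec<u8>>) -> io::Result<String> {
    // Read is implemented for &[u8]; reading advances the slice, so the
    // binding itself has to be mutable.
    let mut reader: &[u8] = body_as_slice(body);
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}
```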

The nice thing about the aforementioned to_result method is how well it composes. The common case is to convert something into a result and then deserialize the response if everything is fine, which is why we have this convert method:

pub fn convert<T: Deserialize>(self) -> ApiResult<T> {
    self.to_result().and_then(|x| x.deserialize())
}

Complex Uses

There are some really nice uses for this. For instance here is how we check for updates from the GitHub API:

pub fn get_latest_release(&self) -> ApiResult<Option<(String, String)>>
{
    let resp = self.get("https://api.github.com/repos/getsentry/sentry-cli/releases/latest")?;
    if resp.status() != 404 {
        let info : GitHubRelease = resp.to_result()?.convert()?;
        for asset in info.assets {
            if asset.name == REFERENCE_NAME {
                return Ok(Some((
                    info.tag_name,
                    asset.browser_download_url
                )));
            }
        }
    }
    Ok(None)
}

Here we silently ignore a 404 but otherwise we parse the response as a GitHubRelease structure and then look through all the assets. The call to to_result does nothing on success but it will handle all the other response errors automatically.

To get an idea how the structures like GitHubRelease are defined, this is all that is needed:

#[derive(Debug, Deserialize)]
struct GitHubAsset {
    browser_download_url: String,
    name: String,
}

#[derive(Debug, Deserialize)]
struct GitHubRelease {
    tag_name: String,
    assets: Vec<GitHubAsset>,
}

Curl Handle Management

One thing that is not visible here is how I manage the curl handles. Curl is a C library and the Rust binding to it is quite low level. While it's well typed and does not require unsafe code to use, it still feels very much like a C library. In particular there is a curl "easy" handle object you are supposed to keep hanging around between requests to take advantage of keepalives. However the handles are stateful. Readers of this blog are aware that there are few things I hate as much as unnecessary stateful APIs. So I made it as stateless as possible.

The "correct" thing to do would be to have a pool of "easy" handles. However in my case I never have more than one request outstanding at a time so instead of going with something more complex I stuff away the "easy" handle in a RefCell. A RefCell is a smart pointer that moves the borrow semantics that Rust normally requires at compile time to runtime. This is roughly how this looks:

pub struct ApiRequest<'a> {
    handle: RefMut<'a, curl::easy::Easy>
}

pub struct Api {
    shared_handle: RefCell<curl::easy::Easy>,
    ...
}

impl Api {
    pub fn request<'a>(&'a self, method: Method, url: &str)
        -> ApiResult<ApiRequest<'a>>
    {
        // Panics at runtime if an earlier request still holds the handle.
        let handle = self.shared_handle.borrow_mut();
        ApiRequest::new(handle, method, url)
    }
}
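The runtime borrow tracking can be observed with a small model that only needs the standard library. Everything in it is a stand-in: Handle replaces curl::easy::Easy, Client and Request replace Api and ApiRequest, and try_request is an extra non-panicking variant that is not part of the design described in this post.

```rust
use std::cell::{RefCell, RefMut};

/// Stand-in for the stateful curl "easy" handle.
pub struct Handle {
    pub requests_made: u32,
}

/// Holds the exclusive borrow for as long as the request is alive.
pub struct Request<'a> {
    handle: RefMut<'a, Handle>,
}

impl<'a> Request<'a> {
    /// Consumes the request, which releases the borrow on return.
    pub fn send(mut self) -> u32 {
        self.handle.requests_made += 1;
        self.handle.requests_made
    }
}

pub struct Client {
    shared_handle: RefCell<Handle>,
}

impl Client {
    pub fn new() -> Client {
        Client { shared_handle: RefCell::new(Handle { requests_made: 0 }) }
    }

    /// Panics if a previous request is still outstanding.
    pub fn request(&self) -> Request<'_> {
        Request { handle: self.shared_handle.borrow_mut() }
    }

    /// Hypothetical variant that reports the conflict instead of panicking.
    pub fn try_request(&self) -> Option<Request<'_>> {
        self.shared_handle
            .try_borrow_mut()
            .ok()
            .map(|handle| Request { handle })
    }
}
```

Requests that are sent right away can be issued one after another through a shared reference, because each one gives the handle back when it is consumed. Only a second request created while the first is still alive trips the check.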

This way if you call request twice you will get a runtime panic if the last request is still outstanding. This is fine for what I do. The ApiRequest object itself implements a builder-like pattern where you can modify the object with chained calls. This is roughly what this looks like when used in a more complex situation:

pub fn send_event(&self, event: &Event) -> ApiResult<String> {
    let dsn = self.config.dsn.as_ref().ok_or(Error::NoDsn)?;
    let event : EventInfo = self.request(Method::Post, &dsn.get_submit_url())?
        .with_header("X-Sentry-Auth", &dsn.get_auth_header(event.timestamp))?
        .with_json_body(&event)?
        .send()?.convert()?;
    Ok(event.id)
}
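The chaining above works because each builder step consumes the request and hands it back inside a result, so ? can sit between the calls. A minimal sketch of such a builder follows. The type names, the header validation rule and the describe method (which stands in for send) are all assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum RequestError {
    InvalidHeader(String),
}

#[derive(Debug)]
pub struct RequestBuilder {
    url: String,
    headers: Vec<String>,
    body: Option<Vec<u8>>,
}

impl RequestBuilder {
    pub fn new(url: &str) -> RequestBuilder {
        RequestBuilder { url: url.to_string(), headers: Vec::new(), body: None }
    }

    /// Takes self by value and returns it again so calls can be chained;
    /// the Result lets an individual step fail.
    pub fn with_header(mut self, key: &str, value: &str) -> Result<RequestBuilder, RequestError> {
        if key.is_empty() || key.contains(':') || value.contains('\n') {
            return Err(RequestError::InvalidHeader(key.to_string()));
        }
        self.headers.push(format!("{}: {}", key, value));
        Ok(self)
    }

    pub fn with_body(mut self, body: &[u8]) -> Result<RequestBuilder, RequestError> {
        self.body = Some(body.to_vec());
        Ok(self)
    }

    /// Stand-in for send(): summarizes what would go over the wire.
    pub fn describe(&self) -> String {
        let body_len = self.body.as_ref().map_or(0, |b| b.len());
        format!("{} headers={} body_bytes={}", self.url, self.headers.len(), body_len)
    }
}

/// Chained usage in the style of send_event; URL and values are placeholders.
pub fn build_example() -> Result<String, RequestError> {
    Ok(RequestBuilder::new("https://example.invalid/store/")
        .with_header("X-Sentry-Auth", "placeholder-auth-value")?
        .with_body(b"{}")?
        .describe())
}
```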

Lessons Learned

My key takeaways from doing this in Rust so far have been:

  • Rust is definitely a great choice for building command line utilities. The ecosystem is getting stronger by the day and there are so many useful crates already for very common tasks.
  • The cross platform support is superb. Getting the Windows build going was a piece of cake compared to the terror you generally go through with other languages (including Python).
  • serde is a pretty damn good library. It's a shame it's not as nice to use on stable Rust. Can't wait for this stuff to get more stable.
  • Result objects in Rust are great but sometimes it makes sense to not immediately convert data into a result object. I originally converted failure responses into errors immediately and that definitely hurt the convenience of the APIs tremendously.
  • Don't be afraid of using C libraries like curl instead of native Rust things. It turns out that Rust's build support is pretty magnificent which makes installing the Rust curl library straightforward. It even takes care of compiling curl itself on Windows.

If you want to see the code, the entire git repository of the client can be found online: getsentry/sentry-cli.

This entry was tagged api, http, rest and rust