Copyright (c) 2020 Fred Morris Tacoma WA 98445 USA. Apache 2.0 license
I wrote this in Rust because I needed it to be faster, of course! In the past I would have done this in C, but I stumbled across PyO3. I was initially skeptical, but it worked pretty darned well. The one thing that would have improved the experience was better insight upfront into what the lift and the result would look like. In that spirit, here's what I produced.
Assuming you've got a functional Rust and Cargo environment, invoke run.sh in the top-level directory.
On the plus side:
- Supports the usual Python types:
  - strings
  - ints and floats
  - lists
  - tuples
  - dicts
- value-or-None semantics
- Doc strings
- Lets you create Python classes in Rust
- Binary .so imports as a "typical" Python module
On the other hand:
- Can't subclass the classes by default. Not a deal breaker; otherwise they work and act like normal Python classes. (PyO3 does have a #[pyclass(subclass)] option to opt in.) Rust doesn't support subclassing, so it doesn't violate the spirit of Rust.
- Node-like ecosystem. I don't consider this a plus.
- 4 MB binary .so file. cargo build produces an .so which is over 4 MB, even with --release.
I've got observation data sitting in files where each line is a record consisting of tab-separated values. The first few fields are always present, then there are an arbitrary number of attribute + value pairings.
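To make that record layout concrete, here is a minimal plain-Rust sketch (standard library only, no PyO3). The count of three fixed fields, the layout of each pairing as two consecutive tab-separated fields, and the names FIXED_FIELDS and split_record are all assumptions made for illustration; the real files may be laid out differently.

```rust
/// Hypothetical number of always-present leading fields (an assumption).
const FIXED_FIELDS: usize = 3;

/// Split one record into its fixed fields and its (attribute, value) pairs.
/// Returns None if the fixed fields are missing or a pairing is incomplete.
fn split_record(line: &str) -> Option<(Vec<&str>, Vec<(&str, &str)>)> {
    // Drop the line terminator, then split on tabs.
    let fields: Vec<&str> = line
        .trim_end_matches(|c: char| c == '\r' || c == '\n')
        .split('\t')
        .collect();
    if fields.len() < FIXED_FIELDS {
        return None;
    }
    let (fixed, rest) = fields.split_at(FIXED_FIELDS);
    // Assumed layout: attribute, value, attribute, value, ...
    if rest.len() % 2 != 0 {
        return None;
    }
    let pairs: Vec<(&str, &str)> = rest.chunks(2).map(|pair| (pair[0], pair[1])).collect();
    Some((fixed.to_vec(), pairs))
}
```

A line with too few fields or a dangling attribute yields None, so the caller decides what to do with bad records.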
The immediate calling code in Python looks a lot like this:
    with open('data.tsv') as f:
        for line in f:
            observation = BaseDevice(line)
            if not observation.valid():
                continue
            # Do some stuff
Notice the first thing I do after turning the line into an object is to make sure it's valid! I could have wrapped it in a try: block, but my instinct is that it's faster this way.
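One way to support a valid() check like that is a constructor that never fails and simply records whether parsing worked. The following is a plain-Rust sketch of the idea, not the actual class: the PyO3 attributes are omitted, and the Observation name, the MIN_FIELDS threshold, and the rule "valid means enough fields" are assumptions.

```rust
/// Hypothetical minimum number of tab-separated fields (an assumption).
const MIN_FIELDS: usize = 3;

struct Observation {
    fields: Vec<String>,
    valid: bool,
}

impl Observation {
    /// Never fails: a malformed line yields an object whose valid() is false.
    fn new(line: String) -> Self {
        let fields: Vec<String> = line
            .trim_end_matches(|c: char| c == '\r' || c == '\n')
            .split('\t')
            .map(String::from)
            .collect();
        let valid = fields.len() >= MIN_FIELDS;
        Observation { fields, valid }
    }

    /// Cheap check for the caller to make before using the object.
    fn valid(&self) -> bool {
        self.valid
    }

    /// Number of fields found, valid or not.
    fn field_count(&self) -> usize {
        self.fields.len()
    }
}
```

With this shape the caller pays for a boolean test per record instead of setting up exception handling, which is the trade-off the loop above relies on.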
The object attributes are native Rust types. They're made visible by the #[pyo3(get)] attribute. It supports setters (#[pyo3(get, set)]) as well, but I don't want/need that. Frankly the reason they're visible at all is for debugging and screwing around. I don't actually access the attributes this way in production.
The constructor is the new() method, decorated with the #[new] attribute.
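You'll notice that the constructor takes a String but the attr() method takes a &str, and there's a reason for this. We're storing the string passed to the constructor, so the object needs an owned, heap-allocated copy that lives as long as the (Python) object does. attr() only looks at its argument for the duration of the call, so a borrowed &str is enough.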
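The ownership difference can be sketched in plain Rust without any PyO3 attributes. Record is a hypothetical stand-in for the real class, and for brevity it assumes the line holds nothing but alternating attribute and value fields.

```rust
struct Record {
    /// Owned copy of the input; it has to outlive the call that created it.
    line: String,
}

impl Record {
    /// Takes a String because the struct keeps the data.
    fn new(line: String) -> Self {
        Record { line }
    }

    /// Takes a &str because `name` is only read during the call.
    /// The returned slice borrows from the stored line, not from `name`.
    fn attr(&self, name: &str) -> Option<&str> {
        let fields: Vec<&str> = self
            .line
            .trim_end_matches(|c: char| c == '\r' || c == '\n')
            .split('\t')
            .collect();
        let found = fields
            .chunks(2)
            .find(|pair| pair.len() == 2 && pair[0] == name)
            .map(|pair| pair[1]);
        found
    }
}
```

If new() took a &str instead, the struct could not keep it without copying it into a String anyway, so taking the String up front states the intent directly.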
Option<T> implements value-or-None semantics. You need to declare the type which will be returned when a value is present, for example Option<u32> to return an unsigned integer. Option is a Rust enumeration, which is a misnomer because they're not numerate. In any case it can be either Some(T) or None.
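A small standard-library sketch of Option<u32> in use follows; the function names parse_count and describe are made up for the example. When a PyO3 method returns an Option like this, the Python caller receives either the value or None.

```rust
/// Parse a field as an unsigned integer; None means "no usable value".
fn parse_count(field: &str) -> Option<u32> {
    // parse() returns a Result; ok() discards the error and keeps the value.
    field.trim().parse::<u32>().ok()
}

/// The caller has to handle both cases explicitly.
fn describe(count: Option<u32>) -> String {
    match count {
        Some(n) => format!("count is {}", n),
        None => String::from("no count"),
    }
}
```

The compiler rejects a match that forgets the None arm, which is the main practical difference from checking for None by hand in Python.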
The Err(E) variant of a Result carries a value, but we don't care what the value is. Rust will complain if you bind a variable and never use it, but allows the special name _ for unused values.
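For instance, a match that falls back to a default when parsing fails can ignore the error entirely. This is a generic standard-library sketch; count_or_zero is a hypothetical helper, not a function from this project.

```rust
/// Parse a field as a u32, falling back to 0 when it is not a number.
fn count_or_zero(field: &str) -> u32 {
    match field.parse::<u32>() {
        Ok(n) => n,
        // `_` matches the error value without binding it, so the compiler
        // has no unused variable to warn about.
        Err(_) => 0,
    }
}
```

Writing Err(e) here instead would compile, but with an unused-variable warning for e.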