Design exploration #2

Conversation
…ssible to just "drop in" a loss function.
…eprintGenerator for Blueprints
I have added another trait, called
Looking good!
I think I need to see a few example workflows using this trait set to really consider the ergonomics. If I have some time over the next couple days, I'll take a crack at adding some example code to this pull request (most likely with no-op estimators).
src/lib.rs (Outdated)
```rust
/// In the same way, it has no notion of loss or "correct" predictions.
/// Those concepts are embedded elsewhere.
pub trait Model {
    type Input;
```
I'm trying to think whether `Input` and `Output` should be associated types or struct generics (`Model<Input, Output>`). It's definitely possible that a trained model could be implemented to provide predictions over multiple types of input / output. For instance, we could have a model defined over ndarray input, or dataframe input, or even a `Vec<T>`.

I could also see a case for `Model<Input>` with `Output` being an associated type -- given a particular input, the output could only be a specific type.
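For reference, a minimal sketch of the two shapes under discussion (the `predict` method and the `GenericModel` name are illustrative, not code from this PR):

```rust
// Variant A: both as associated types -- each implementing type fixes one
// (Input, Output) pair, as in the current draft.
pub trait Model {
    type Input;
    type Output;

    fn predict(&self, input: &Self::Input) -> Self::Output;
}

// Variant B: generic over the input type, with the output as an associated type --
// one model type could implement it for ndarray input, dataframe input, or Vec<T>,
// while each input type still determines a single output type.
pub trait GenericModel<Input> {
    type Output;

    fn predict(&self, input: &Input) -> Self::Output;
}
```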
src/lib.rs (Outdated)
```rust
/// This means that there is no difference between one-shot training and incremental training.
/// Furthermore, the optimizer doesn't have to "own" the model or know anything about its hyperparameters,
/// because it never has to initialize it.
pub trait Optimizer<M>
```
Wording: `Optimizer` or something like `Estimator`? `Optimizer` might be confusing given that some algorithms are actually optimization algorithms, but others aren't.
src/lib.rs (Outdated)
```rust
/// Each of these strategies can take different (hyper)parameters, even though they return an
/// instance of the same model type in the end.
///
/// The initialization procedure could be data-dependent, hence the signature of `initialize`.
```
I'm a bit concerned about potential user confusion about what should be put in the `Blueprint`'s `initialize` method vs the `Optimizer`'s `train` method, given the similarities in method signatures (they both take inputs and targets, and they both return models).

What would be an example of a workflow with a data-dependent initialization? Are there any other options for handling that initialization?
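One hypothetical example of data-dependent initialization (not part of this PR): a k-means blueprint that seeds its centroids from the training inputs, which is why `initialize` would need to see the data.

```rust
// Hypothetical sketch (not in this PR): a blueprint whose `initialize` needs the data
// because the initial centroids are drawn from the training inputs themselves.
struct KMeansBlueprint {
    n_clusters: usize,
}

struct KMeansModel {
    centroids: Vec<Vec<f64>>,
}

impl KMeansBlueprint {
    /// Data-dependent initialization: seed the centroids with the first `n_clusters` rows.
    fn initialize(&self, inputs: &[Vec<f64>]) -> KMeansModel {
        KMeansModel {
            centroids: inputs.iter().take(self.n_clusters).cloned().collect(),
        }
    }
}
```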
I have been trying to sketch out some usage and I ran into some of the issues you have identified.
Looking back at what I have written, and at your comments @jblondin, I can see how this draft fails to accommodate some of these requirements. I'll try to put down a revised sketch tonight, ideally with some example code using no-op estimators or very simple estimators (computing the mean).
Some brief thoughts on the goals:
Agreed. I also would prefer a mutation-free workflow (which you already have with the optimizer consuming the model and creating a new one). In other words, nothing like this (a consuming-style counterpart is sketched below):

```rust
// would NOT prefer this style
let mut model = Model::from(blueprint);
model.train(train_data, targets);
let predictions = model.predict(test_data);
```
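For contrast, the consuming style referred to above might look roughly like this (names are placeholders, not code from this PR):

```rust
// Preferred, mutation-free style (sketch): the optimizer takes ownership of the
// model and hands back a new, trained one -- no visible mutation at the call site.
let model = blueprint.initialize();
let model = optimizer.train(&train_data, &targets, model)?;
let predictions = model.predict(&test_data);
```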
So, a trait-based
This seems like it would be useful for transfer learning tasks -- taking a model trained with one algorithm / data set, and then updating it (or a subset of it) with another algorithm. The model could even support different components that are trained differently. In the deep learning / CNN use case, the convolutional layers are usually transferred, and the fully-connected neural network at the 'end' of the network is retrained for the new learning problem.
Agreed. A pipeline could have a pipeline component. I would love to be able to just define an SVM pipeline, then use that as a component in a bayesian optimization pipeline for model selection, without needing new 'concepts'.
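A rough sketch of what such nesting could look like, assuming a single shared `Transformer` trait (all types here are invented for illustration):

```rust
// Illustrative only: if a pipeline implements the same `Transformer` trait as its
// steps, pipelines can nest inside other pipelines without introducing new concepts.
pub trait Transformer<T> {
    fn transform(&self, inputs: &T) -> T;
}

pub struct Pipeline<T> {
    steps: Vec<Box<dyn Transformer<T>>>,
}

impl<T: Clone> Transformer<T> for Pipeline<T> {
    fn transform(&self, inputs: &T) -> T {
        // Feed the output of each step into the next one.
        self.steps
            .iter()
            .fold(inputs.clone(), |acc, step| step.transform(&acc))
    }
}
```

Because `Pipeline<T>` implements the same trait as its steps, it can itself be used as a step inside a larger pipeline (e.g. an SVM pipeline as a component of a model-selection pipeline).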
Same feeling - Rust gives us move semantics, which we can use to have optimized routines (using mutation inside the method) while still providing a side-effect-free API to consumers.
This was exactly one of my driving examples. I have done another iteration; unfortunately I didn't manage to find the time to provide a code example, but I'd still appreciate your feedback @jblondin.
I have added a first, very simple example: a standard scaler, supporting one-off and incremental computation of both mean and standard deviation. The main issue I experienced is around optimizers: I had to modify both
Sorry for the delay in getting to this! I've been a bit backed up the past week or so. I'm going to have to give this some thought, but here's a few quick comments...
My initial reaction is that the number of samples should actually be an update to the config (…).

Of course, even if you do pass a configuration to …, I feel like this demonstrates that this …
One more thought - I like using the generic name …. We could also have …
I think I prefer the original workflow, without the separate …:

```rust
pub trait Blueprint<I, O> {
    type Transformer: Transformer<I, O>;

    fn initialize(&self) -> Self::Transformer;
}

pub trait Fit<T, I, O>
where
    T: Transformer<I, O>,
{
    type Error: error::Error;

    fn fit(&self, inputs: &I, targets: &O, transformer: T) -> Result<T, Self::Error>;
}
```

with an example workflow:

```rust
let blueprint = SomeConfig::new();
let model = my_algorithm.fit(&train, &targets, blueprint.initialize())?;
let preds = model.transform(&test)?;

// generate new batch of input
let model = my_algorithm.fit(&new_train, &new_targets, model)?;
let better_preds = model.transform(&test)?;
```
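To make that workflow concrete, here is a no-op estimator wired through the traits above (the `Identity` and `NoOpAlgorithm` types are hypothetical, in the spirit of the no-op examples mentioned earlier):

```rust
use std::{convert::Infallible, error};

pub trait Transformer<I, O> {
    fn transform(&self, inputs: &I) -> O;
}

pub trait Blueprint<I, O> {
    type Transformer: Transformer<I, O>;
    fn initialize(&self) -> Self::Transformer;
}

pub trait Fit<T, I, O>
where
    T: Transformer<I, O>,
{
    type Error: error::Error;
    fn fit(&self, inputs: &I, targets: &O, transformer: T) -> Result<T, Self::Error>;
}

/// A "model" with nothing to learn: it returns its input untouched.
pub struct Identity;

impl Transformer<Vec<f64>, Vec<f64>> for Identity {
    fn transform(&self, inputs: &Vec<f64>) -> Vec<f64> {
        inputs.clone()
    }
}

/// Its empty configuration plays the role of the blueprint.
pub struct IdentityConfig;

impl Blueprint<Vec<f64>, Vec<f64>> for IdentityConfig {
    type Transformer = Identity;
    fn initialize(&self) -> Identity {
        Identity
    }
}

/// A training algorithm with nothing to optimize: it hands the model straight back.
pub struct NoOpAlgorithm;

impl Fit<Identity, Vec<f64>, Vec<f64>> for NoOpAlgorithm {
    type Error = Infallible;
    fn fit(
        &self,
        _inputs: &Vec<f64>,
        _targets: &Vec<f64>,
        transformer: Identity,
    ) -> Result<Identity, Infallible> {
        Ok(transformer)
    }
}

// Usage mirrors the workflow above:
//     let model = NoOpAlgorithm.fit(&train, &targets, IdentityConfig.initialize())?;
//     let preds = model.transform(&test);
```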
Sorry, should've added a couple more thoughts to my last comment.

This would require a bit more 'weight' to the …. On the plus side, this avoids any modification to the ….

I'm sure there are workflow quirks we're not considering at this point -- I feel like we're close to the point where we should prototype something and start iterating as we implement different models, algorithms, and data science workflows.
```rust
let (x, y) = generate_batch(n_samples);

let mut optimizer = OnlineOptimizer::default();
let standard_scaler = optimizer.fit(&x, &y, Config::default())?;
```
Passing the config at fit time might make it difficult to compose estimators. Say the estimator is a pipeline of estimators: we wouldn't want to pass all the configs in a single fit call. Having two steps, a) building the pipeline and b) fitting it, is more natural IMO.
The two things are not mutually exclusive, I'd say. You could compose the configuration of all steps in the pipeline and then pass that in when you want to fit it; it shouldn't look very different.
But I have yet to actually prototype it, so take it with a grain of salt.
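For illustration, composing the step configurations into a single pipeline configuration might look like this (all types are invented for the example):

```rust
// Illustrative sketch: the pipeline's config is just a struct of its steps' configs,
// so a single value can still be handed to `fit` for the whole pipeline.
#[derive(Default)]
struct ScalerConfig {
    with_mean: bool,
}

#[derive(Default)]
struct LogisticRegressionConfig {
    l2_penalty: f64,
}

#[derive(Default)]
struct PipelineConfig {
    scaler: ScalerConfig,
    classifier: LogisticRegressionConfig,
}

// let pipeline = optimizer.fit(&x, &y, PipelineConfig::default())?;
```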
```rust
check(&standard_scaler, &x)?;

let (x2, y2) = generate_batch(n_samples);
let standard_scaler = optimizer.incremental_fit(&x2, &y2, standard_scaler)?;
```
I find this conceptually difficult to follow. If anything, I would have expected `standard_scaler.incremental_fit(&x2, &y2, &optimizer)`, not the other way around.
Yeah, that's a good point. It should be easy enough to flip it.
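Flipped, the call direction might look roughly like this (a minimal sketch with invented types; the model method simply delegates to the optimizer):

```rust
// Sketch of the flipped direction: the fitted model drives the incremental update,
// borrowing the optimizer instead of the optimizer consuming the model.
struct Optimizer;
struct Model {
    mean: f64,
}

impl Optimizer {
    fn incremental_fit(&mut self, inputs: &[f64], model: Model) -> Model {
        // ... update `model` using `inputs`; elided here.
        let _ = inputs;
        model
    }
}

impl Model {
    // `model.incremental_fit(&x2, &mut optimizer)` instead of
    // `optimizer.incremental_fit(&x2, model)`.
    fn incremental_fit(self, inputs: &[f64], optimizer: &mut Optimizer) -> Model {
        optimizer.incremental_fit(inputs, self)
    }
}
```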
```rust
    &mut self,
    inputs: &Input<S>,
    _targets: &Output,
    blueprint: Config,
```
Why not call this `config` or `params`? Is there any other ML library that uses the "blueprint" vocabulary? `params`, as in scikit-learn, would be more accurate IMO -- we are not providing model configuration but model parameters.
Params is kind of an overloaded term: in this case, I'd say that we are passing hyperparameters (e.g. the number of convolutional layers in a CNN), not parameters (e.g. the network weights). I think it's quite natural to call the set of model hyperparameters the model configuration.

We can safely discard the blueprint terminology, but I'd try to stick to terms that are not ambiguous.
Why do you prefer "model configuration" to "hyperparameters" for that purpose? (Is it the sole fact that `Config` is shorter?)
```rust
#[macro_use]
extern crate derive_more;

use crate::standard_scaler::{Config, OnlineOptimizer, ScalingError, StandardScaler};
```
So if we have several models, this means we would need to use the full paths, e.g.

```rust
use crate::standard_scaler;
use crate::linear_model::logistic_regression;

let mut standard_scaler_optimizer = standard_scaler::OnlineOptimizer::default();
let standard_scaler = standard_scaler_optimizer.fit(&x, &y, standard_scaler::Config::default())?;
let (x_tr, y_tr) = standard_scaler.transform(&x, &y);

let mut logregr_optimizer = logistic_regression::OnlineOptimizer::default();
let log_regr = logregr_optimizer.fit(&x_tr, &y_tr, logistic_regression::Config::default())?;
```

which might become somewhat difficult to manage?
Also, purely from a user-experience and readability perspective (I understand this has other advantages), I find the builder pattern in rustlearn somewhat simpler, because one doesn't have to deal with the optimizer.
The main issue is correctness: after you have called
I do agree 100%. I have been a little bit busy lately with a bunch of side projects, but now I should be able to get focused on it again. Should we draw up a list of models to start with, ideally one giving us a sufficiently diverse range of quirks to validate the design, @jblondin?
If …

Conceptually, I don't think the model and optimizer should be so interwoven that the optimizer is absolutely required to help build the initial model (which would demand a …).

Can you give me an example of a violation of 'correctness' in this context? I feel like you could still effectively apply local reasoning in my example workflow.
Sounds good. I'll start giving it some thought!
I have found a couple of repos that should allow me to get a sizeable collection of algorithms up and running in a short amount of time:
They use just NumPy and vanilla Python, so it should be quite straightforward to port them to Rust using …
I have started with rust-ndarray/ndarray-linalg#166 👀
I have started to play around with some traits to explore how we could structure the different concepts in an ML workflow.

For now I have kept it very simple:

- a `Model` trait (should it be renamed to `Transformer`?);
- a `Blueprint` trait (serving as an initializer for `Model` types; it holds the model configuration);
- an `Optimizer` trait (encoding the training step).

For the same `Model` type we could potentially have multiple `Blueprint`s, each one providing a different parametrization of the space of possible models, as well as multiple `Optimizer`s.

`Model`, as defined here, could potentially be used to represent any kind of transformation (e.g. preprocessing steps).

I am now trying to come up with something to encode the concept of a pipeline or network of transformations, but I have not nailed it down yet.
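Read together, the three concepts could be sketched roughly as follows (a simplified rendering of the idea, not the exact code in this PR; `initialize` is shown as data-dependent, per the discussion above, and the method names are illustrative):

```rust
use std::error;

/// A fitted transformation from inputs to outputs; it has no notion of loss or training.
pub trait Model {
    type Input;
    type Output;

    fn predict(&self, input: &Self::Input) -> Self::Output;
}

/// Holds the model configuration (hyperparameters) and builds a fresh model;
/// initialization may need to see the data, hence the signature.
pub trait Blueprint {
    type Model: Model;

    fn initialize(
        &self,
        inputs: &<Self::Model as Model>::Input,
        targets: &<Self::Model as Model>::Output,
    ) -> Self::Model;
}

/// Encodes the training step: it consumes a model and returns an improved one,
/// so one-shot and incremental training share the same signature.
pub trait Optimizer<M: Model> {
    type Error: error::Error;

    fn train(
        &mut self,
        inputs: &M::Input,
        targets: &M::Output,
        model: M,
    ) -> Result<M, Self::Error>;
}
```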