Introducing formulaic, a high-performance patsy "competitor" #157
Comments
Hey, that's pretty cool! patsy has turned out to be pretty solid, so my lack of time to maintain it hasn't been a huge problem, but there definitely is a lot of potential for something better. Random questions:
Wow... I wasn't expecting so fast a reply! :)
How compatible is the formula language with patsy's? How compatible is the API?
It also allows arbitrarily many
Some features are not yet present, for example the specific implementations of various categorical/spline encoding strategies (but that is less a framework thing atm than those functions simply not being implemented; the framework supports everything needed by those encoding algorithms, I just haven't needed them in my use case). The API is fully object-oriented, with hooks that allow users to override behaviour with their own subclasses of various pieces and reuse the rest of the tooling as is. As such, the API is very different. With that said, I've toyed with the notion of creating some patsy shim wrappers that expose the same API as patsy for common operations.
Do you have nice error messages on parse errors? (I just really like this patsy feature ;-))
Did you copy patsy's novel model matrix building algorithm that fixes some bugs in R?
Given this vs this, it seems incorporating your changes into patsy somehow might be a good way to get them out to users ... how viable do you think that would be?
Greetings all,
Late last year I had the need to generate sparse model matrices from large pandas DataFrames (dense model matrices would not fit in memory for the dataset I was using). I originally set about trying to patch patsy, but the code was not set up to allow overriding individual methods, and since I felt it would be a didactic experience in any case, I decided to rewrite something like patsy from scratch. The result is Formulaic.
I wasn't expecting much more than the addition of sparse matrix support, but it seems I've also managed to improve the performance of model matrix generation by (in many cases) orders of magnitude, even beating R in many cases. I'm in the process of writing up documentation, and there is some low-hanging fruit in terms of improvements, but I'd love to get some eyes on the project, and would welcome feedback.
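To make the memory argument for sparse model matrices concrete, here is a minimal pure-Python sketch. It is not Formulaic's or patsy's implementation: the `sparse_model_matrix` and `to_dense` helpers, the `SparseMatrix` tuple, and the column names are all hypothetical, chosen only for illustration. It encodes one categorical column with an intercept and a dropped reference level, storing only the non-zero cells as (row, column, value) triplets.

```python
from collections import namedtuple

# Hypothetical container: shape, column names, and COO-style (row, col, value) triplets.
SparseMatrix = namedtuple("SparseMatrix", ["shape", "columns", "entries"])


def sparse_model_matrix(values, intercept=True):
    """Encode one categorical column, keeping only the non-zero cells."""
    levels = sorted(set(values))
    # With an intercept, drop the first level as the reference to avoid collinearity.
    encoded_levels = levels[1:] if intercept else levels
    offset = 1 if intercept else 0
    columns = (["Intercept"] if intercept else []) + [
        f"x[T.{level}]" for level in encoded_levels
    ]
    col_index = {level: i + offset for i, level in enumerate(encoded_levels)}

    entries = []
    for row, value in enumerate(values):
        if intercept:
            entries.append((row, 0, 1.0))
        if value in col_index:  # the reference level contributes no extra cell
            entries.append((row, col_index[value], 1.0))
    return SparseMatrix((len(values), len(columns)), columns, entries)


def to_dense(matrix):
    """Expand the triplets into a full list-of-lists matrix (for comparison only)."""
    n_rows, n_cols = matrix.shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for row, col, value in matrix.entries:
        dense[row][col] = value
    return dense


m = sparse_model_matrix(["a", "b", "c", "a"])
stored_cells = len(m.entries)          # cells kept by the sparse form
dense_cells = m.shape[0] * m.shape[1]  # cells a dense matrix would allocate
```

Each row stores at most two cells no matter how many levels the column has, while the dense form grows with rows times levels, which is why a high-cardinality categorical in a large DataFrame can fit in memory only in sparse form.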