Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PySRSequenceRegressor #677

Open
wants to merge 198 commits into
base: master
Choose a base branch
from
Open

Conversation

wenbang24
Copy link

Resolve #94

Adds ability to use symbolic regression on 1D recurrent array.
Added new keyword argument to PySRRegressor called recursive_history_length. Must be at least 1, or will throw a ValueError.

Here an example of how it works:
Suppose you have the array [1,2,3,4,5,6,7,8,9] and recursive_history_length = 3.
Then the feature array will be

[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]]

And the target array will be:

[4, 5, 6, 7, 8, 9]

And it's the same from there.

@MilesCranmer
Copy link
Owner

MilesCranmer commented Jul 21, 2024

Thanks! Regarding the design, I wonder if this shouldn’t go into PySRRegressor itself, but rather be a separate input transform that you would compose with scikit-learn’s Pipeline: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline

This would be nice because it would automatically update .predict as well without extra internal code.

Basically I think PySRRegressor is already too large and I’d like to break it into smaller, composable pieces, so it’s easier for users to implement custom behavior.

Alternatively maybe this could be a separate class PySRSequenceRegressor that inherits from PySRRegressor but wraps the methods with transforms.

Also please add some tests, as well as tests for how this interacts with X_units, variable_names, weights, complexity_of_variables, etc.

wenbang24 and others added 4 commits July 23, 2024 19:49
it inherits PySRRegressor and changes __init__ (new recursive_history_length hyperparameter) and run (preprocessing data so it works with everything else)
also got rid of stuff in PySRRegressor
also changed __init__.py to import new class
a lot copied from the PySRRegressor tests :)
@wenbang24
Copy link
Author

test_sequence_weighted_bumper test still isn't passing; how do the weights work?

pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
@wenbang24
Copy link
Author

wenbang24 commented Jul 23, 2024

One test (test_sequence_noisy_builtin_variable_names) isn't passing because set_params doesn't work when PySRSequenceRegressor calls super().__init__()

pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
pysr/sr.py Outdated Show resolved Hide resolved
Copy link
Owner

@MilesCranmer MilesCranmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: should PySRSequenceRegressor allow for multiple features? Right now it is just a scalar feature over time. But what if you want to have two features over time, like X_t and Z_t, and get recursive information for both of them?

(Just wondering why/why not)

@MilesCranmer
Copy link
Owner

I think for weights, the time series approach should simply weight by time. So the weights should be shape (n_times,). It should be one weight for every predicted value.

docs/examples.md Outdated Show resolved Hide resolved
@MilesCranmer
Copy link
Owner

I think stuff breaks without np.array(X)

Weird, normally scikit-learn's checking function should automatically convert to a numpy array. The normal PySRRegressor you can pass it lists or pandas dataframes and it will get converted by scikit-learn to a numpy array. Is there a reason scikit-learn's conversion doesn't work here?

@wenbang24
Copy link
Author

yeah it works mb
but the examples look a bit weird with X.append(X[-1][0] + X[-2][0])
should i do that or X = np.append(X, X[-1] + X[-2])

@wenbang24
Copy link
Author

wenbang24 commented Aug 29, 2024

also theres a test failing in TestDimensionalConstraints and idk what's going on with it, but I think its something to do with a display_variable_names parameter messing up positional arguments

@wenbang24
Copy link
Author

ok i stopped the error; is that the right way to fix it?

@wenbang24
Copy link
Author

how do we fix the typing issue??
im stumped

is this cheating lol
@wenbang24
Copy link
Author

yo all checks passed 🎉

@wenbang24
Copy link
Author

@MilesCranmer anything else I need to do before merging?

@MilesCranmer
Copy link
Owner

MilesCranmer commented Sep 6, 2024

Sorry for not reviewing the latest yet, got back from holiday recently and still catching up.

In the meantime you might consider some “dogfooding” (https://en.m.wikipedia.org/wiki/Eating_your_own_dog_food) of PySRSequenceRegressor on some real problems that you have in mind? Just by using it on some real problems you will be able to find ways of improving the API. Maybe look up some math sequences or simple physics problems and try it out, and see what breaks and what works well, and what parts of the API are easy and what parts are confusing. (This is how I develop PySR in general - I’m both a user and a dev!)

Anyways I hope to get to reviewing soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] PySRSequenceRegressor
3 participants