Dimensional constraints (backend version 0.21) #389

Merged 30 commits on Jul 28, 2023

Commits
542657a
Update backend version to v0.21.1
MilesCranmer Jul 22, 2023
22c66be
Let `equation_search` make default variable if not given
MilesCranmer Jul 22, 2023
61ee61c
Ensure registry is up-to-date
MilesCranmer Jul 22, 2023
117b2c3
Clean up subscriptification
MilesCranmer Jul 22, 2023
e5a9067
Create pretty variable names for print outs
MilesCranmer Jul 22, 2023
ad6d652
Fix custom variable names
MilesCranmer Jul 22, 2023
13d5805
Fix for when pretty_feature_names_in_ is undefined
MilesCranmer Jul 22, 2023
42005bd
Enable dimensional constraints
MilesCranmer Jul 22, 2023
2d025c2
Add test for units
MilesCranmer Jul 22, 2023
04454ac
Add unittests for units checks
MilesCranmer Jul 22, 2023
4f7e6cf
Fix select_k_features + units
MilesCranmer Jul 22, 2023
0e86456
Add custom operators to unit test
MilesCranmer Jul 22, 2023
fbd0ad8
Test empty units as well
MilesCranmer Jul 23, 2023
cbfdb9b
Bump backend version with empty units fix
MilesCranmer Jul 23, 2023
a117981
Test unit propagation
MilesCranmer Jul 23, 2023
0e15dd6
Test multiple output units
MilesCranmer Jul 23, 2023
af0be92
Add print_precision and dimensional_constraint_penalty
MilesCranmer Jul 23, 2023
ce904eb
Remove need for YAML in unittests
MilesCranmer Jul 23, 2023
e9fbda8
Move param_groupings file to `pysr` folder
MilesCranmer Jul 23, 2023
2c97b85
Fix warning message about pkl file not found
MilesCranmer Jul 23, 2023
2621e9c
Fix for operator assertion - can set EITHER unary or binary in csv lo…
MilesCranmer Jul 23, 2023
22eb380
Add test for pickling + units
MilesCranmer Jul 23, 2023
e0c5bae
Bump backend version with batching fix
MilesCranmer Jul 27, 2023
4ca54a5
Describe batched objective in docstring
MilesCranmer Jul 27, 2023
abd0cfa
Set `y_variable_names` in output
MilesCranmer Jul 27, 2023
db8bfce
Add warm start test
MilesCranmer Jul 27, 2023
622c1b9
Bump ClusterManagers.jl version
MilesCranmer Jul 28, 2023
1faa2d4
Fix pandas version lower bound
MilesCranmer Jul 28, 2023
5527c70
Bump version with dimensional constraints
MilesCranmer Jul 28, 2023
1e1bd80
Add docs on dimensional constraints
MilesCranmer Jul 28, 2023
90 changes: 89 additions & 1 deletion docs/examples.md
@@ -433,9 +433,97 @@ equal to:
$\frac{x_0^2 x_1 - 2.0000073}{x_2^2 - 1.0000019}$, which
is nearly the same as the true equation!

## 10. Dimensional constraints

One other feature we can exploit is dimensional analysis.
Say that we know the physical units of each feature and output,
and we want to find an expression that is dimensionally consistent.

## 10. Additional features
We can do this using `DynamicQuantities.jl` to assign units,
passing a string that specifies the units of each variable.
First, let's make some data on Newton's law of gravitation, using
astropy for units:

```python
import numpy as np
from astropy import units as u, constants as const

M = (np.random.rand(100) + 0.1) * const.M_sun
m = 100 * (np.random.rand(100) + 0.1) * u.kg
r = (np.random.rand(100) + 0.1) * const.R_earth
G = const.G

F = G * M * m / r**2
```

We can see the units of `F` with `F.unit`.

Now, let's create our model.
Since this data has such a large dynamic range,
let's also create a custom loss function
that looks at the error in log-space:

```python
loss = """function loss_fnc(prediction, target)
    scatter_loss = abs(log((abs(prediction)+1e-20) / (abs(target)+1e-20)))
    sign_loss = 10 * (sign(prediction) - sign(target))^2
    return scatter_loss + sign_loss
end
"""
```
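
To get a feel for how this loss behaves, here is a minimal Python transcription of the Julia function above. It is only a sketch for experimenting with individual values; the name `log_space_loss` is hypothetical and is not part of PySR, which evaluates the Julia string directly.

```python
import math


def log_space_loss(prediction: float, target: float) -> float:
    """Scalar Python mirror of the Julia `loss_fnc` shown above."""
    eps = 1e-20  # same offset as the Julia code, avoids log(0)

    # Error measured as a ratio in log-space, so 1e10 vs 1e20 costs
    # about the same as 1 vs 1e10.
    scatter_loss = abs(math.log((abs(prediction) + eps) / (abs(target) + eps)))

    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    # Extra penalty when the prediction has the wrong sign.
    sign_loss = 10 * (sign(prediction) - sign(target)) ** 2
    return scatter_loss + sign_loss


print(log_space_loss(5.0, 5.0))      # identical values
print(log_space_loss(1e20, 1e10))    # ten orders of magnitude apart
print(log_space_loss(-2.0, 2.0))     # right magnitude, wrong sign
```

The ratio form means the loss depends on how many orders of magnitude separate the prediction from the target rather than on their absolute difference, which is what makes it suitable for data spanning a large dynamic range.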

Now let's define our model:

```python
from pysr import PySRRegressor

model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square"],
    loss=loss,
    complexity_of_constants=2,
    maxsize=25,
    niterations=100,
    populations=50,
    # Amount to penalize dimensional violations:
    dimensional_constraint_penalty=10**5,
)
```

and fit it, passing the unit information.
To do this, we need to use the format of [DynamicQuantities.jl](https://symbolicml.org/DynamicQuantities.jl/dev/#Usage).

```python
import pandas as pd

# Get numerical arrays to fit:
X = pd.DataFrame(dict(
    M=M.value,
    m=m.value,
    r=r.value,
))
y = F.value

model.fit(
    X,
    y,
    X_units=["Constants.M_sun", "kg", "Constants.R_earth"],
    y_units="kg * m / s^2"
)
```

You can observe that all expressions with a loss under
our penalty are dimensionally consistent!
(The `"[⋅]"` indicates free units in a constant, which can cancel out other units in the expression.)
For example,

```julia
"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅])"
```

would indicate that the expression is dimensionally consistent, with
a constant `"2.6353e-22[m s⁻²]"`.

Note that this expression has a large dynamic range, so it may be difficult to find. Consider searching with a larger `niterations` if needed.
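
The bookkeeping behind the example expression can be sketched in a few lines of Python. This is only an illustration of the arithmetic on unit exponents, not how the backend implements its check; the helper `combine` and the dictionary representation are assumptions made for this sketch.

```python
def combine(a: dict, b: dict, sign: int = 1) -> dict:
    """Combine unit exponents: sign=1 for multiplication, sign=-1 for division."""
    out = dict(a)
    for base, power in b.items():
        out[base] = out.get(base, 0) + sign * power
    # Drop bases whose exponents cancelled out.
    return {base: power for base, power in out.items() if power != 0}


# Units from the printed expression: y[m s⁻² kg] = M[kg] * constant[⋅]
M_dims = {"kg": 1}
y_dims = {"m": 1, "s": -2, "kg": 1}

# The free constant must carry whatever units make both sides agree.
constant_dims = combine(y_dims, M_dims, sign=-1)
print(constant_dims)

# Multiplying back recovers the units of y, so the expression is consistent.
print(combine(M_dims, constant_dims) == y_dims)
```

Dividing the units of `y` by those of `M` leaves `m s⁻²`, matching the units attributed to the constant above.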


## 11. Additional features

For the many other features available in PySR, please
read the [Options section](options.md).
2 changes: 1 addition & 1 deletion docs/gen_param_docs.py
@@ -53,7 +53,7 @@ def str_param_groups(param_groupings, params, cur_heading=2):
if __name__ == "__main__":
# This is the path to the param_groupings.yml file
# relative to the current file.
path = "param_groupings.yml"
path = "../pysr/param_groupings.yml"
with open(path, "r") as f:
param_groupings = safe_load(f)
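
Note on the hunk above: the comment describes the path as relative to the current file, but `open` resolves a relative path against the working directory, so the script presumably expects to be run from `docs/`. If that ever becomes a problem, one possible approach is to resolve the path against the script's own location. A minimal sketch, where `path_next_to` is a hypothetical helper that is not part of this PR:

```python
import os


def path_next_to(script_file: str, relative: str) -> str:
    """Resolve `relative` against the directory that contains `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, relative))


# Inside the script itself this would be called with __file__, e.g.
#   path = path_next_to(__file__, "../pysr/param_groupings.yml")
# Here an illustrative location is used so the snippet runs standalone.
example = path_next_to("/repo/docs/gen_param_docs.py", "../pysr/param_groupings.yml")
print(example)
```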

4 changes: 2 additions & 2 deletions pysr/julia_helpers.py
@@ -259,15 +259,15 @@ def init_julia(julia_project=None, quiet=False, julia_kwargs=None, return_aux=Fa

def _add_sr_to_julia_project(Main, io_arg):
Main.eval("using Pkg")
Main.eval("Pkg.Registry.update()")
Main.sr_spec = Main.PackageSpec(
name="SymbolicRegression",
url="https://github.com/MilesCranmer/SymbolicRegression.jl",
rev="v" + __symbolic_regression_jl_version__,
)
Main.clustermanagers_spec = Main.PackageSpec(
name="ClusterManagers",
url="https://github.com/JuliaParallel/ClusterManagers.jl",
rev="14e7302f068794099344d5d93f71979aaf4fbeb3",
version="0.4",
)
Main.eval(f"Pkg.add([sr_spec, clustermanagers_spec], {io_arg})")

3 changes: 3 additions & 0 deletions docs/param_groupings.yml → pysr/param_groupings.yml
@@ -13,6 +13,7 @@
- loss
- full_objective
- model_selection
- dimensional_constraint_penalty
- Working with Complexities:
- parsimony
- constraints
@@ -72,12 +73,14 @@
- fast_cycle
- turbo
- enable_autodiff
- Determinism:
- random_state
- deterministic
- warm_start
- Monitoring:
- verbosity
- update_verbosity
- print_precision
- progress
- Environment:
- temp_equation_file