Merge tag 'v0.12.0' into custom-objectives
Files changed:
- README.md +8 -14
- docs/examples.md +35 -1
- pysr/__init__.py +1 -0
- pysr/julia_helpers.py +3 -0
- pysr/sklearn_monkeypatch.py +13 -0
- pysr/sr.py +41 -6
- pysr/test/test.py +17 -0
- pysr/version.py +2 -2
README.md (CHANGED)

````diff
@@ -8,7 +8,6 @@ https://user-images.githubusercontent.com/7593028/188328887-1b6cda72-2f41-439e-a
 
 </div>
 
-
 PySR uses evolutionary algorithms to search for symbolic expressions which optimize a particular objective.
 
 <div align="center">
@@ -19,13 +18,11 @@ PySR uses evolutionary algorithms to search for symbolic expressions which optim
 
 </div>
 
-
 (pronounced like *py* as in python, and then *sur* as in surface)
 
 If you find PySR useful, please cite it using the citation information given in [CITATION.md](https://github.com/MilesCranmer/PySR/blob/master/CITATION.md).
 If you've finished a project with PySR, please submit a PR to showcase your work on the [Research Showcase page](https://astroautomata.com/PySR/papers)!
 
-
 <div align="center">
 
 ### Test status
@@ -33,10 +30,9 @@ If you've finished a project with PySR, please submit a PR to showcase your work
 | **Linux** | **Windows** | **macOS (intel)** |
 |---|---|---|
 |[](https://github.com/MilesCranmer/PySR/actions/workflows/CI.yml)|[](https://github.com/MilesCranmer/PySR/actions/workflows/CI_Windows.yml)|[](https://github.com/MilesCranmer/PySR/actions/workflows/CI_mac.yml)|
-| **Docker** | **Conda** | **Coverage** |
+| **Docker** | **Conda** | **Coverage** |
 |[](https://github.com/MilesCranmer/PySR/actions/workflows/CI_docker.yml)|[](https://github.com/MilesCranmer/PySR/actions/workflows/CI_conda_forge.yml)|[](https://coveralls.io/github/MilesCranmer/PySR)|
 
-
 </div>
 
 PySR is built on an extremely optimized pure-Julia backend: [SymbolicRegression.jl](https://github.com/MilesCranmer/SymbolicRegression.jl).
@@ -47,14 +43,13 @@ to find algebraic relations that approximate a dataset.
 
 One can also
 extend these approaches to higher-dimensional
-spaces by using a neural network as proxy, as explained in
+spaces by using a neural network as proxy, as explained in
 [2006.11287](https://arxiv.org/abs/2006.11287), where we apply
 it to N-body problems. Here, one essentially uses
 symbolic regression to convert a neural net
 to an analytic equation. Thus, these tools simultaneously present
 an explicit and powerful way to interpret deep models.
 
-
 *Backstory:*
 
 Previously, we have used
@@ -68,19 +63,18 @@ of this package is to have an open-source symbolic regression tool
 as efficient as eureqa, while also exposing a configurable
 python interface.
 
-
 # Installation
 
 <div align="center">
 
 | pip - **recommended** <br> (works everywhere) | conda <br>(Linux and Intel-based macOS) | docker <br>(if all else fails) |
 |---|---|---|
-| 1. [Install Julia](https://julialang.org/downloads/)<br>2. Then, run: `pip install -U pysr`<br>3. Finally, to install Julia packages:<br>`
+| 1. [Install Julia](https://julialang.org/downloads/)<br>2. Then, run: `pip install -U pysr`<br>3. Finally, to install Julia packages:<br>`python3 -c 'import pysr; pysr.install()'` | `conda install -c conda-forge pysr` | 1. Clone this repo.<br>2. `docker build -t pysr .`<br>Run with:<br>`docker run -it --rm pysr ipython`
 
 </div>
 
 Common issues tend to be related to Python not finding Julia.
-To debug this, try running `
+To debug this, try running `python3 -c 'import os; print(os.environ["PATH"])'`.
 If none of these folders contain your Julia binary, then you need to add Julia's `bin` folder to your `PATH` environment variable.
 
 **Running PySR on macOS with an M1 processor:** you should use the pip version, and make sure to get the Julia binary for ARM/M-series processors.
@@ -136,7 +130,7 @@ model.fit(X, y)
 
 Internally, this launches a Julia process which will do a multithreaded search for equations to fit the dataset.
 
-Equations will be printed during training, and once you are satisfied, you may
+Equations will be printed during training, and once you are satisfied, you may
 quit early by hitting 'q' and then \<enter\>.
 
 After the model has been fit, you can run `model.predict(X)`
@@ -167,9 +161,9 @@ This arrow in the `pick` column indicates which equation is currently selected b
 `model_selection` strategy for prediction.
 (You may change `model_selection` after `.fit(X, y)` as well.)
 
-`model.equations_` is a pandas DataFrame containing all equations, including callable format
+`model.equations_` is a pandas DataFrame containing all equations, including callable format
 (`lambda_format`),
-SymPy format (`sympy_format` - which you can also get with `model.sympy()`), and even JAX and PyTorch format
+SymPy format (`sympy_format` - which you can also get with `model.sympy()`), and even JAX and PyTorch format
 (both of which are differentiable - which you can get with `model.jax()` and `model.pytorch()`).
 
 Note that `PySRRegressor` stores the state of the last search, and will restart from where you left off the next time you call `.fit()`, assuming you have set `warm_start=True`.
@@ -181,7 +175,7 @@ You may load the model from the `pkl` file with:
 
 ```python
 model = PySRRegressor.from_file("hall_of_fame.2022-08-10_100832.281.pkl")
-```
+```
 
 There are several other useful features such as denoising (e.g., `denoising=True`),
 feature selection (e.g., `select_k_features=3`).
````
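As a companion to the README changes above, here is a minimal end-to-end sketch of the workflow the README describes (fit, inspect `model.equations_`, export to SymPy). It assumes only the public `PySRRegressor` API named in the README; the toy dataset is illustrative and not part of the diff.

```python
import numpy as np
from pysr import PySRRegressor

# Illustrative toy data (not from the diff): y = 2.5 * cos(x3) + x0^2 - 0.5
X = np.random.randn(100, 5)
y = 2.5 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*"],
    unary_operators=["cos"],
)
model.fit(X, y)            # launches the multithreaded Julia search

print(model.equations_)    # pandas DataFrame of discovered equations
print(model.sympy())       # best equation in SymPy form
y_pred = model.predict(X)  # predict with the currently selected equation
```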
docs/examples.md (CHANGED)

````diff
@@ -284,7 +284,41 @@ You can get the sympy version of the best equation with:
 model.sympy()
 ```
 
-## 8. Additional features
+## 8. Complex numbers
+
+PySR can also search for complex-valued expressions. Simply pass
+data with a complex datatype (e.g., `np.complex128`),
+and PySR will automatically search for complex-valued expressions:
+
+```python
+import numpy as np
+
+X = np.random.randn(100, 1) + 1j * np.random.randn(100, 1)
+y = (1 + 2j) * np.cos(X[:, 0] * (0.5 - 0.2j))
+
+model = PySRRegressor(
+    binary_operators=["+", "-", "*"], unary_operators=["cos"], niterations=100,
+)
+
+model.fit(X, y)
+```
+
+You can see that all of the learned constants are now complex numbers.
+We can get the sympy version of the best equation with:
+
+```python
+model.sympy()
+```
+
+We can also make predictions normally, by passing complex data:
+
+```python
+model.predict(X, -1)
+```
+
+to make predictions with the most accurate expression.
+
+## 9. Additional features
 
 For the many other features available in PySR, please
 read the [Options section](options.md).
````
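As a quick follow-up to the new complex-numbers example, this sketch (assuming the `model`, `X`, and `y` from the snippet above) checks that predictions come back with a complex dtype and stay close to the target; it only uses the `model.predict(X, -1)` call shown in the example.

```python
import numpy as np

# Assumes `model`, `X`, `y` from the complex-valued example above.
y_pred = model.predict(X, -1)  # -1 selects the most accurate expression

print(y_pred.dtype)                      # complex dtype, e.g. complex128
print(np.mean(np.abs(y_pred - y) ** 2))  # mean squared error in the complex plane
```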
pysr/__init__.py (CHANGED)

````diff
@@ -1,3 +1,4 @@
+from . import sklearn_monkeypatch
 from .version import __version__
 from .sr import (
     pysr,
````
pysr/julia_helpers.py (CHANGED)

````diff
@@ -194,6 +194,9 @@ def init_julia(julia_project=None, quiet=False, julia_kwargs=None, return_aux=Fa
         # Static python binary, so we turn off pre-compiled modules.
         julia_kwargs = {**julia_kwargs, "compiled_modules": False}
         Julia(**julia_kwargs)
+        warnings.warn(
+            "Your system's Python library is static (e.g., conda), so precompilation will be turned off. For a dynamic library, try `pyenv`."
+        )
 
     using_compiled_modules = (not "compiled_modules" in julia_kwargs) or julia_kwargs[
         "compiled_modules"
````
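The warning added above fires when PyJulia falls back to `compiled_modules=False` because the Python interpreter links libpython statically (common with conda builds). A standard-library sketch, not part of PySR, for checking your own interpreter ahead of time:

```python
import sysconfig

# A truthy value means Python was built against a shared libpython,
# so PyJulia can keep precompiled modules enabled; 0 or None suggests
# a static build (e.g., some conda Pythons), which triggers the warning above.
print(sysconfig.get_config_var("Py_ENABLE_SHARED"))
```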
pysr/sklearn_monkeypatch.py (ADDED)

````diff
@@ -0,0 +1,13 @@
+# Here, we monkey patch scikit-learn until this
+# issue is fixed: https://github.com/scikit-learn/scikit-learn/issues/25922
+from sklearn.utils import validation
+
+
+def _ensure_no_complex_data(*args, **kwargs):
+    ...
+
+
+try:
+    validation._ensure_no_complex_data = _ensure_no_complex_data
+except AttributeError:
+    ...
````
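For context on why this new module exists: scikit-learn's `check_array` rejects complex input through the private helper `validation._ensure_no_complex_data`, so replacing that helper with a no-op lets complex-valued `X` and `y` reach PySR. A sketch of the before/after behaviour, assuming a scikit-learn version that still has this private helper:

```python
import numpy as np
from sklearn.utils import check_array, validation

X = np.array([[1.0 + 1.0j], [2.0 - 0.5j]])

try:
    check_array(X)  # stock scikit-learn: ValueError("Complex data not supported ...")
except ValueError as err:
    print("unpatched:", err)

# The same no-op patch that `pysr.sklearn_monkeypatch` applies on import:
validation._ensure_no_complex_data = lambda *args, **kwargs: None

check_array(X)  # now passes, so complex arrays can flow through validation
```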
pysr/sr.py (CHANGED)

````diff
@@ -1,5 +1,6 @@
 """Define the PySRRegressor scikit-learn interface."""
 import copy
+from io import StringIO
 import os
 import sys
 import numpy as np
@@ -518,6 +519,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         What precision to use for the data. By default this is `32`
         (float32), but you can select `64` or `16` as well, giving
         you 64 or 16 bits of floating point precision, respectively.
+        If you pass complex data, the corresponding complex precision
+        will be used (i.e., `64` for complex128, `32` for complex64).
         Default is `32`.
     random_state : int, Numpy RandomState instance or None
         Pass an int for reproducible results across multiple function calls.
@@ -1647,7 +1650,13 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
         )
 
         # Convert data to desired precision
-        np_dtype = {16: np.float16, 32: np.float32, 64: np.float64}[self.precision]
+        test_X = np.array(X)
+        is_complex = np.issubdtype(test_X.dtype, np.complexfloating)
+        is_real = not is_complex
+        if is_real:
+            np_dtype = {16: np.float16, 32: np.float32, 64: np.float64}[self.precision]
+        else:
+            np_dtype = {32: np.complex64, 64: np.complex128}[self.precision]
 
         # This converts the data into a Julia array:
         Main.X = np.array(X, dtype=np_dtype).T
@@ -1788,9 +1797,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
             warnings.warn(
                 "Note: you are running with 10 features or more. "
                 "Genetic algorithms like used in PySR scale poorly with large numbers of features. "
-                "
-                "
-                "or, alternatively, 
+                "You should run PySR for more `niterations` to ensure it can find "
+                "the correct variables, "
+                "or, alternatively, do a dimensionality reduction beforehand. "
                 "For example, `X = PCA(n_components=6).fit_transform(X)`, "
                 "using scikit-learn's `PCA` class, "
                 "will reduce the number of features to 6 in an interpretable way, "
@@ -2035,6 +2044,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 
     def _read_equation_file(self):
         """Read the hall of fame file created by `SymbolicRegression.jl`."""
+
         try:
             if self.nout_ > 1:
                 all_outputs = []
@@ -2042,7 +2052,11 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
                     cur_filename = str(self.equation_file_) + f".out{i}" + ".bkup"
                     if not os.path.exists(cur_filename):
                         cur_filename = str(self.equation_file_) + f".out{i}"
-                    df = pd.read_csv(cur_filename)
+                    with open(cur_filename, "r") as f:
+                        buf = f.read()
+                    buf = _preprocess_julia_floats(buf)
+                    df = pd.read_csv(StringIO(buf))
+
                     # Rename Complexity column to complexity:
                     df.rename(
                         columns={
@@ -2058,7 +2072,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
                 filename = str(self.equation_file_) + ".bkup"
                 if not os.path.exists(filename):
                     filename = str(self.equation_file_)
-                all_outputs = [pd.read_csv(filename)]
+                with open(filename, "r") as f:
+                    buf = f.read()
+                buf = _preprocess_julia_floats(buf)
+                all_outputs = [pd.read_csv(StringIO(buf))]
                 all_outputs[-1].rename(
                     columns={
                         "Complexity": "complexity",
@@ -2067,6 +2084,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
                     },
                     inplace=True,
                 )
+
         except FileNotFoundError:
             raise RuntimeError(
                 "Couldn't find equation file! The equation search likely exited "
@@ -2357,3 +2375,20 @@ def _csv_filename_to_pkl_filename(csv_filename) -> str:
     pkl_basename = base + ".pkl"
 
     return os.path.join(dirname, pkl_basename)
+
+
+_regexp_im = re.compile(r"\b(\d+\.\d+)im\b")
+_regexp_im_sci = re.compile(r"\b(\d+\.\d+)[eEfF]([+-]?\d+)im\b")
+_regexp_sci = re.compile(r"\b(\d+\.\d+)[eEfF]([+-]?\d+)\b")
+
+_apply_regexp_im = lambda x: _regexp_im.sub(r"\1j", x)
+_apply_regexp_im_sci = lambda x: _regexp_im_sci.sub(r"\1e\2j", x)
+_apply_regexp_sci = lambda x: _regexp_sci.sub(r"\1e\2", x)
+
+
+def _preprocess_julia_floats(s: str) -> str:
+    if isinstance(s, str):
+        s = _apply_regexp_im(s)
+        s = _apply_regexp_im_sci(s)
+        s = _apply_regexp_sci(s)
+    return s
````
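The new `_preprocess_julia_floats` helper rewrites Julia-style literals in the backend's CSV output (`3.14f-2` scientific notation, `1.2im` imaginary parts) into Python-style literals before `pd.read_csv` parses them. Below is a standalone re-implementation of the same regexes, for illustration only, with a few sample conversions:

```python
import re

_regexp_im = re.compile(r"\b(\d+\.\d+)im\b")
_regexp_im_sci = re.compile(r"\b(\d+\.\d+)[eEfF]([+-]?\d+)im\b")
_regexp_sci = re.compile(r"\b(\d+\.\d+)[eEfF]([+-]?\d+)\b")


def preprocess_julia_floats(s: str) -> str:
    # Same rewriting as the helper added in the diff: Julia imaginary and
    # Julia/Fortran-style exponents -> literals that pandas/sympy can parse.
    s = _regexp_im.sub(r"\1j", s)
    s = _regexp_im_sci.sub(r"\1e\2j", s)
    s = _regexp_sci.sub(r"\1e\2", s)
    return s


print(preprocess_julia_floats("(2.5 + 1.2im)"))   # -> (2.5 + 1.2j)
print(preprocess_julia_floats("3.14f-2 * x0"))    # -> 3.14e-2 * x0
print(preprocess_julia_floats("1.0e-3im + 0.5"))  # -> 1.0e-3j + 0.5
```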
pysr/test/test.py (CHANGED)

````diff
@@ -194,6 +194,20 @@ class TestPipeline(unittest.TestCase):
         print("Model equations: ", model.sympy()[1])
         print("True equation: x1^2")
 
+    def test_complex_equations_anonymous_stop(self):
+        X = self.rstate.randn(100, 3) + 1j * self.rstate.randn(100, 3)
+        y = (2 + 1j) * np.cos(X[:, 0] * (0.5 - 0.3j))
+        model = PySRRegressor(
+            binary_operators=["+", "-", "*"],
+            unary_operators=["cos"],
+            **self.default_test_kwargs,
+            early_stop_condition="(loss, complexity) -> loss <= 1e-4 && complexity <= 6",
+        )
+        model.fit(X, y)
+        test_y = model.predict(X)
+        self.assertTrue(np.issubdtype(test_y.dtype, np.complexfloating))
+        self.assertLessEqual(np.average(np.abs(test_y - y) ** 2), 1e-4)
+
     def test_empty_operators_single_input_warm_start(self):
         X = self.rstate.randn(100, 1)
         y = X[:, 0] + 3.0
@@ -677,6 +691,9 @@ class TestMiscellaneous(unittest.TestCase):
         check_generator = check_estimator(model, generate_only=True)
         exception_messages = []
         for _, check in check_generator:
+            if check.func.__name__ == "check_complex_data":
+                # We can use complex data, so avoid this check.
+                continue
             try:
                 with warnings.catch_warnings():
                     warnings.simplefilter("ignore")
````
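The new test also exercises `early_stop_condition`, which is passed as a string containing a Julia anonymous function and evaluated by the SymbolicRegression.jl backend. A short sketch of using the same option directly, copied from the test above:

```python
from pysr import PySRRegressor

# Stop the search once any expression reaches loss <= 1e-4 at complexity <= 6.
# The condition is Julia syntax, evaluated on the backend for each candidate.
model = PySRRegressor(
    binary_operators=["+", "-", "*"],
    unary_operators=["cos"],
    early_stop_condition="(loss, complexity) -> loss <= 1e-4 && complexity <= 6",
)
```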
pysr/version.py (CHANGED)

````diff
@@ -1,2 +1,2 @@
-__version__ = "0.
-__symbolic_regression_jl_version__ = "0.
+__version__ = "0.12.1"
+__symbolic_regression_jl_version__ = "0.16.1"
````