Commit 408a63c · Parent: cdd291e
Clean up main docstrings

pysr/sr.py  CHANGED  (+185 -123)
@@ -230,57 +230,65 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 230 |
 231 |    Parameters
 232 |    ----------
-233 |    model_selection : str
 234 |        Model selection criterion when selecting a final expression from
 235 |        the list of best expressions at each complexity.
-236 |        Can be 'accuracy'
-237 |
-238 |
-239 |
-240 |
-241 |
-242 |
-243 |
-244 |
-245 |
-246 |
-247 |    binary_operators : list[str], default=["+", "-", "*", "/"]
 248 |        List of strings for binary operators used in the search.
 249 |        See the [operators page](https://astroautomata.com/PySR/operators/)
 250 |        for more details.
-251 |
 252 |        Operators which only take a single scalar as input.
 253 |        For example, `"cos"` or `"exp"`.
-254 |
 255 |        Number of iterations of the algorithm to run. The best
 256 |        equations are printed and migrate between populations at the
 257 |        end of each iteration.
-258 |
 259 |        Number of populations running.
-260 |
 261 |        Number of individuals in each population.
-262 |
 263 |        Limits the total number of evaluations of expressions to
-264 |        this number.
-265 |    maxsize : int
-266 |        Max complexity of an equation.
-267 |    maxdepth : int
 268 |        Max depth of an equation. You can use both `maxsize` and
 269 |        `maxdepth`. `maxdepth` is by default not used.
-270 |
 271 |        Whether to slowly increase max size from a small number up to
 272 |        the maxsize (if greater than 0). If greater than 0, says the
 273 |        fraction of training time at which the current maxsize will
 274 |        reach the user-passed maxsize.
-275 |
 276 |        Make the search return early once this many seconds have passed.
-277 |
 278 |        Dictionary of int (unary) or 2-tuples (binary); this enforces
 279 |        maxsize constraints on the individual arguments of operators.
 280 |        E.g., `'pow': (-1, 1)` says that power laws can have any
 281 |        complexity left argument, but only 1 complexity in the right
 282 |        argument. Use this to force more interpretable solutions.
-283 |
 284 |        Specifies how many times a combination of operators can be
 285 |        nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
 286 |        specifies that `cos` may never appear within a `sin`, but `sin`
@@ -296,7 +304,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 296 |        operators, you only need to provide a single number: both
 297 |        arguments are treated the same way, and the max of each
 298 |        argument is constrained.
-299 |
 300 |        String of Julia code specifying the loss function. Can either
 301 |        be a loss from LossFunctions.jl, or your own loss written as a
 302 |        function. Examples of custom written losses include:
@@ -311,7 +320,8 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 311 |        `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
 312 |        `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
 313 |        `SigmoidLoss()`, `DWDMarginLoss(q)`.
-314 |
 315 |        If you would like to use a complexity other than 1 for an
 316 |        operator, specify the complexity here. For example,
 317 |        `{"sin": 2, "+": 1}` would give a complexity of 2 for each use
@@ -319,184 +329,231 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 319 |        the `+` operator (which is the default). You may specify real
 320 |        numbers for a complexity, and the total complexity of a tree
 321 |        will be rounded to the nearest integer after computing.
-322 |
-323 |
-324 |
-325 |
-326 |
 327 |        Multiplicative factor for how much to punish complexity.
-328 |
 329 |        Whether to measure the frequency of complexities, and use that
 330 |        instead of parsimony to explore equation space. Will naturally
 331 |        find equations of all complexities.
-332 |
 333 |        Whether to use the frequency mentioned above in the tournament,
 334 |        rather than just the simulated annealing.
-335 |
 336 |        Initial temperature for simulated annealing
 337 |        (requires `annealing` to be `True`).
-338 |
-339 |
-340 |
 341 |        Stop the search early if this loss is reached. You may also
 342 |        pass a string containing a Julia function which
 343 |        takes a loss and complexity as input, for example:
 344 |        `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
-345 |
 346 |        Number of total mutations to run, per 10 samples of the
 347 |        population, per iteration.
-348 |
 349 |        How much of population to replace with migrating equations from
 350 |        other populations.
-351 |
 352 |        How much of population to replace with migrating equations from
-353 |        hall of fame.
-354 |    weight_add_node : float
 355 |        Relative likelihood for mutation to add a node.
-356 |
 357 |        Relative likelihood for mutation to insert a node.
-358 |
 359 |        Relative likelihood for mutation to delete a node.
-360 |
 361 |        Relative likelihood for mutation to leave the individual.
-362 |
 363 |        Relative likelihood for mutation to change the constant slightly
 364 |        in a random direction.
-365 |
 366 |        Relative likelihood for mutation to swap an operator.
-367 |
 368 |        Relative likelihood for mutation to completely delete and then
 369 |        randomly generate the equation.
-370 |
 371 |        Relative likelihood for mutation to simplify constant parts by evaluation.
-372 |
 373 |        Absolute probability of crossover-type genetic operation, instead of a mutation.
-374 |
 375 |        Whether to skip mutation and crossover failures, rather than
 376 |        simply re-sampling the current member.
-377 |
-378 |
-379 |
-380 |
-381 |
 382 |        How many top individuals migrate from each population.
-383 |
 384 |        Whether to numerically optimize constants (Nelder-Mead/Newton)
-385 |        at the end of each iteration.
-386 |    optimizer_algorithm : str
 387 |        Optimization scheme to use for optimizing constants. Can currently
 388 |        be `NelderMead` or `BFGS`.
-389 |
 390 |        Number of times to restart the constants optimization process with
 391 |        different initial conditions.
-392 |
 393 |        Probability of optimizing the constants during a single iteration of
 394 |        the evolutionary algorithm.
-395 |
 396 |        Number of iterations that the constants optimizer can take.
-397 |
 398 |        Constants are perturbed by a max factor of
 399 |        (perturbation_factor*T + 1). Either multiplied by this or
 400 |        divided by this.
-401 |
 402 |        Number of expressions to consider in each tournament.
-403 |
 404 |        Probability of selecting the best expression in each
 405 |        tournament. The probability will decay as p*(1-p)^n for other
 406 |        expressions, sorted by loss.
-407 |
 408 |        Number of processes (=number of populations running).
-409 |
 410 |        Use multithreading instead of distributed backend.
-411 |        Using procs=0 will turn off both.
-412 |    cluster_manager : str
 413 |        For distributed computing, this sets the job queue system. Set
 414 |        to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
 415 |        "htc". If set to one of these, PySR will run in distributed
 416 |        mode, and use `procs` to figure out how many processes to launch.
-417 |
 418 |        Whether to compare population members on small batches during
 419 |        evolution. Still uses full dataset for comparing against hall
-420 |        of fame.
-421 |    batch_size : int
-422 |        The amount of data to use if doing batching.
-423 |    fast_cycle : bool
 424 |        Batch over population subsamples. This is a slightly different
 425 |        algorithm than regularized evolution, but does cycles 15%
 426 |        faster. May be algorithmically less efficient.
-427 |
-428 |
-429 |
-430 |
 431 |        Pass an int for reproducible results across multiple function calls.
 432 |        See :term:`Glossary <random_state>`.
-433 |
 434 |        Make a PySR search give the same result every run.
 435 |        To use this, you must turn off parallelism
 436 |        (with `procs`=0, `multithreading`=False),
 437 |        and set `random_state` to a fixed seed.
-438 |
 439 |        Tells fit to continue from where the last call to fit finished.
 440 |        If false, each call to fit will be fresh, overwriting previous results.
-441 |
 442 |        What verbosity level to use. 0 means minimal print statements.
-443 |
 444 |        What verbosity level to use for package updates.
 445 |        Will take value of `verbosity` if not given.
-446 |
 447 |        Whether to use a progress bar instead of printing to stdout.
-448 |
 449 |        Where to save the files (.csv extension).
-450 |
 451 |        Whether to put the hall of fame file in the temp directory.
 452 |        Deletion is then controlled with the `delete_tempfiles`
 453 |        parameter.
-454 |
-455 |
-456 |
 457 |        Whether to delete the temporary files after finishing.
-458 |
 459 |        A Julia environment location containing a Project.toml
 460 |        (and potentially the source code for SymbolicRegression.jl).
 461 |        Default gives the Python package directory, where a
 462 |        Project.toml file should be present from the install.
-463 |    update : bool
 464 |        Whether to automatically update Julia packages.
-465 |
 466 |        Whether to create a 'jax_format' column in the output,
 467 |        containing jax-callable functions and the default parameters in
 468 |        a jax array.
-469 |
 470 |        Whether to create a 'torch_format' column in the output,
 471 |        containing a torch module with trainable parameters.
-472 |
 473 |        Provides mappings between custom `binary_operators` or
 474 |        `unary_operators` defined in julia strings, to those same
 475 |        operators defined in sympy.
 476 |        E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
 477 |        model to be exported to sympy, `extra_sympy_mappings`
 478 |        would be `{"inv": lambda x: 1/x}`.
-479 |
 480 |        Similar to `extra_sympy_mappings` but for model export
 481 |        to jax. The dictionary maps sympy functions to jax functions.
 482 |        For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
 483 |        the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
-484 |
 485 |        The same as `extra_jax_mappings` but for model export
 486 |        to pytorch. Note that the dictionary keys should be callable
 487 |        pytorch expressions.
-488 |        For example: `extra_torch_mappings={sympy.sin: torch.sin}`
-489 |
 490 |        Whether to use a Gaussian Process to denoise the data before
 491 |        inputting to PySR. Can help PySR fit noisy data.
-492 |
 493 |        Whether to run feature selection in Python using random forests,
 494 |        before passing to the symbolic regression code. None means no
 495 |        feature selection; an int means select that many features.
-496 |
 497 |        Supports deprecated keyword arguments. Other arguments will
 498 |        result in an error.
-499 |
 500 |    Attributes
 501 |    ----------
 502 |    equations_ : pandas.DataFrame | list[pandas.DataFrame]
@@ -793,9 +850,10 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 793 |    selection_mask : list[bool]
 794 |        If using select_k_features, you must pass `model.selection_mask_` here.
 795 |        Not needed if loading from a pickle file.
-796 |    nout : int
 797 |        Number of outputs of the model.
 798 |        Not needed if loading from a pickle file.
 799 |    **pysr_kwargs : dict
 800 |        Any other keyword arguments to initialize the PySRRegressor object.
 801 |        These will overwrite those stored in the pickle file.
@@ -999,7 +1057,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 999  |
 1000 |    Parameters
 1001 |    ----------
-1002 |    index : int | list[int]
 1003 |        If you wish to select a particular equation from `self.equations_`,
 1004 |        give the row number here. This overrides the `model_selection`
 1005 |        parameter. If there are multiple output features, then pass
@@ -1171,9 +1229,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1171 |    y : ndarray | pandas.DataFrame
 1172 |        Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
 1173 |        Will be cast to `X`'s dtype if necessary.
-1174 |    Xresampled : ndarray | pandas.DataFrame
-1175 |
-1176 |
 1177 |    weights : ndarray | pandas.DataFrame
 1178 |        Weight array of the same shape as `y`.
 1179 |        Each element is how to weight the mean-square-error loss
@@ -1252,15 +1310,15 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1252 |    y : ndarray | pandas.DataFrame
 1253 |        Target values of shape (n_samples,) or (n_samples, n_targets).
 1254 |        Will be cast to X's dtype if necessary.
-1255 |    Xresampled : ndarray | pandas.DataFrame
 1256 |        Resampled training data, of shape `(n_resampled, n_features)`,
 1257 |        used for denoising.
 1258 |    variable_names : list[str]
 1259 |        Names of each variable in the training dataset, `X`.
 1260 |        Of length `n_features`.
-1261 |    random_state : int
 1262 |        Pass an int for reproducible results across multiple function calls.
-1263 |        See :term:`Glossary <random_state>`.
 1264 |
 1265 |    Returns
 1266 |    -------
@@ -1578,17 +1636,17 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1578 |    y : ndarray | pandas.DataFrame
 1579 |        Target values of shape (n_samples,) or (n_samples, n_targets).
 1580 |        Will be cast to X's dtype if necessary.
-1581 |    Xresampled : ndarray | pandas.DataFrame
 1582 |        Resampled training data, of shape (n_resampled, n_features),
 1583 |        to generate denoised data on. This
 1584 |        will be used as the training data, rather than `X`.
-1585 |    weights : ndarray | pandas.DataFrame
 1586 |        Weight array of the same shape as `y`.
 1587 |        Each element is how to weight the mean-square-error loss
 1588 |        for that particular element of `y`. Alternatively,
 1589 |        if a custom `loss` was set, it can be used
 1590 |        in arbitrary ways.
-1591 |    variable_names : list[str]
 1592 |        A list of names for the variables, rather than "x0", "x1", etc.
 1593 |        If `X` is a pandas dataframe, the column names will be used
 1594 |        instead of `variable_names`. Cannot contain spaces or special
@@ -1695,8 +1753,9 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1695 |
 1696 |    Parameters
 1697 |    ----------
-1698 |    checkpoint_file : str
 1699 |        Path to checkpoint hall of fame file to be loaded.
 1700 |    """
 1701 |    if checkpoint_file:
 1702 |        self.equation_file_ = checkpoint_file
@@ -1716,7 +1775,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1716 |    X : ndarray | pandas.DataFrame
 1717 |        Training data of shape `(n_samples, n_features)`.
 1718 |
-1719 |    index : int | list[int]
 1720 |        If you want to compute the output of an expression using a
 1721 |        particular row of `self.equations_`, you may specify the index here.
 1722 |        For multiple output equations, you must pass a list of indices
@@ -1784,7 +1843,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1784 |
 1785 |    Parameters
 1786 |    ----------
-1787 |    index : int | list[int]
 1788 |        If you wish to select a particular equation from
 1789 |        `self.equations_`, give the index number here. This overrides
 1790 |        the `model_selection` parameter. If there are multiple output
@@ -1808,15 +1867,16 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1808 |
 1809 |    Parameters
 1810 |    ----------
-1811 |    index : int | list[int]
 1812 |        If you wish to select a particular equation from
 1813 |        `self.equations_`, give the index number here. This overrides
 1814 |        the `model_selection` parameter. If there are multiple output
 1815 |        features, then pass a list of indices with the order the same
 1816 |        as the output feature.
-1817 |    precision : int
 1818 |        The number of significant figures shown in the LaTeX
 1819 |        representation.
 1820 |
 1821 |    Returns
 1822 |    -------
@@ -1843,7 +1903,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1843 |
 1844 |    Parameters
 1845 |    ----------
-1846 |    index : int | list[int]
 1847 |        If you wish to select a particular equation from
 1848 |        `self.equations_`, give the index number here. This overrides
 1849 |        the `model_selection` parameter. If there are multiple output
@@ -1874,7 +1934,7 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 1874 |
 1875 |    Parameters
 1876 |    ----------
-1877 |    index : int | list[int]
 1878 |        If you wish to select a particular equation from
 1879 |        `self.equations_`, give the index number here. This overrides
 1880 |        the `model_selection` parameter. If there are multiple output
@@ -2094,16 +2154,18 @@ class PySRRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
 2094 |
 2095 |    Parameters
 2096 |    ----------
-2097 |    indices : list[int] | list[list[int]]
 2098 |        If you wish to select a particular subset of equations from
 2099 |        `self.equations_`, give the row numbers here. By default,
 2100 |        all equations will be used. If there are multiple output
 2101 |        features, then pass a list of lists.
-2102 |    precision : int
 2103 |        The number of significant figures shown in the LaTeX
 2104 |        representations.
-2105 |
 2106 |        Which columns to include in the table.
 2107 |
 2108 |    Returns
 2109 |    -------
pysr/sr.py (after change, lines 230-554):

 230 |
 231 |    Parameters
 232 |    ----------
+233 |    model_selection : str
 234 |        Model selection criterion when selecting a final expression from
 235 |        the list of best expressions at each complexity.
+236 |        Can be `'accuracy'`, `'best'`, or `'score'`. Default is `'best'`.
+237 |        `'accuracy'` selects the candidate model with the lowest loss
+238 |        (highest accuracy).
+239 |        `'score'` selects the candidate model with the highest score.
+240 |        Score is defined as the negated derivative of the log-loss with
+241 |        respect to complexity; if an expression has a much better
+242 |        loss at a slightly higher complexity, it is preferred.
+243 |        `'best'` selects the candidate model with the highest score
+244 |        among expressions with a loss better than at least 1.5x the
+245 |        most accurate model.
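The score criterion above can be sketched in plain Python. This is a simplified illustration of the stated definition (negated slope of log-loss versus complexity), not PySR's exact implementation; `scores` and `select_best` are hypothetical helper names.

```python
import math

def scores(candidates):
    """Score each candidate as the negated slope of log-loss vs. complexity.

    `candidates` is a list of (complexity, loss) pairs sorted by complexity.
    The first candidate has no predecessor, so its score is 0.0.
    """
    out = [0.0]
    for (c0, l0), (c1, l1) in zip(candidates, candidates[1:]):
        out.append(-(math.log(l1) - math.log(l0)) / (c1 - c0))
    return out

def select_best(candidates, threshold=1.5):
    """Mimic `model_selection='best'`: highest score among candidates whose
    loss is within `threshold` times the lowest loss found."""
    best_loss = min(loss for _, loss in candidates)
    s = scores(candidates)
    eligible = [i for i, (_, loss) in enumerate(candidates)
                if loss <= threshold * best_loss]
    return max(eligible, key=lambda i: s[i])
```

For `[(1, 1.0), (3, 0.1), (10, 0.09)]`, the big loss drop from complexity 1 to 3 scores far higher than the marginal drop from 3 to 10, so the middle candidate is selected.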
+246 |    binary_operators : list[str]
 247 |        List of strings for binary operators used in the search.
 248 |        See the [operators page](https://astroautomata.com/PySR/operators/)
 249 |        for more details.
+250 |        Default is `["+", "-", "*", "/"]`.
+251 |    unary_operators : list[str]
 252 |        Operators which only take a single scalar as input.
 253 |        For example, `"cos"` or `"exp"`.
+254 |        Default is `None`.
+255 |    niterations : int
 256 |        Number of iterations of the algorithm to run. The best
 257 |        equations are printed and migrate between populations at the
 258 |        end of each iteration.
+259 |        Default is `40`.
+260 |    populations : int
 261 |        Number of populations running.
+262 |        Default is `15`.
+263 |    population_size : int
 264 |        Number of individuals in each population.
+265 |        Default is `33`.
+266 |    max_evals : int
 267 |        Limits the total number of evaluations of expressions to
+268 |        this number. Default is `None`.
+269 |    maxsize : int
+270 |        Max complexity of an equation. Default is `20`.
+271 |    maxdepth : int
 272 |        Max depth of an equation. You can use both `maxsize` and
 273 |        `maxdepth`. `maxdepth` is by default not used.
+274 |        Default is `None`.
+275 |    warmup_maxsize_by : float
 276 |        Whether to slowly increase max size from a small number up to
 277 |        the maxsize (if greater than 0). If greater than 0, says the
 278 |        fraction of training time at which the current maxsize will
 279 |        reach the user-passed maxsize.
+280 |        Default is `0.0`.
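The `warmup_maxsize_by` schedule can be sketched as a ramp over training time. A minimal sketch of the stated behavior: the growth curve, the starting size (`min_size`), and the function name are assumptions, not PySR internals.

```python
def current_maxsize(elapsed_frac, maxsize=20, warmup_maxsize_by=0.5, min_size=3):
    """Grow the size limit from `min_size` to `maxsize`, reaching `maxsize`
    once `warmup_maxsize_by` of the training time has elapsed.

    `elapsed_frac` is the fraction of total training time already used.
    A linear ramp is assumed here for illustration.
    """
    if warmup_maxsize_by <= 0:
        return maxsize  # warm-up disabled (the default, 0.0)
    frac = min(elapsed_frac / warmup_maxsize_by, 1.0)
    return round(min_size + frac * (maxsize - min_size))
```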
+281 |    timeout_in_seconds : float
 282 |        Make the search return early once this many seconds have passed.
+283 |        Default is `None`.
+284 |    constraints : dict[str, int | tuple[int,int]]
 285 |        Dictionary of int (unary) or 2-tuples (binary); this enforces
 286 |        maxsize constraints on the individual arguments of operators.
 287 |        E.g., `'pow': (-1, 1)` says that power laws can have any
 288 |        complexity left argument, but only 1 complexity in the right
 289 |        argument. Use this to force more interpretable solutions.
+290 |        Default is `None`.
+291 |    nested_constraints : dict[str, dict]
 292 |        Specifies how many times a combination of operators can be
 293 |        nested. For example, `{"sin": {"cos": 0}, "cos": {"cos": 2}}`
 294 |        specifies that `cos` may never appear within a `sin`, but `sin`
 ...
 304 |        operators, you only need to provide a single number: both
 305 |        arguments are treated the same way, and the max of each
 306 |        argument is constrained.
+307 |        Default is `None`.
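One plausible reading of `nested_constraints` can be checked on toy expression trees represented as `(op, *children)` tuples. This is an illustrative sketch of the semantics described above ("`cos` may never appear within a `sin`"), not PySR's internal constraint checker; `count_op` and `violates_nesting` are hypothetical names.

```python
def count_op(tree, op):
    """Count occurrences of operator `op` in a (op, *children) tuple tree;
    leaves are plain strings."""
    if not isinstance(tree, tuple):
        return 0
    here = 1 if tree[0] == op else 0
    return here + sum(count_op(child, op) for child in tree[1:])

def violates_nesting(tree, nested_constraints):
    """Return True if any subtree rooted at an outer operator contains more
    than the allowed number of a given inner operator below it."""
    if not isinstance(tree, tuple):
        return False
    outer = tree[0]
    for inner, limit in nested_constraints.get(outer, {}).items():
        inside = sum(count_op(child, inner) for child in tree[1:])
        if inside > limit:
            return True
    return any(violates_nesting(c, nested_constraints) for c in tree[1:])
```

With `{"sin": {"cos": 0}, "cos": {"cos": 2}}`, a `cos` anywhere under a `sin` violates the rules, while up to two further `cos` calls under a `cos` are allowed.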
+308 |    loss : str
 309 |        String of Julia code specifying the loss function. Can either
 310 |        be a loss from LossFunctions.jl, or your own loss written as a
 311 |        function. Examples of custom written losses include:
 ...
 320 |        `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`,
 321 |        `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`,
 322 |        `SigmoidLoss()`, `DWDMarginLoss(q)`.
+323 |        Default is `"L2DistLoss()"`.
+324 |    complexity_of_operators : dict[str, float]
 325 |        If you would like to use a complexity other than 1 for an
 326 |        operator, specify the complexity here. For example,
 327 |        `{"sin": 2, "+": 1}` would give a complexity of 2 for each use
 ...
 329 |        the `+` operator (which is the default). You may specify real
 330 |        numbers for a complexity, and the total complexity of a tree
 331 |        will be rounded to the nearest integer after computing.
+332 |        Default is `None`.
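The total-complexity rule above (per-node complexities summed, then rounded to the nearest integer) can be illustrated on `(op, *children)` tuple trees. A sketch only; it assumes leaves (variables/constants) count 1, matching the `complexity_of_constants`/`complexity_of_variables` defaults, and `total_complexity` is a hypothetical name.

```python
def _node_sum(tree, op_complexity):
    """Sum per-node complexities; leaves default to 1.0, operators use
    `op_complexity` or 1.0."""
    if not isinstance(tree, tuple):
        return 1.0  # variable or constant
    return op_complexity.get(tree[0], 1.0) + sum(
        _node_sum(child, op_complexity) for child in tree[1:]
    )

def total_complexity(tree, op_complexity):
    """Total complexity of a tree, rounded to the nearest integer only
    after summing (per the docstring)."""
    return round(_node_sum(tree, op_complexity))
```

For example, with `{"sin": 2, "+": 1}`, the tree for `sin(x) + y` has complexity 1 (`+`) + 2 (`sin`) + 1 (`x`) + 1 (`y`) = 5.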
+333 |    complexity_of_constants : float
+334 |        Complexity of constants. Default is `1`.
+335 |    complexity_of_variables : float
+336 |        Complexity of variables. Default is `1`.
+337 |    parsimony : float
 338 |        Multiplicative factor for how much to punish complexity.
+339 |        Default is `0.0032`.
+340 |    use_frequency : bool
 341 |        Whether to measure the frequency of complexities, and use that
 342 |        instead of parsimony to explore equation space. Will naturally
 343 |        find equations of all complexities.
+344 |        Default is `True`.
+345 |    use_frequency_in_tournament : bool
 346 |        Whether to use the frequency mentioned above in the tournament,
 347 |        rather than just the simulated annealing.
+348 |        Default is `True`.
+349 |    alpha : float
 350 |        Initial temperature for simulated annealing
 351 |        (requires `annealing` to be `True`).
+352 |        Default is `0.1`.
+353 |    annealing : bool
+354 |        Whether to use annealing. Default is `False`.
+355 |    early_stop_condition : float | str
 356 |        Stop the search early if this loss is reached. You may also
 357 |        pass a string containing a Julia function which
 358 |        takes a loss and complexity as input, for example:
 359 |        `"f(loss, complexity) = (loss < 0.1) && (complexity < 10)"`.
+360 |        Default is `None`.
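Note that PySR expects the custom stop condition as a string of Julia source; the Julia predicate from the docstring is equivalent to this Python function, shown here only to make the semantics concrete:

```python
def early_stop(loss, complexity):
    """Python rendering of `f(loss, complexity) = (loss < 0.1) && (complexity < 10)`:
    stop once an equation is both accurate enough and simple enough."""
    return loss < 0.1 and complexity < 10
```

An accurate but overly complex equation (or a simple but inaccurate one) does not trigger the stop.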
+361 |    ncyclesperiteration : int
 362 |        Number of total mutations to run, per 10 samples of the
 363 |        population, per iteration.
+364 |        Default is `550`.
+365 |    fraction_replaced : float
 366 |        How much of population to replace with migrating equations from
 367 |        other populations.
+368 |        Default is `0.000364`.
+369 |    fraction_replaced_hof : float
 370 |        How much of population to replace with migrating equations from
+371 |        hall of fame. Default is `0.035`.
+372 |    weight_add_node : float
 373 |        Relative likelihood for mutation to add a node.
+374 |        Default is `0.79`.
+375 |    weight_insert_node : float
 376 |        Relative likelihood for mutation to insert a node.
+377 |        Default is `5.1`.
+378 |    weight_delete_node : float
 379 |        Relative likelihood for mutation to delete a node.
+380 |        Default is `1.7`.
+381 |    weight_do_nothing : float
 382 |        Relative likelihood for mutation to leave the individual.
+383 |        Default is `0.21`.
+384 |    weight_mutate_constant : float
 385 |        Relative likelihood for mutation to change the constant slightly
 386 |        in a random direction.
+387 |        Default is `0.048`.
+388 |    weight_mutate_operator : float
 389 |        Relative likelihood for mutation to swap an operator.
+390 |        Default is `0.47`.
+391 |    weight_randomize : float
 392 |        Relative likelihood for mutation to completely delete and then
 393 |        randomly generate the equation.
+394 |        Default is `0.00023`.
+395 |    weight_simplify : float
 396 |        Relative likelihood for mutation to simplify constant parts by evaluation.
+397 |        Default is `0.0020`.
+398 |    crossover_probability : float
 399 |        Absolute probability of crossover-type genetic operation, instead of a mutation.
+400 |        Default is `0.066`.
+401 |    skip_mutation_failures : bool
 402 |        Whether to skip mutation and crossover failures, rather than
 403 |        simply re-sampling the current member.
+404 |        Default is `True`.
+405 |    migration : bool
+406 |        Whether to migrate. Default is `True`.
+407 |    hof_migration : bool
+408 |        Whether to have the hall of fame migrate. Default is `True`.
+409 |    topn : int
 410 |        How many top individuals migrate from each population.
+411 |        Default is `12`.
+412 |    should_optimize_constants : bool
 413 |        Whether to numerically optimize constants (Nelder-Mead/Newton)
+414 |        at the end of each iteration. Default is `True`.
+415 |    optimizer_algorithm : str
 416 |        Optimization scheme to use for optimizing constants. Can currently
 417 |        be `NelderMead` or `BFGS`.
+418 |        Default is `"BFGS"`.
+419 |    optimizer_nrestarts : int
 420 |        Number of times to restart the constants optimization process with
 421 |        different initial conditions.
+422 |        Default is `2`.
+423 |    optimize_probability : float
 424 |        Probability of optimizing the constants during a single iteration of
 425 |        the evolutionary algorithm.
+426 |        Default is `0.14`.
+427 |    optimizer_iterations : int
 428 |        Number of iterations that the constants optimizer can take.
+429 |        Default is `8`.
+430 |    perturbation_factor : float
 431 |        Constants are perturbed by a max factor of
 432 |        (perturbation_factor*T + 1). Either multiplied by this or
 433 |        divided by this.
+434 |        Default is `0.076`.
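The `perturbation_factor` bound can be sketched as follows. The uniform sampling of the factor and the 50/50 multiply-or-divide choice are assumptions for illustration; only the bound (perturbation_factor*T + 1) comes from the docstring, and `perturb_constant` is a hypothetical name.

```python
import random

def perturb_constant(value, T, perturbation_factor=0.076, rng=random):
    """Perturb a constant by a factor of at most (perturbation_factor*T + 1),
    randomly multiplying or dividing by it. T is the annealing temperature."""
    max_factor = perturbation_factor * T + 1.0
    factor = 1.0 + rng.random() * (max_factor - 1.0)  # uniform in [1, max_factor]
    return value * factor if rng.random() < 0.5 else value / factor
```

At T = 0 the bound collapses to 1, so the constant is unchanged; higher temperatures allow larger jumps.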
+435 |    tournament_selection_n : int
 436 |        Number of expressions to consider in each tournament.
+437 |        Default is `10`.
+438 |    tournament_selection_p : float
 439 |        Probability of selecting the best expression in each
 440 |        tournament. The probability will decay as p*(1-p)^n for other
 441 |        expressions, sorted by loss.
+442 |        Default is `0.86`.
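The p*(1-p)^n decay can be made concrete in a few lines. A minimal sketch: the renormalization so probabilities sum to 1 over a finite tournament is an assumption, and `tournament_probs` is a hypothetical name.

```python
def tournament_probs(n, p=0.86):
    """Selection probabilities for `n` tournament entrants sorted by loss
    (best first): proportional to p, p*(1-p), p*(1-p)^2, ...,
    renormalized over the finite tournament."""
    raw = [p * (1 - p) ** k for k in range(n)]
    total = sum(raw)
    return [x / total for x in raw]
```

With the default p = 0.86, the best expression is chosen most of the time, and each successive expression is roughly 7x less likely than the one before it.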
| 443 |
+
procs : int
|
| 444 |
Number of processes (=number of populations running).
|
| 445 |
+
Default is `cpu_count()`.
|
| 446 |
+
multithreading : bool
|
| 447 |
Use multithreading instead of distributed backend.
|
| 448 |
+
Using procs=0 will turn off both. Default is `True`.
|
| 449 |
+
cluster_manager : str
|
| 450 |
For distributed computing, this sets the job queue system. Set
|
| 451 |
to one of "slurm", "pbs", "lsf", "sge", "qrsh", "scyld", or
|
| 452 |
"htc". If set to one of these, PySR will run in distributed
|
| 453 |
mode, and use `procs` to figure out how many processes to launch.
|
| 454 |
+
Default is `None`.
|
| 455 |
+
batching : bool
|
| 456 |
Whether to compare population members on small batches during
|
| 457 |
evolution. Still uses full dataset for comparing against hall
|
| 458 |
+
of fame. Default is `False`.
|
| 459 |
+
batch_size : int
|
| 460 |
+
The amount of data to use if doing batching. Default is `50`.
|
| 461 |
+
fast_cycle : bool
|
| 462 |
Batch over population subsamples. This is a slightly different
|
| 463 |
algorithm than regularized evolution, but does cycles 15%
|
| 464 |
faster. May be algorithmically less efficient.
|
| 465 |
+
Default is `False`.
|
| 466 |
+
precision : int
|
| 467 |
+
What precision to use for the data. By default this is `32`
|
| 468 |
+
(float32), but you can select `64` or `16` as well, giving
|
| 469 |
+
you 64 or 16 bits of floating point precision, respectively.
|
| 470 |
+
Default is `32`.
|
| 471 |
+
random_state : int, Numpy RandomState instance or None
|
| 472 |
Pass an int for reproducible results across multiple function calls.
|
| 473 |
See :term:`Glossary <random_state>`.
|
| 474 |
+
Default is `None`.
|
| 475 |
+
deterministic : bool
|
| 476 |
Make a PySR search give the same result every run.
|
| 477 |
To use this, you must turn off parallelism
|
| 478 |
(with `procs`=0, `multithreading`=False),
|
| 479 |
and set `random_state` to a fixed seed.
|
| 480 |
+
Default is `False`.
|
warm_start : bool
    Tells fit to continue from where the last call to fit finished.
    If false, each call to fit will be fresh, overwriting previous results.
    Default is `False`.
verbosity : int
    What verbosity level to use. 0 means minimal print statements.
    Default is `1e9`.
update_verbosity : int
    What verbosity level to use for package updates.
    Will take value of `verbosity` if not given.
    Default is `None`.
progress : bool
    Whether to use a progress bar instead of printing to stdout.
    Default is `True`.
equation_file : str
    Where to save the files (.csv extension).
    Default is `None`.
temp_equation_file : bool
    Whether to put the hall of fame file in the temp directory.
    Deletion is then controlled with the `delete_tempfiles`
    parameter.
    Default is `False`.
tempdir : str
    Directory for the temporary files.
    Default is `None`.
delete_tempfiles : bool
    Whether to delete the temporary files after finishing.
    Default is `True`.
julia_project : str
    A Julia environment location containing a Project.toml
    (and potentially the source code for SymbolicRegression.jl).
    Default gives the Python package directory, where a
    Project.toml file should be present from the install.
update : bool
    Whether to automatically update Julia packages.
    Default is `True`.
output_jax_format : bool
    Whether to create a 'jax_format' column in the output,
    containing jax-callable functions and the default parameters in
    a jax array.
    Default is `False`.
output_torch_format : bool
    Whether to create a 'torch_format' column in the output,
    containing a torch module with trainable parameters.
    Default is `False`.
extra_sympy_mappings : dict[str, Callable]
    Provides mappings between custom `binary_operators` or
    `unary_operators` defined in julia strings, to those same
    operators defined in sympy.
    E.g., if `unary_operators=["inv(x)=1/x"]`, then for the fitted
    model to be exported to sympy, `extra_sympy_mappings`
    would be `{"inv": lambda x: 1/x}`.
    Default is `None`.
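The `inv` mapping from the example above is just an ordinary Python callable, so it can be checked numerically without any symbolic machinery:

```python
# Mapping for the custom operator defined in Julia as inv(x) = 1/x,
# matching the example in the docstring above.
extra_sympy_mappings = {"inv": lambda x: 1 / x}

# The callable behaves like the Julia operator on plain numbers:
print(extra_sympy_mappings["inv"](4))  # 0.25
```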
extra_jax_mappings : dict[Callable, str]
    Similar to `extra_sympy_mappings` but for model export
    to jax. The dictionary maps sympy functions to jax functions.
    For example: `extra_jax_mappings={sympy.sin: "jnp.sin"}` maps
    the `sympy.sin` function to the equivalent jax expression `jnp.sin`.
    Default is `None`.
extra_torch_mappings : dict[Callable, Callable]
    The same as `extra_jax_mappings` but for model export
    to pytorch. Note that the dictionary keys should be callable
    pytorch expressions.
    For example: `extra_torch_mappings={sympy.sin: torch.sin}`.
    Default is `None`.
denoise : bool
    Whether to use a Gaussian Process to denoise the data before
    inputting to PySR. Can help PySR fit noisy data.
    Default is `False`.
select_k_features : int
    Whether to run feature selection in Python using random forests,
    before passing to the symbolic regression code. None means no
    feature selection; an int means select that many features.
    Default is `None`.
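For noisy, high-dimensional data, the two options above are often combined. A sketch of the relevant keywords; the value 5 is an arbitrary illustration, not a recommendation:

```python
# Keywords for preprocessing noisy data, per the `denoise` and
# `select_k_features` docstrings above.
noisy_data_kwargs = dict(
    denoise=True,         # Gaussian-Process denoising before the search
    select_k_features=5,  # keep the 5 most important features (arbitrary)
)
# These would be passed as PySRRegressor(**noisy_data_kwargs).
```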
**kwargs : dict
    Supports deprecated keyword arguments. Other arguments will
    result in an error.

Attributes
----------
equations_ : pandas.DataFrame | list[pandas.DataFrame]
...
selection_mask : list[bool]
    If using select_k_features, you must pass `model.selection_mask_` here.
    Not needed if loading from a pickle file.
nout : int
    Number of outputs of the model.
    Not needed if loading from a pickle file.
    Default is `1`.
**pysr_kwargs : dict
    Any other keyword arguments to initialize the PySRRegressor object.
    These will overwrite those stored in the pickle file.
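As a sketch, the loader arguments above might be assembled like this when restoring a model that was fitted with feature selection. The mask below is illustrative only (4 input features of which 2 were selected):

```python
# Illustrative arguments for reloading a fitted model, per the
# parameters documented above.
load_kwargs = dict(
    selection_mask=[True, False, True, False],  # from model.selection_mask_
    nout=1,  # single output feature (the default)
)
```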
...
Parameters
----------
index : int | list[int]
    If you wish to select a particular equation from `self.equations_`,
    give the row number here. This overrides the `model_selection`
    parameter. If there are multiple output features, then pass
...
y : ndarray | pandas.DataFrame
    Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
    Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
    Resampled training data used for denoising,
    of shape `(n_resampled, n_features)`.
weights : ndarray | pandas.DataFrame
    Weight array of the same shape as `y`.
    Each element is how to weight the mean-square-error loss
...
y : ndarray | pandas.DataFrame
    Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
    Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
    Resampled training data, of shape `(n_resampled, n_features)`,
    used for denoising.
variable_names : list[str]
    Names of each variable in the training dataset, `X`.
    Of length `n_features`.
random_state : int | np.RandomState
    Pass an int for reproducible results across multiple function calls.
    See :term:`Glossary <random_state>`. Default is `None`.

Returns
-------
...
y : ndarray | pandas.DataFrame
    Target values of shape `(n_samples,)` or `(n_samples, n_targets)`.
    Will be cast to `X`'s dtype if necessary.
Xresampled : ndarray | pandas.DataFrame
    Resampled training data, of shape `(n_resampled, n_features)`,
    used to generate denoised data. This
    will be used as the training data, rather than `X`.
weights : ndarray | pandas.DataFrame
    Weight array of the same shape as `y`.
    Each element is how to weight the mean-square-error loss
    for that particular element of `y`. Alternatively,
    if a custom `loss` was set, it can be used
    in arbitrary ways.
variable_names : list[str]
    A list of names for the variables, rather than "x0", "x1", etc.
    If `X` is a pandas dataframe, the column names will be used
    instead of `variable_names`. Cannot contain spaces or special
...
Parameters
----------
checkpoint_file : str
    Path to checkpoint hall of fame file to be loaded.
    The default will use the set `equation_file_`.
"""
if checkpoint_file:
    self.equation_file_ = checkpoint_file
...
X : ndarray | pandas.DataFrame
    Training data of shape `(n_samples, n_features)`.

index : int | list[int]
    If you want to compute the output of an expression using a
    particular row of `self.equations_`, you may specify the index here.
    For multiple output equations, you must pass a list of indices
...
Parameters
----------
index : int | list[int]
    If you wish to select a particular equation from
    `self.equations_`, give the index number here. This overrides
    the `model_selection` parameter. If there are multiple output
...
Parameters
----------
index : int | list[int]
    If you wish to select a particular equation from
    `self.equations_`, give the index number here. This overrides
    the `model_selection` parameter. If there are multiple output
    features, then pass a list of indices with the order the same
    as the output feature.
precision : int
    The number of significant figures shown in the LaTeX
    representation.
    Default is `3`.

Returns
-------
...
Parameters
----------
index : int | list[int]
    If you wish to select a particular equation from
    `self.equations_`, give the index number here. This overrides
    the `model_selection` parameter. If there are multiple output
...
Parameters
----------
index : int | list[int]
    If you wish to select a particular equation from
    `self.equations_`, give the index number here. This overrides
    the `model_selection` parameter. If there are multiple output
...
Parameters
----------
indices : list[int] | list[list[int]]
    If you wish to select a particular subset of equations from
    `self.equations_`, give the row numbers here. By default,
    all equations will be used. If there are multiple output
    features, then pass a list of lists.
precision : int
    The number of significant figures shown in the LaTeX
    representations.
    Default is `3`.
columns : list[str]
    Which columns to include in the table.
    Default is `["equation", "complexity", "loss", "score"]`.
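The default column set above is just a list of strings, so a reduced table can be requested by passing a subset. A sketch; the `model.latex_table(...)` call in the comment is illustrative:

```python
# Default columns for the LaTeX table, per the docstring above.
default_columns = ["equation", "complexity", "loss", "score"]

# A hypothetical reduced table showing only equations and complexities:
reduced_columns = [c for c in default_columns if c in ("equation", "complexity")]
# This would be passed as model.latex_table(columns=reduced_columns).
print(reduced_columns)  # ['equation', 'complexity']
```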

Returns
-------