Commit 8b49600 · Proper pydoc markdown format
Parent: beecd14
Changed files: pysr/sr.py (+61, -61)
pysr/sr.py
@@ -132,122 +132,122 @@ def pysr(X, y, weights=None,
 
     # Arguments
 
-    X (np.ndarray/pandas.DataFrame): 2D array. Rows are examples,
-    columns are features. If pandas DataFrame, the columns are used
+    X (np.ndarray/pandas.DataFrame): 2D array. Rows are examples, \
+    columns are features. If pandas DataFrame, the columns are used \
     for variable names (so make sure they don't contain spaces).
-    y (np.ndarray): 1D array (rows are examples) or 2D array (rows
-    are examples, columns are outputs). Putting in a 2D array will
+    y (np.ndarray): 1D array (rows are examples) or 2D array (rows \
+    are examples, columns are outputs). Putting in a 2D array will \
     trigger a search for equations for each feature of y.
-    weights (np.ndarray): same shape as y. Each element is how to
-    weight the mean-square-error loss for that particular element
+    weights (np.ndarray): same shape as y. Each element is how to \
+    weight the mean-square-error loss for that particular element \
     of y.
-    binary_operators (list): List of strings giving the binary operators
+    binary_operators (list): List of strings giving the binary operators \
     in Julia's Base. Default is ["+", "-", "*", "/",].
-    unary_operators (list): Same but for operators taking a single scalar.
+    unary_operators (list): Same but for operators taking a single scalar. \
     Default is [].
     procs (int): Number of processes (=number of populations running).
-    loss (str): String of Julia code specifying the loss function.
-    Can either be a loss from LossFunctions.jl, or your own
-    loss written as a function. Examples of custom written losses
-    include: `myloss(x, y) = abs(x-y)` for non-weighted, or
-    `myloss(x, y, w) = w*abs(x-y)` for weighted.
-    Among the included losses, these are as follows. Regression:
-    `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square),
-    `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`,
-    `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`.
-    Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`,
-    `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`,
+    loss (str): String of Julia code specifying the loss function. \
+    Can either be a loss from LossFunctions.jl, or your own \
+    loss written as a function. Examples of custom written losses \
+    include: `myloss(x, y) = abs(x-y)` for non-weighted, or \
+    `myloss(x, y, w) = w*abs(x-y)` for weighted. \
+    Among the included losses, these are as follows. Regression: \
+    `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square), \
+    `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ϵ)`, \
+    `L2EpsilonInsLoss(ϵ)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`. \
+    Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`, \
+    `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`, \
     `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
     populations (int): Number of populations running.
-    niterations (int): Number of iterations of the algorithm to run. The best
-    equations are printed, and migrate between populations, at the
+    niterations (int): Number of iterations of the algorithm to run. The best \
+    equations are printed, and migrate between populations, at the \
     end of each.
-    ncyclesperiteration (int): Number of total mutations to run, per 10
+    ncyclesperiteration (int): Number of total mutations to run, per 10 \
     samples of the population, per iteration.
     alpha (float): Initial temperature.
     annealing (bool): Whether to use annealing. You should (and it is default).
-    fractionReplaced (float): How much of population to replace with migrating
+    fractionReplaced (float): How much of population to replace with migrating \
     equations from other populations.
-    fractionReplacedHof (float): How much of population to replace with migrating
+    fractionReplacedHof (float): How much of population to replace with migrating \
     equations from hall of fame.
     npop (int): Number of individuals in each population
     parsimony (float): Multiplicative factor for how much to punish complexity.
     migration (bool): Whether to migrate.
     hofMigration (bool): Whether to have the hall of fame migrate.
-    shouldOptimizeConstants (bool): Whether to numerically optimize
+    shouldOptimizeConstants (bool): Whether to numerically optimize \
     constants (Nelder-Mead/Newton) at the end of each iteration.
     topn (int): How many top individuals migrate from each population.
-    perturbationFactor (float): Constants are perturbed by a max
-    factor of (perturbationFactor*T + 1). Either multiplied by this
+    perturbationFactor (float): Constants are perturbed by a max \
+    factor of (perturbationFactor*T + 1). Either multiplied by this \
     or divided by this.
     weightAddNode (float): Relative likelihood for mutation to add a node
     weightInsertNode (float): Relative likelihood for mutation to insert a node
     weightDeleteNode (float): Relative likelihood for mutation to delete a node
     weightDoNothing (float): Relative likelihood for mutation to leave the individual
-    weightMutateConstant (float): Relative likelihood for mutation to change
+    weightMutateConstant (float): Relative likelihood for mutation to change \
     the constant slightly in a random direction.
-    weightMutateOperator (float): Relative likelihood for mutation to swap
+    weightMutateOperator (float): Relative likelihood for mutation to swap \
     an operator.
-    weightRandomize (float): Relative likelihood for mutation to completely
+    weightRandomize (float): Relative likelihood for mutation to completely \
     delete and then randomly generate the equation
-    weightSimplify (float): Relative likelihood for mutation to simplify
+    weightSimplify (float): Relative likelihood for mutation to simplify \
     constant parts by evaluation
     timeout (float): Time in seconds to timeout search
     equation_file (str): Where to save the files (.csv separated by |)
     verbosity (int): What verbosity level to use. 0 means minimal print statements.
     progress (bool): Whether to use a progress bar instead of printing to stdout.
     maxsize (int): Max size of an equation.
-    maxdepth (int): Max depth of an equation. You can use both maxsize and maxdepth.
+    maxdepth (int): Max depth of an equation. You can use both maxsize and maxdepth. \
     maxdepth is by default set to = maxsize, which means that it is redundant.
-    fast_cycle (bool): (experimental) - batch over population subsamples. This
-    is a slightly different algorithm than regularized evolution, but does cycles
+    fast_cycle (bool): (experimental) - batch over population subsamples. This \
+    is a slightly different algorithm than regularized evolution, but does cycles \
     15% faster. May be algorithmically less efficient.
-    variable_names (list): a list of names for the variables, other
+    variable_names (list): a list of names for the variables, other \
     than "x0", "x1", etc.
-    batching (bool): whether to compare population members on small batches
-    during evolution. Still uses full dataset for comparing against
+    batching (bool): whether to compare population members on small batches \
+    during evolution. Still uses full dataset for comparing against \
     hall of fame.
     batchSize (int): the amount of data to use if doing batching.
-    select_k_features (None/int), whether to run feature selection in
-    Python using random forests, before passing to the symbolic regression
-    code. None means no feature selection; an int means select that many
+    select_k_features (None/int), whether to run feature selection in \
+    Python using random forests, before passing to the symbolic regression \
+    code. None means no feature selection; an int means select that many \
     features.
-    warmupMaxsizeBy (float): whether to slowly increase max size from
-    a small number up to the maxsize (if greater than 0).
-    If greater than 0, says the fraction of training time at which
+    warmupMaxsizeBy (float): whether to slowly increase max size from \
+    a small number up to the maxsize (if greater than 0). \
+    If greater than 0, says the fraction of training time at which \
     the current maxsize will reach the user-passed maxsize.
-    constraints (dict): Dictionary of `int` (unary operators)
-    or tuples of two `int`s (binary),
-    this enforces maxsize constraints on the individual
-    arguments of operators. e.g., `'pow': (-1, 1)`
-    says that power laws can have any complexity left argument, but only
+    constraints (dict): Dictionary of `int` (unary operators) \
+    or tuples of two `int`s (binary), \
+    this enforces maxsize constraints on the individual \
+    arguments of operators. e.g., `'pow': (-1, 1)` \
+    says that power laws can have any complexity left argument, but only \
     1 complexity exponent. Use this to force more interpretable solutions.
-    useFrequency (bool): whether to measure the frequency of complexities,
-    and use that instead of parsimony to explore equation space. Will
+    useFrequency (bool): whether to measure the frequency of complexities, \
+    and use that instead of parsimony to explore equation space. Will \
     naturally find equations of all complexities.
     julia_optimization (int): Optimization level (0, 1, 2, 3)
    tempdir (str/None): directory for the temporary files
     delete_tempfiles (bool): whether to delete the temporary files after finishing
-    julia_project (str/None): a Julia environment location containing
-    a Project.toml (and potentially the source code for SymbolicRegression.jl).
-    Default gives the Python package directory, where a Project.toml file
+    julia_project (str/None): a Julia environment location containing \
+    a Project.toml (and potentially the source code for SymbolicRegression.jl). \
+    Default gives the Python package directory, where a Project.toml file \
     should be present from the install.
-    user_input (bool): Whether to ask for user input or not for installing (to
+    user_input (bool): Whether to ask for user input or not for installing (to \
     be used for automated scripts). Will choose to install when asked.
     update (bool): Whether to automatically update Julia packages.
-    temp_equation_file (bool): Whether to put the hall of fame file in
-    the temp directory. Deletion is then controlled with the
+    temp_equation_file (bool): Whether to put the hall of fame file in \
+    the temp directory. Deletion is then controlled with the \
     delete_tempfiles argument.
-    output_jax_format (bool): Whether to create a 'jax_format' column in the output,
+    output_jax_format (bool): Whether to create a 'jax_format' column in the output, \
     containing jax-callable functions and the default parameters in a jax array.
-    output_torch_format (bool): Whether to create a 'torch_format' column in the output,
+    output_torch_format (bool): Whether to create a 'torch_format' column in the output, \
     containing a torch module with trainable parameters.
 
     # Returns
 
-    equations (pd.DataFrame/list): Results dataframe,
-    giving complexity, MSE, and equations (as strings), as well as functional
-    forms. If list, each element corresponds to a dataframe of equations
+    equations (pd.DataFrame/list): Results dataframe, \
+    giving complexity, MSE, and equations (as strings), as well as functional \
+    forms. If list, each element corresponds to a dataframe of equations \
     for each output.
     """
     if binary_operators is None:
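As context for this diff: the docstring's weighted custom loss is given in Julia as `myloss(x, y, w) = w*abs(x-y)`. The same semantics can be sketched in NumPy; `weighted_l1` below is a hypothetical illustration of how each element of `weights` scales the error of the matching element of y, not a function in the PySR API.

```python
import numpy as np

# Sketch of the weighted-loss semantics described in the docstring:
# each element of `weights` scales the absolute error of the matching
# element of y, mirroring the Julia example `myloss(x, y, w) = w*abs(x-y)`.
# `weighted_l1` is an illustrative helper, not part of the PySR API.
def weighted_l1(pred, target, weights):
    return float(np.sum(weights * np.abs(pred - target)))

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.0, 1.0, 1.0])
weights = np.array([1.0, 0.5, 2.0])
loss = weighted_l1(pred, target, weights)  # 0*1.0 + 1*0.5 + 2*2.0 = 4.5
```

An unweighted variant would simply drop the `weights` factor, matching the docstring's `myloss(x, y) = abs(x-y)` example.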