Spaces:
Sleeping
Sleeping
Commit
·
5db0d89
1
Parent(s):
8b49600
Overhaul docstrings
Browse files- pysr/sr.py +106 -119
pysr/sr.py
CHANGED
|
@@ -130,125 +130,112 @@ def pysr(X, y, weights=None,
|
|
| 130 |
equations, but you should adjust `niterations`,
|
| 131 |
`binary_operators`, `unary_operators` to your requirements.
|
| 132 |
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
maxsize
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
variable_names
|
| 206 |
-
|
| 207 |
-
batching
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
batchSize
|
| 211 |
-
select_k_features
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
the temp directory. Deletion is then controlled with the \
|
| 240 |
-
delete_tempfiles argument.
|
| 241 |
-
output_jax_format (bool): Whether to create a 'jax_format' column in the output, \
|
| 242 |
-
containing jax-callable functions and the default parameters in a jax array.
|
| 243 |
-
output_torch_format (bool): Whether to create a 'torch_format' column in the output, \
|
| 244 |
-
containing a torch module with trainable parameters.
|
| 245 |
-
|
| 246 |
-
# Returns
|
| 247 |
-
|
| 248 |
-
equations (pd.DataFrame/list): Results dataframe, \
|
| 249 |
-
giving complexity, MSE, and equations (as strings), as well as functional \
|
| 250 |
-
forms. If list, each element corresponds to a dataframe of equations \
|
| 251 |
-
for each output.
|
| 252 |
"""
|
| 253 |
if binary_operators is None:
|
| 254 |
binary_operators = '+ * - /'.split(' ')
|
|
|
|
| 130 |
equations, but you should adjust `niterations`,
|
| 131 |
`binary_operators`, `unary_operators` to your requirements.
|
| 132 |
|
| 133 |
+
:param X: 2D array. Rows are examples, columns are features. If pandas DataFrame, the columns are used for variable names (so make sure they don't contain spaces).
|
| 134 |
+
:type X: np.ndarray/pandas.DataFrame
|
| 135 |
+
:param y: 1D array (rows are examples) or 2D array (rows are examples, columns are outputs). Putting in a 2D array will trigger a search for equations for each feature of y.
|
| 136 |
+
:type y: np.ndarray
|
| 137 |
+
:param weights: same shape as y. Each element is how to weight the mean-square-error loss for that particular element of y.
|
| 138 |
+
:type weights: np.ndarray
|
| 139 |
+
:param binary_operators: List of strings giving the binary operators in Julia's Base. Default is ["+", "-", "*", "/",].
|
| 140 |
+
:type binary_operators: list
|
| 141 |
+
:param unary_operators: Same but for operators taking a single scalar. Default is [].
|
| 142 |
+
:type unary_operators: list
|
| 143 |
+
:param procs: Number of processes (=number of populations running).
|
| 144 |
+
:type procs: int
|
| 145 |
+
:param loss: String of Julia code specifying the loss function. Can either be a loss from LossFunctions.jl, or your own loss written as a function. Examples of custom written losses include: `myloss(x, y) = abs(x-y)` for non-weighted, or `myloss(x, y, w) = w*abs(x-y)` for weighted. Among the included losses, these are as follows. Regression: `LPDistLoss{P}()`, `L1DistLoss()`, `L2DistLoss()` (mean square), `LogitDistLoss()`, `HuberLoss(d)`, `L1EpsilonInsLoss(ε)`, `L2EpsilonInsLoss(ε)`, `PeriodicLoss(c)`, `QuantileLoss(τ)`. Classification: `ZeroOneLoss()`, `PerceptronLoss()`, `L1HingeLoss()`, `SmoothedL1HingeLoss(γ)`, `ModifiedHuberLoss()`, `L2MarginLoss()`, `ExpLoss()`, `SigmoidLoss()`, `DWDMarginLoss(q)`.
|
| 146 |
+
:type loss: str
|
| 147 |
+
:param populations: Number of populations running.
|
| 148 |
+
:type populations: int
|
| 149 |
+
:param niterations: Number of iterations of the algorithm to run. The best equations are printed, and migrate between populations, at the end of each.
|
| 150 |
+
:type niterations: int
|
| 151 |
+
:param ncyclesperiteration: Number of total mutations to run, per 10 samples of the population, per iteration.
|
| 152 |
+
:type ncyclesperiteration: int
|
| 153 |
+
:param alpha: Initial temperature.
|
| 154 |
+
:type alpha: float
|
| 155 |
+
:param annealing: Whether to use annealing. You should (and it is default).
|
| 156 |
+
:type annealing: bool
|
| 157 |
+
:param fractionReplaced: How much of population to replace with migrating equations from other populations.
|
| 158 |
+
:type fractionReplaced: float
|
| 159 |
+
:param fractionReplacedHof: How much of population to replace with migrating equations from hall of fame.
|
| 160 |
+
:type fractionReplacedHof: float
|
| 161 |
+
:param npop: Number of individuals in each population
|
| 162 |
+
:type npop: int
|
| 163 |
+
:param parsimony: Multiplicative factor for how much to punish complexity.
|
| 164 |
+
:type parsimony: float
|
| 165 |
+
:param migration: Whether to migrate.
|
| 166 |
+
:type migration: bool
|
| 167 |
+
:param hofMigration: Whether to have the hall of fame migrate.
|
| 168 |
+
:type hofMigration: bool
|
| 169 |
+
:param shouldOptimizeConstants: Whether to numerically optimize constants (Nelder-Mead/Newton) at the end of each iteration.
|
| 170 |
+
:type shouldOptimizeConstants: bool
|
| 171 |
+
:param topn: How many top individuals migrate from each population.
|
| 172 |
+
:type topn: int
|
| 173 |
+
:param perturbationFactor: Constants are perturbed by a max factor of (perturbationFactor*T + 1). Either multiplied by this or divided by this.
|
| 174 |
+
:type perturbationFactor: float
|
| 175 |
+
:param weightAddNode: Relative likelihood for mutation to add a node
|
| 176 |
+
:type weightAddNode: float
|
| 177 |
+
:param weightInsertNode: Relative likelihood for mutation to insert a node
|
| 178 |
+
:type weightInsertNode: float
|
| 179 |
+
:param weightDeleteNode: Relative likelihood for mutation to delete a node
|
| 180 |
+
:type weightDeleteNode: float
|
| 181 |
+
:param weightDoNothing: Relative likelihood for mutation to leave the individual
|
| 182 |
+
:type weightDoNothing: float
|
| 183 |
+
:param weightMutateConstant: Relative likelihood for mutation to change the constant slightly in a random direction.
|
| 184 |
+
:type weightMutateConstant: float
|
| 185 |
+
:param weightMutateOperator: Relative likelihood for mutation to swap an operator.
|
| 186 |
+
:type weightMutateOperator: float
|
| 187 |
+
:param weightRandomize: Relative likelihood for mutation to completely delete and then randomly generate the equation
|
| 188 |
+
:type weightRandomize: float
|
| 189 |
+
:param weightSimplify: Relative likelihood for mutation to simplify constant parts by evaluation
|
| 190 |
+
:type weightSimplify: float
|
| 191 |
+
:param timeout: Time in seconds to timeout search
|
| 192 |
+
:type timeout: float
|
| 193 |
+
:param equation_file: Where to save the files (.csv separated by |)
|
| 194 |
+
:type equation_file: str
|
| 195 |
+
:param verbosity: What verbosity level to use. 0 means minimal print statements.
|
| 196 |
+
:type verbosity: int
|
| 197 |
+
:param progress: Whether to use a progress bar instead of printing to stdout.
|
| 198 |
+
:type progress: bool
|
| 199 |
+
:param maxsize: Max size of an equation.
|
| 200 |
+
:type maxsize: int
|
| 201 |
+
:param maxdepth: Max depth of an equation. You can use both maxsize and maxdepth. maxdepth is by default set to = maxsize, which means that it is redundant.
|
| 202 |
+
:type maxdepth: int
|
| 203 |
+
:param fast_cycle: (experimental) - batch over population subsamples. This is a slightly different algorithm than regularized evolution, but does cycles 15% faster. May be algorithmically less efficient.
|
| 204 |
+
:type fast_cycle: bool
|
| 205 |
+
:param variable_names: a list of names for the variables, other than "x0", "x1", etc.
|
| 206 |
+
:type variable_names: list
|
| 207 |
+
:param batching: whether to compare population members on small batches during evolution. Still uses full dataset for comparing against hall of fame.
|
| 208 |
+
:type batching: bool
|
| 209 |
+
:param batchSize: the amount of data to use if doing batching.
|
| 210 |
+
:type batchSize: int
|
| 211 |
+
:param select_k_features: whether to run feature selection in Python using random forests, before passing to the symbolic regression code. None means no feature selection; an int means select that many features.
|
| 212 |
+
:type select_k_features: None/int
|
| 213 |
+
:param warmupMaxsizeBy: whether to slowly increase max size from a small number up to the maxsize (if greater than 0). If greater than 0, says the fraction of training time at which the current maxsize will reach the user-passed maxsize.
|
| 214 |
+
:type warmupMaxsizeBy: float
|
| 215 |
+
:param constraints: dictionary of int (unary) or 2-tuples (binary), this enforces maxsize constraints on the individual arguments of operators. E.g., `'pow': (-1, 1)` says that power laws can have any complexity left argument, but only 1 complexity exponent. Use this to force more interpretable solutions.
|
| 216 |
+
:type constraints: dict
|
| 217 |
+
:param useFrequency: whether to measure the frequency of complexities, and use that instead of parsimony to explore equation space. Will naturally find equations of all complexities.
|
| 218 |
+
:type useFrequency: bool
|
| 219 |
+
:param julia_optimization: Optimization level (0, 1, 2, 3)
|
| 220 |
+
:type julia_optimization: int
|
| 221 |
+
:param tempdir: directory for the temporary files
|
| 222 |
+
:type tempdir: str/None
|
| 223 |
+
:param delete_tempfiles: whether to delete the temporary files after finishing
|
| 224 |
+
:type delete_tempfiles: bool
|
| 225 |
+
:param julia_project: a Julia environment location containing a Project.toml (and potentially the source code for SymbolicRegression.jl). Default gives the Python package directory, where a Project.toml file should be present from the install.
|
| 226 |
+
:type julia_project: str/None
|
| 227 |
+
:param user_input: Whether to ask for user input or not for installing (to be used for automated scripts). Will choose to install when asked.
|
| 228 |
+
:type user_input: bool
|
| 229 |
+
:param update: Whether to automatically update Julia packages.
|
| 230 |
+
:type update: bool
|
| 231 |
+
:param temp_equation_file: Whether to put the hall of fame file in the temp directory. Deletion is then controlled with the delete_tempfiles argument.
|
| 232 |
+
:type temp_equation_file: bool
|
| 233 |
+
:param output_jax_format: Whether to create a 'jax_format' column in the output, containing jax-callable functions and the default parameters in a jax array.
|
| 234 |
+
:type output_jax_format: bool
|
| 235 |
+
:param output_torch_format: Whether to create a 'torch_format' column in the output, containing a torch module with trainable parameters.
|
| 236 |
+
:type output_torch_format: bool
|
| 237 |
+
:returns: Results dataframe, giving complexity, MSE, and equations (as strings), as well as functional forms. If list, each element corresponds to a dataframe of equations for each output.
|
| 238 |
+
:type: pd.DataFrame/list
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 239 |
"""
|
| 240 |
if binary_operators is None:
|
| 241 |
binary_operators = '+ * - /'.split(' ')
|