Performance issue debugging #518
-
Coming from MilesCranmer/SymbolicRegression.jl#270 (comment).

If the time complexity roughly scales as […]

It runs on 4 nodes, each with 64 Icelake cores, for almost 2 days, and evaluates about 5e4 expressions/sec.

```python
import sympy
import pysr

num_cores = 4 * 64  # assumed from "4 nodes each with 64 Icelake cores" above; not shown in the original post

model = pysr.PySRRegressor(
    # Search Space & Complexity
    binary_operators=['+', '-', '*', '/', 'pow'],
    unary_operators=[
        'exp', 'log',
        'tanh_p1(x) = tanh(x)+1',
        'atan_p1(x) = atan(x)+1',
        'alg_sgmd(x) = x/sqrt(x^2+1)',
    ],
    extra_sympy_mappings={
        'tanh_p1': lambda x: sympy.tanh(x) + 1,
        'atan_p1': lambda x: sympy.atan(x) + 1,
        'alg_sgmd': lambda x: x / sympy.sqrt(x**2 + 1),
    },
    maxsize=64,
    full_objective=objective,  # custom Julia objective, defined elsewhere in the original script
    parsimony=1e-5,
    adaptive_parsimony_scaling=1e3,
    # Search Size
    niterations=1000000,
    populations=num_cores * 4,
    ncyclesperiteration=10000,
    # Mutations
    weight_simplify=0.01,
    weight_optimize=0.01,
    # Performance and Parallelization
    procs=num_cores,
    cluster_manager='slurm',
    batching=True,
    batch_size=1000,
    turbo=True,
    # Monitoring
    verbosity=1,
    print_precision=2,
    progress=False,
)
```
-
Hey @eelregit,
Sorry, I forget whether you already solved this. Is this issue still present?
One quick tip: the batch size is very large. I usually use a batch size of 50 or even less. With a batch size of 1000 out of 7000 total points, you might as well run on the full dataset (because non-contiguous slicing can be expensive).
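For intuition, here is a quick NumPy sketch of the slicing point (the actual batching happens on the Julia side, and the shapes here are invented): a random gather copies the batch on every evaluation, while a contiguous slice is free.

```python
import numpy as np

# Invented shapes: 7000 points with 5 features, batch of 1000.
X = np.random.randn(7000, 5)
idx = np.random.choice(7000, size=1000, replace=False)

batch = X[idx]   # non-contiguous gather: copies 1000 rows on every call
view = X[:1000]  # contiguous slice: a zero-copy view

print(batch.base is None)  # True -> batch owns freshly copied memory
print(view.base is X)      # True -> view shares X's memory
```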
Also, the `weight_optimize=0.01` is a bit high. Generally constant optimization is the bottleneck, and you are doing it more frequently than normal (I usually use 0.001 or less, even for multi-node runs). Especially with a large `maxsize` and large batches, it is perhaps not too surprising that the search is quite slow.
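Putting both suggestions together, a minimal sketch of the adjusted settings (every other argument as in the original configuration):

```python
import pysr

# Only the two suggested changes; all other arguments stay as in the
# original configuration above.
model = pysr.PySRRegressor(
    batching=True,
    batch_size=50,          # down from 1000
    weight_optimize=0.001,  # down from 0.01
)
```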
Also, what is the `objective` you are using here?
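For reference, `full_objective` takes a string of Julia code. A generic sketch of what such an objective usually looks like, using plain mean squared error rather than whatever was actually used in this run:

```python
import pysr

# Generic custom objective: evaluate the candidate tree, bail out on
# failed evaluations, and return a scalar loss (mean squared error here).
objective = """
function my_objective(tree, dataset::Dataset{T,L}, options) where {T,L}
    prediction, completed = eval_tree_array(tree, dataset.X, options)
    if !completed
        return L(Inf)
    end
    return sum(abs2, prediction .- dataset.y) / dataset.n
end
"""

model = pysr.PySRRegressor(full_objective=objective)
```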
Cheers,
…