
Faster GMM1_lpdf #940

Open · wants to merge 4 commits into master

Conversation

@antoine-galataud commented Jun 16, 2025

Hi there,

While running hyperopt in Ray Tune with a large number of samples, I noticed that performance dropped quite fast after a few hundred of them.

I did some profiling with cProfile and noticed that GMM1_lpdf accounted for a large share of the total execution time. Below is an example of the profiling results, sorted by total time (a sketch of how to collect such a profile is shown after the observations below).

[screenshot: cProfile output sorted by total time, with GMM1_lpdf and normal_cdf near the top]

Two things stand out:

  • GMM1_lpdf's total time is large.
  • normal_cdf's total time is also large, and its call count is very high.
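
For reference, here is a minimal sketch of how such a profile can be collected and sorted by total time; run_experiment is a hypothetical placeholder for the actual Ray Tune / hyperopt tuning loop.

import cProfile
import pstats


def run_experiment():
    # Hypothetical placeholder for the actual Ray Tune / hyperopt tuning loop.
    return sum(i * i for i in range(100_000))


# Collect a profile and print the top entries sorted by total time ("tottime"),
# which is how the results above are sorted.
cProfile.run("run_experiment()", "profile.out")
pstats.Stats("profile.out").sort_stats("tottime").print_stats(20)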

Looking at the GMM1_lpdf code, the block that calls normal_cdf the most is the following:

hyperopt/hyperopt/tpe.py

Lines 153 to 166 in 0658f68

prob = np.zeros(samples.shape, dtype="float64")
for w, mu, sigma in zip(weights, mus, sigmas):
    if high is None:
        ubound = samples + q / 2
    else:
        ubound = np.minimum(samples + q / 2, high)
    if low is None:
        lbound = samples - q / 2
    else:
        lbound = np.maximum(samples - q / 2, low)
    # -- two-stage addition is slightly more numerically accurate
    inc_amt = w * normal_cdf(ubound, mu, sigma)
    inc_amt -= w * normal_cdf(lbound, mu, sigma)
    prob += inc_amt

Several observations here:

  • the variables lbound and ubound don't depend on the for-loop variables, so they are recomputed identically for every mixture component.
  • the cost of each normal_cdf call depends on the length of the input arrays, which grows with the number of samples.

After analyzing the impact, I drafted a "vectorized" version of the code, which no longer relies on a for loop but performs the computation in a single pass (a rough sketch of the idea is shown below).
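
For illustration, here is a rough sketch of the broadcasted idea, not the exact code in this PR: the bounds are computed once, and each normal_cdf call is evaluated over a (components, samples) grid. The normal_cdf helper below is a simplified stand-in for hyperopt's version, and gmm1_lpdf_prob_vectorized is a hypothetical name used only for this example.

import numpy as np
from scipy.special import erf


def normal_cdf(x, mu, sigma):
    # Simplified stand-in for hyperopt's normal_cdf helper.
    z = (x - mu) / np.maximum(np.sqrt(2.0) * sigma, 1e-12)
    return 0.5 * (1.0 + erf(z))


def gmm1_lpdf_prob_vectorized(samples, weights, mus, sigmas, q, low=None, high=None):
    # The bounds don't depend on the mixture component, so compute them once.
    samples = np.asarray(samples, dtype="float64")
    ubound = samples + q / 2
    lbound = samples - q / 2
    if high is not None:
        ubound = np.minimum(ubound, high)
    if low is not None:
        lbound = np.maximum(lbound, low)

    # Broadcast to shape (n_components, n_samples): one erf evaluation per bound
    # instead of one per mixture component.
    mus = np.asarray(mus, dtype="float64")[:, None]
    sigmas = np.asarray(sigmas, dtype="float64")[:, None]
    weights = np.asarray(weights, dtype="float64")[:, None]
    inc = weights * normal_cdf(ubound[None, :], mus, sigmas)
    inc -= weights * normal_cdf(lbound[None, :], mus, sigmas)

    # Sum over components to get the per-sample probability mass.
    return inc.sum(axis=0)

The weights are applied to each CDF term before the subtraction, mirroring the "two-stage addition" comment in the original loop.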

Tests are passing, and I've been able to validate the results with in-house experiments too.

Feedback appreciated!

@antoine-galataud (Author) commented:

Comparison of the performance of the two versions of the code:

  • as the length of weights / mus / sigmas grows while samples stays constant and small, the new version becomes more efficient.
  • if samples is large, there is a cut-off length beyond which the original version becomes more efficient again. This is due to the operations in normal_cdf on the large ubound and lbound arrays, especially the call to scipy.special.erf.

While testing my scenarios I only ran into the first case, but that may not cover all use cases (a rough benchmark sketch along these lines is included below).
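
As an illustration of this trade-off, a micro-benchmark along the following lines can be used. It assumes normal_cdf and gmm1_lpdf_prob_vectorized from the sketch above; gmm1_lpdf_prob_loop wraps the original per-component loop and is likewise just an example name.

import timeit

import numpy as np


def gmm1_lpdf_prob_loop(samples, weights, mus, sigmas, q, low=None, high=None):
    # Wrapper around the original per-component loop, for comparison only.
    prob = np.zeros(samples.shape, dtype="float64")
    for w, mu, sigma in zip(weights, mus, sigmas):
        ubound = samples + q / 2 if high is None else np.minimum(samples + q / 2, high)
        lbound = samples - q / 2 if low is None else np.maximum(samples - q / 2, low)
        prob += w * normal_cdf(ubound, mu, sigma) - w * normal_cdf(lbound, mu, sigma)
    return prob


rng = np.random.default_rng(0)

# Case 1: many components, few samples (where the broadcasted version should win).
# Case 2: few components, many samples (where the cut-off may appear).
for n_samples, n_components in [(50, 1000), (100_000, 25)]:
    samples = rng.normal(size=n_samples)
    weights = np.full(n_components, 1.0 / n_components)
    mus = rng.normal(size=n_components)
    sigmas = np.abs(rng.normal(size=n_components)) + 0.1

    t_loop = timeit.timeit(
        lambda: gmm1_lpdf_prob_loop(samples, weights, mus, sigmas, q=0.5), number=20
    )
    t_vec = timeit.timeit(
        lambda: gmm1_lpdf_prob_vectorized(samples, weights, mus, sigmas, q=0.5), number=20
    )
    print(
        f"samples={n_samples}, components={n_components}: "
        f"loop {t_loop:.3f}s, vectorized {t_vec:.3f}s"
    )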
