
Faster GMM1_lpdf #940

Open · wants to merge 4 commits into master

Conversation

@antoine-galataud commented Jun 16, 2025

Hi there,

While running hyperopt in Ray Tune with a large number of samples, I noticed that performance dropped quite fast after a few hundred of them.

I did some profiling with cProfile and noticed that GMM1_lpdf accounted for a large share of the total execution time. Below is an example of the profiling results, sorted by total time (a sketch of how to collect such a profile is shown after the observations below).

[screenshot: cProfile output sorted by total time, with GMM1_lpdf and normal_cdf near the top]

Two things stand out:

  • GMM1_lpdf's total time is large.
  • normal_cdf's total time is also large, and its call count is very high.
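
For reference, here is a minimal sketch of how such a profile can be collected and sorted by total time; run_experiment is a hypothetical placeholder for the actual Ray Tune / hyperopt tuning loop.

import cProfile
import pstats


def run_experiment():
    # Hypothetical placeholder for the actual Ray Tune / hyperopt tuning loop.
    return sum(i * i for i in range(100_000))


# Collect a profile and print the top entries sorted by total time ("tottime"),
# which is how the results above are sorted.
cProfile.run("run_experiment()", "profile.out")
pstats.Stats("profile.out").sort_stats("tottime").print_stats(20)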

Looking at the GMM1_lpdf code, the block that calls normal_cdf the most is the following:

hyperopt/hyperopt/tpe.py

Lines 153 to 166 in 0658f68

prob = np.zeros(samples.shape, dtype="float64")
for w, mu, sigma in zip(weights, mus, sigmas):
    if high is None:
        ubound = samples + q / 2
    else:
        ubound = np.minimum(samples + q / 2, high)
    if low is None:
        lbound = samples - q / 2
    else:
        lbound = np.maximum(samples - q / 2, low)
    # -- two-stage addition is slightly more numerically accurate
    inc_amt = w * normal_cdf(ubound, mu, sigma)
    inc_amt -= w * normal_cdf(lbound, mu, sigma)
    prob += inc_amt

Several observations here:

  • the variables lbound and ubound don't depend on the for-loop variables, so they are recomputed identically for every mixture component.
  • the cost of each normal_cdf call depends on the length of the input arrays, which grows with the number of samples.

After analyzing the impact, I drafted a "vectorized" version of the code, which no longer relies on a for loop but performs the computation in a single pass (a rough sketch of the idea is shown below).
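
For illustration, here is a rough sketch of the broadcasted idea, not the exact code in this PR: the bounds are computed once, and each normal_cdf call is evaluated over a (components, samples) grid. The normal_cdf helper below is a simplified stand-in for hyperopt's version, and gmm1_lpdf_prob_vectorized is a hypothetical name used only for this example.

import numpy as np
from scipy.special import erf


def normal_cdf(x, mu, sigma):
    # Simplified stand-in for hyperopt's normal_cdf helper.
    z = (x - mu) / np.maximum(np.sqrt(2.0) * sigma, 1e-12)
    return 0.5 * (1.0 + erf(z))


def gmm1_lpdf_prob_vectorized(samples, weights, mus, sigmas, q, low=None, high=None):
    # The bounds don't depend on the mixture component, so compute them once.
    samples = np.asarray(samples, dtype="float64")
    ubound = samples + q / 2
    lbound = samples - q / 2
    if high is not None:
        ubound = np.minimum(ubound, high)
    if low is not None:
        lbound = np.maximum(lbound, low)

    # Broadcast to shape (n_components, n_samples): one erf evaluation per bound
    # instead of one per mixture component.
    mus = np.asarray(mus, dtype="float64")[:, None]
    sigmas = np.asarray(sigmas, dtype="float64")[:, None]
    weights = np.asarray(weights, dtype="float64")[:, None]
    inc = weights * normal_cdf(ubound[None, :], mus, sigmas)
    inc -= weights * normal_cdf(lbound[None, :], mus, sigmas)

    # Sum over components to get the per-sample probability mass.
    return inc.sum(axis=0)

The weights are applied to each CDF term before the subtraction, mirroring the "two-stage addition" comment in the original loop.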

Tests are passing, and I've been able to validate the results with in-house experiments too.

Feedback appreciated!

@antoine-galataud (Author) commented:

Comparison of the performance of the two versions of the code:

  • as the length of weights / mus / sigmas grows while samples stays constant and small, the new version becomes more efficient.
  • if samples is large, there is a cut-off length beyond which the original version becomes more efficient again. This is due to the operations in normal_cdf on the large ubound and lbound arrays, especially the call to scipy.special.erf.

While testing my scenarios I only ran into the first case, but that may not cover all use cases (a rough benchmark sketch along these lines is included below).
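
As an illustration of this trade-off, a micro-benchmark along the following lines can be used. It assumes normal_cdf and gmm1_lpdf_prob_vectorized from the sketch above; gmm1_lpdf_prob_loop wraps the original per-component loop and is likewise just an example name.

import timeit

import numpy as np


def gmm1_lpdf_prob_loop(samples, weights, mus, sigmas, q, low=None, high=None):
    # Wrapper around the original per-component loop, for comparison only.
    prob = np.zeros(samples.shape, dtype="float64")
    for w, mu, sigma in zip(weights, mus, sigmas):
        ubound = samples + q / 2 if high is None else np.minimum(samples + q / 2, high)
        lbound = samples - q / 2 if low is None else np.maximum(samples - q / 2, low)
        prob += w * normal_cdf(ubound, mu, sigma) - w * normal_cdf(lbound, mu, sigma)
    return prob


rng = np.random.default_rng(0)

# Case 1: many components, few samples (where the broadcasted version should win).
# Case 2: few components, many samples (where the cut-off may appear).
for n_samples, n_components in [(50, 1000), (100_000, 25)]:
    samples = rng.normal(size=n_samples)
    weights = np.full(n_components, 1.0 / n_components)
    mus = rng.normal(size=n_components)
    sigmas = np.abs(rng.normal(size=n_components)) + 0.1

    t_loop = timeit.timeit(
        lambda: gmm1_lpdf_prob_loop(samples, weights, mus, sigmas, q=0.5), number=20
    )
    t_vec = timeit.timeit(
        lambda: gmm1_lpdf_prob_vectorized(samples, weights, mus, sigmas, q=0.5), number=20
    )
    print(
        f"samples={n_samples}, components={n_components}: "
        f"loop {t_loop:.3f}s, vectorized {t_vec:.3f}s"
    )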
