
Adding query performance optimizations #57


Open · wants to merge 1 commit into master from improve-get-category-messages

Conversation


@tee8z tee8z commented Feb 25, 2025

We have found that adding these few indexes, and adding the most common query paths for the get_category_messages function as static queries, greatly improves the performance of our application using this database model. In our own experience, these changes reduced the query time for ~1000 rows via get_category_messages from ~200ms to ~6ms. We lean on get_category_messages heavily throughout our application logic to get the next batch of messages for a given stream to process, so this has had massively positive benefits. The additional indexes also contributed to that speedup.

One thing to note: since we are adding 3 indexes, writes may slow down somewhat, so how often the indexes are hit should be watched closely for a given use case.
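The actual index definitions live in the PR diff, which isn't reproduced in this thread, so the following is only a sketch of the kind of index described above, with hypothetical names. Message DB derives a message's category from stream_name via its category() function, so an expression index over that expression plus global_position is the natural shape for speeding up get_category_messages:

```sql
-- Hypothetical example only; the real DDL is in the PR's commit.
-- An expression index matching the get_category_messages predicate:
--   category(stream_name) = _category AND global_position >= _position
CREATE INDEX CONCURRENTLY example_messages_category_idx
  ON messages (category(stream_name), global_position);
```

As the description notes, each additional index adds write amplification on INSERT, so it is worth checking pg_stat_user_indexes.idx_scan periodically to confirm the index is actually being used for your workload.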

@tee8z tee8z force-pushed the improve-get-category-messages branch from bd5bfd4 to 6f0593a Compare February 26, 2025 15:00
@tee8z tee8z marked this pull request as ready for review February 26, 2025 15:00
@sbellware (Contributor) commented Mar 3, 2025

@tee8z This is good stuff. Thanks.

It's not clear yet that it's pertinent to most users. The point about write performance is top-of-mind, but it's not entirely definitive, either. It depends on the workloads.

I'm really curious about your workloads. Having some more insight would help characterize the needs of the change, and the tradeoff.

Typically, the query performance of retrieving a batch of messages from a category in order to process those messages is insignificant compared to the amount of time it takes to process those messages.

Optimizing the batch retrieval query improves the performance of an infinitesimal fraction of the total time between the execution of batch retrieval queries.

The typical workload looks like:

retrieve 1000 messages -> process each message of the 1000 retrieved -> retrieve another 1000 messages -> process each message of the 1000 retrieved -> and repeat

The slow part is the processing of the messages. Making the batch retrieval go faster doesn't make much of a noticeable impact on the system performance because for each message in the batch, there's at least one message write, or some other I/O.

So, let's say that it takes 1000 units of time to retrieve and process 1000 messages. Out of those 1000 units of time, 1 unit might be taken up by the retrieval. Optimizing 1/1000th of the operation usually doesn't yield an improvement worth the extra cost of ownership of the additional implementation.
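The arithmetic above can be made concrete with an illustrative query (the numbers are the hypothetical units from this comment plus the ~200ms -> ~6ms figure from the PR description, not measurements):

```sql
-- 1000 units to retrieve and process a batch; 1 unit of that is retrieval.
-- Even a ~33x faster retrieval (200ms -> 6ms) recovers under 0.1% of the cycle.
SELECT
  1.0 / 1000.0                    AS retrieval_share_of_cycle,  -- 0.001
  1.0 - (6.0 / 200.0)             AS retrieval_units_saved,     -- 0.97
  (1.0 - (6.0 / 200.0)) / 1000.0  AS fraction_of_cycle_saved;   -- 0.00097
```

Under these assumptions the end-to-end improvement is on the order of a tenth of a percent, which is why the shape of the workload matters so much here.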

In other workloads, such as bulk loading and data analysis, the category batch retrieval might happen in a tight loop, which makes its performance far more significant.

Can you disclose the kind of workload you have, or can you characterize how much time is spent in the retrieval of a batch and the processing of a batch?

Thanks!
