Labels: simpler/faster stringlabels encoding #16069

Merged
6 commits merged into prometheus:main on Apr 30, 2025

Conversation

bboreham
Member

@bboreham bboreham commented Feb 23, 2025

Instead of using varint to encode the size of each label, use a single byte for size 0-254, or a flag value of 255 followed by the size in 3 bytes little-endian.

This reduces the amount of code, and also the number of branches in commonly-executed code, so it runs faster.

The maximum allowed label name or value length is now 2^24 bytes, i.e. 16MB.

Memory used by labels changes as follows:

  • Labels from 0 to 127 bytes length: same
  • From 128 to 254: 1 byte less
  • From 255 to 16383: 2 bytes more
  • From 16384 to 2MB: 1 byte more
  • From 2MB to 16MB: same
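
Concretely, the scheme described above can be sketched like this (hypothetical helper names, not the PR's actual code in model/labels):

```go
package main

import "fmt"

// encodeSize appends the length prefix described above: a single byte
// for sizes 0-254, or the flag byte 255 followed by the size in
// 3 bytes little-endian (so the maximum encodable size is 2^24-1, ~16MB).
func encodeSize(buf []byte, size int) []byte {
	if size < 255 {
		return append(buf, byte(size))
	}
	return append(buf, 255, byte(size), byte(size>>8), byte(size>>16))
}

// decodeSize reads one length prefix back, returning the size and the
// index just past the encoded bytes.
func decodeSize(data []byte, index int) (int, int) {
	b := data[index]
	index++
	if b < 255 {
		return int(b), index
	}
	size := int(data[index]) | int(data[index+1])<<8 | int(data[index+2])<<16
	return size, index + 3
}

func main() {
	for _, n := range []int{0, 127, 254, 255, 16383, 16384, 1 << 21} {
		buf := encodeSize(nil, n)
		got, _ := decodeSize(buf, 0)
		fmt.Printf("size %7d -> %d prefix byte(s), round-trips: %v\n", n, len(buf), got == n)
	}
}
```

The memory differences listed above fall out of comparing this against varint, which needs 1 byte up to 127, 2 bytes up to 16383, and 3 bytes up to about 2MB.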

Benchmarks - Go 1.24.0, x64:

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Intel(R) Core(TM) i7-14700K
                                                 │ before-124.txt │            after-124.txt            │
                                                 │     sec/op     │    sec/op     vs base               │
Labels_Get/with_5_labels/first_label/get-28          3.348n ±  1%    3.344n ± 0%        ~ (p=0.420 n=6)
Labels_Get/with_5_labels/first_label/has-28          2.421n ±  0%    2.599n ± 1%   +7.33% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/get-28         6.792n ±  1%    5.438n ± 0%  -19.94% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/has-28         3.633n ±  1%    3.825n ± 1%   +5.30% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/get-28           8.537n ±  1%    7.258n ± 1%  -14.98% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/has-28           7.284n ±  1%    6.510n ± 0%  -10.63% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/get-28      2.794n ±  1%    2.945n ± 0%   +5.39% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/has-28      2.804n ±  1%    2.885n ± 1%   +2.89% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/get-28         3.338n ±  0%    3.331n ± 0%        ~ (p=0.102 n=6)
Labels_Get/with_10_labels/first_label/has-28         2.411n ±  1%    2.591n ± 0%   +7.42% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/get-28        9.257n ±  1%    8.177n ± 1%  -11.66% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/has-28        8.072n ±  1%    7.429n ± 0%   -7.97% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/get-28          12.60n ±  0%    13.54n ± 0%   +7.50% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/has-28          11.46n ±  0%    12.71n ± 0%  +10.96% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/get-28     2.772n ±  1%    2.945n ± 1%   +6.22% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/has-28     2.787n ±  1%    2.876n ± 1%   +3.19% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/get-28         3.343n ±  2%    3.334n ± 0%        ~ (p=0.563 n=6)
Labels_Get/with_30_labels/first_label/has-28         2.421n ±  1%    2.596n ± 1%   +7.21% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/get-28        39.58n ±  2%    23.95n ± 0%  -39.48% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/has-28        37.31n ±  3%    23.07n ± 1%  -38.17% (p=0.002 n=6)
Labels_Get/with_30_labels/last_label/get-28          74.30n ±  0%    73.53n ± 1%   -1.04% (p=0.004 n=6)
Labels_Get/with_30_labels/last_label/has-28          73.51n ±  1%    73.40n ± 1%        ~ (p=1.000 n=6)
Labels_Get/with_30_labels/not-found_label/get-28     2.788n ±  1%    2.970n ± 1%   +6.57% (p=0.002 n=6)
Labels_Get/with_30_labels/not-found_label/has-28     2.791n ±  1%    2.886n ± 1%   +3.39% (p=0.002 n=6)
Labels_Equals/equal-28                               1.668n ±  1%    1.669n ± 1%        ~ (p=0.468 n=6)
Labels_Equals/not_equal-28                          0.1863n ±  0%   0.1861n ± 1%        ~ (p=0.649 n=6)
Labels_Equals/different_sizes-28                    0.1864n ±  1%   0.1853n ± 1%        ~ (p=0.119 n=6)
Labels_Equals/lots-28                                1.667n ±  1%    1.679n ± 2%   +0.69% (p=0.013 n=6)
Labels_Equals/real_long_equal-28                     4.287n ±  1%    4.317n ± 1%   +0.68% (p=0.041 n=6)
Labels_Equals/real_long_different_end-28             3.474n ±  1%    3.448n ± 1%   -0.75% (p=0.024 n=6)
Labels_Compare/equal-28                              3.392n ±  1%    3.359n ± 0%   -0.97% (p=0.006 n=6)
Labels_Compare/not_equal-28                          12.79n ±  1%    10.22n ± 1%  -20.06% (p=0.002 n=6)
Labels_Compare/different_sizes-28                    2.595n ±  2%    2.605n ± 0%        ~ (p=0.061 n=6)
Labels_Compare/lots-28                               18.95n ±  1%    18.07n ± 1%   -4.65% (p=0.002 n=6)
Labels_Compare/real_long_equal-28                    19.61n ±  0%    19.38n ± 1%   -1.20% (p=0.002 n=6)
Labels_Compare/real_long_different_end-28            22.71n ±  1%    22.83n ± 1%   +0.51% (p=0.017 n=6)
Labels_Hash/typical_labels_under_1KB-28              41.52n ±  0%    41.68n ± 1%        ~ (p=0.084 n=6)
Labels_Hash/bigger_labels_over_1KB-28                50.95n ±  0%    50.81n ± 1%        ~ (p=0.662 n=6)
Labels_Hash/extremely_large_label_value_10MB-28      502.1µ ±  1%    514.2µ ± 1%   +2.42% (p=0.002 n=6)
Builder-28                                           229.8n ±  1%    214.3n ± 1%   -6.74% (p=0.002 n=6)
Labels_Copy-28                                       43.47n ± 17%    44.34n ± 6%        ~ (p=0.132 n=6)
geomean                                              8.780n          8.505n        -3.13%

Benchmarks - Go 1.24.0, ARM64:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Apple M2
                                                │  before.txt   │              after.txt              │
                                                │    sec/op     │    sec/op     vs base               │
Labels_Get/with_5_labels/first_label/get-8         5.333n ±  1%    4.844n ± 3%   -9.18% (p=0.002 n=6)
Labels_Get/with_5_labels/first_label/has-8         4.444n ±  4%    4.495n ± 3%        ~ (p=0.240 n=6)
Labels_Get/with_5_labels/middle_label/get-8        8.277n ±  1%    7.799n ± 3%   -5.78% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/has-8        7.386n ±  6%    7.424n ± 1%        ~ (p=0.132 n=6)
Labels_Get/with_5_labels/last_label/get-8          10.94n ±  2%    10.38n ± 1%   -5.08% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/has-8          10.05n ±  1%    10.09n ± 1%        ~ (p=0.139 n=6)
Labels_Get/with_5_labels/not-found_label/get-8     6.240n ±  1%    5.396n ± 1%  -13.53% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/has-8     5.333n ±  3%    5.328n ± 0%        ~ (p=0.561 n=6)
Labels_Get/with_10_labels/first_label/get-8        5.319n ±  1%    4.727n ± 0%  -11.12% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/has-8        4.431n ±  2%    4.445n ± 3%        ~ (p=0.584 n=6)
Labels_Get/with_10_labels/middle_label/get-8       12.71n ±  0%    12.12n ± 2%   -4.60% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/has-8       11.84n ±  2%    11.85n ± 1%        ~ (p=0.935 n=6)
Labels_Get/with_10_labels/last_label/get-8         20.08n ±  0%    19.58n ± 1%   -2.49% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/has-8         18.58n ±  0%    18.54n ± 0%        ~ (p=0.167 n=6)
Labels_Get/with_10_labels/not-found_label/get-8    6.260n ±  1%    5.396n ± 1%  -13.80% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/has-8    5.341n ±  0%    5.326n ± 0%        ~ (p=0.325 n=6)
Labels_Get/with_30_labels/first_label/get-8        5.316n ±  0%    4.730n ± 0%  -11.02% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/has-8        4.449n ±  4%    4.434n ± 0%        ~ (p=0.556 n=6)
Labels_Get/with_30_labels/middle_label/get-8       37.02n ±  1%    36.44n ± 1%   -1.57% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/has-8       35.12n ±  1%    35.62n ± 0%   +1.44% (p=0.013 n=6)
Labels_Get/with_30_labels/last_label/get-8         78.47n ±  2%    77.29n ± 1%   -1.50% (p=0.002 n=6)
Labels_Get/with_30_labels/last_label/has-8         75.53n ±  1%    75.83n ± 0%        ~ (p=0.093 n=6)
Labels_Get/with_30_labels/not-found_label/get-8    6.261n ±  1%    5.391n ± 0%  -13.89% (p=0.002 n=6)
Labels_Get/with_30_labels/not-found_label/has-8    5.324n ±  0%    5.332n ± 1%        ~ (p=0.058 n=6)
Labels_Equals/equal-8                              2.956n ±  0%    2.953n ± 1%        ~ (p=0.190 n=6)
Labels_Equals/not_equal-8                         0.3651n ± 14%   0.3140n ± 0%        ~ (p=0.320 n=6)
Labels_Equals/different_sizes-8                   0.3154n ±  1%   0.3137n ± 0%        ~ (p=0.093 n=6)
Labels_Equals/lots-8                               2.668n ±  1%    2.664n ± 0%        ~ (p=0.169 n=6)
Labels_Equals/real_long_equal-8                    7.975n ±  1%    8.003n ± 1%        ~ (p=0.126 n=6)
Labels_Equals/real_long_different_end-8            6.788n ±  1%    6.790n ± 1%        ~ (p=0.325 n=6)
Labels_Compare/equal-8                             6.621n ±  1%    6.580n ± 0%        ~ (p=0.065 n=6)
Labels_Compare/not_equal-8                         13.61n ±  1%    12.72n ± 1%   -6.54% (p=0.002 n=6)
Labels_Compare/different_sizes-8                   5.877n ±  2%    5.787n ± 1%   -1.54% (p=0.015 n=6)
Labels_Compare/lots-8                              26.50n ±  0%    25.68n ± 1%   -3.09% (p=0.002 n=6)
Labels_Compare/real_long_equal-8                   16.71n ±  0%    16.73n ± 2%        ~ (p=0.087 n=6)
Labels_Compare/real_long_different_end-8           28.27n ±  0%    26.87n ± 0%   -4.97% (p=0.002 n=6)
Labels_Hash/typical_labels_under_1KB-8             54.77n ±  0%    54.68n ± 0%   -0.17% (p=0.015 n=6)
Labels_Hash/bigger_labels_over_1KB-8               67.16n ±  0%    66.97n ± 0%   -0.28% (p=0.002 n=6)
Labels_Hash/extremely_large_label_value_10MB-8     678.5µ ±  0%    679.2µ ± 0%        ~ (p=0.394 n=6)
Builder-8                                          254.1n ±  0%    233.5n ± 0%   -8.09% (p=0.002 n=6)
Labels_Copy-8                                      28.27n ±  1%    27.52n ± 1%   -2.64% (p=0.002 n=6)
geomean                                            13.18n          12.74n        -3.34%

@machine424
Member

machine424 commented Feb 28, 2025

I like the idea; we just need to agree that 16MB is enough (maybe some docs to update).

Also, just thinking out loud, we could take this a step further.
Currently, the flag 255 indicates that the next 3 bytes should be used.
What if we used the flag 254 for sizes that fit within 2 bytes?

This way, it's incremental.
Do you think it'd be costly? I don't think it would be as expensive as varint.

(Actually, with the current approach, I think we could make the 3 bytes store numbers from 254 up to 16MB + 254, but that would maybe require more computation (subtract and add 254). But then, if you're reaching 16MB, an extra 254 bytes isn't really the answer.)
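
For illustration, the suggested incremental scheme might decode like this (a hypothetical sketch, not code from the PR; the cost is one extra comparison on the decode path):

```go
package main

import "fmt"

// decodeSizeIncremental sketches the suggested scheme: one byte for
// sizes 0-253, flag 254 plus 2 bytes little-endian, flag 255 plus
// 3 bytes little-endian. Hypothetical; the PR keeps the simpler
// two-case scheme.
func decodeSizeIncremental(data []byte, index int) (int, int) {
	switch b := data[index]; b {
	case 255:
		size := int(data[index+1]) | int(data[index+2])<<8 | int(data[index+3])<<16
		return size, index + 4
	case 254:
		size := int(data[index+1]) | int(data[index+2])<<8
		return size, index + 3
	default:
		return int(b), index + 1
	}
}

func main() {
	// 300 fits in 2 bytes, so it takes 3 bytes total instead of 4.
	data := []byte{254, 0x2C, 0x01} // 0x012C = 300, little-endian
	size, next := decodeSizeIncremental(data, 0)
	fmt.Println(size, next) // prints: 300 3
}
```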

@bboreham
Member Author

bboreham commented Mar 1, 2025

Every branch slows it down. This is why I went for a very simple scheme.
However, it isn't that great on AMD64; maybe some analysis can improve it.

@machine424
Member

machine424 commented Mar 3, 2025

I missed that; I even re-ran the benchmarks, but I have an ARM x)
It's true that we end up with either 1 or 4 "memory accesses", with no in-between, but with less logic/branching, which actually suits ARM better. So yes, maybe there are some changes we could make to optimize for AMD64 as well.
I'll take a look as well when I have some time.

@machine424
Member

Found myself looking at the assembly for the regressions; apparently hoisting the index increment in decodeSize (I thought the gc compiler was able to do that)

 func decodeSize(data string, index int) (int, int) {
        b := data[index]
        index++
        if b < 255 {
                return int(b), index
        }
        ...

helps speed up some of them; the change also makes the 1-byte case "identical" to the code in main.

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: AMD EPYC 7R32
                                                │ prometheus-main/main.txt │        prometheus-pr/pr.txt         │ prometheus-pr-hoisting/pr-hoisting.txt │
                                                │          sec/op          │    sec/op     vs base               │     sec/op       vs base               │
Labels_Get/with_5_labels/first_label/get-8                     7.801n ± 1%    7.427n ± 1%   -4.79% (p=0.002 n=6)      7.744n ±  1%        ~ (p=0.065 n=6)
Labels_Get/with_5_labels/first_label/has-8                     5.896n ± 1%    6.806n ± 1%  +15.45% (p=0.002 n=6)      5.923n ±  1%        ~ (p=0.394 n=6)
Labels_Get/with_5_labels/middle_label/get-8                    11.03n ± 2%    11.17n ± 0%        ~ (p=0.071 n=6)      11.01n ±  2%        ~ (p=0.903 n=6)
Labels_Get/with_5_labels/middle_label/has-8                    8.940n ± 2%   10.530n ± 3%  +17.79% (p=0.002 n=6)      8.908n ±  1%        ~ (p=0.589 n=6)
Labels_Get/with_5_labels/last_label/get-8                      16.43n ± 1%    16.64n ± 0%   +1.25% (p=0.002 n=6)      16.16n ±  0%   -1.64% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/has-8                      13.40n ± 1%    14.62n ± 0%   +9.06% (p=0.002 n=6)      13.35n ±  1%   -0.45% (p=0.032 n=6)
Labels_Get/with_5_labels/not-found_label/get-8                 6.706n ± 1%    7.380n ± 0%  +10.04% (p=0.002 n=6)      7.423n ±  0%  +10.68% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/has-8                 6.831n ± 2%    7.785n ± 8%  +13.97% (p=0.002 n=6)      6.685n ±  0%   -2.14% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/get-8                    7.731n ± 1%    7.447n ± 0%   -3.67% (p=0.002 n=6)      7.769n ±  0%        ~ (p=0.558 n=6)
Labels_Get/with_10_labels/first_label/has-8                    5.917n ± 2%    6.832n ± 4%  +15.47% (p=0.002 n=6)      6.098n ±  6%        ~ (p=0.310 n=6)
Labels_Get/with_10_labels/middle_label/get-8                   19.33n ± 0%    19.38n ± 0%   +0.23% (p=0.011 n=6)      18.45n ±  1%   -4.55% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/has-8                   15.92n ± 1%    17.25n ± 0%   +8.29% (p=0.002 n=6)      15.88n ±  0%        ~ (p=0.134 n=6)
Labels_Get/with_10_labels/last_label/get-8                     31.89n ± 0%    31.74n ± 0%   -0.45% (p=0.002 n=6)      31.67n ±  0%   -0.69% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/has-8                     28.34n ± 0%    28.83n ± 0%   +1.71% (p=0.002 n=6)      28.28n ±  0%        ~ (p=0.054 n=6)
Labels_Get/with_10_labels/not-found_label/get-8                6.796n ± 2%    7.413n ± 1%   +9.09% (p=0.002 n=6)      7.417n ±  0%   +9.15% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/has-8                6.809n ± 1%    7.743n ± 1%  +13.72% (p=0.002 n=6)      6.668n ±  0%   -2.06% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/get-8                    7.763n ± 7%    7.414n ± 0%   -4.48% (p=0.002 n=6)      7.731n ±  0%        ~ (p=0.180 n=6)
Labels_Get/with_30_labels/first_label/has-8                    5.893n ± 0%    6.795n ± 2%  +15.32% (p=0.002 n=6)      6.115n ± 19%   +3.78% (p=0.041 n=6)
Labels_Get/with_30_labels/middle_label/get-8                   50.14n ± 0%    49.94n ± 0%   -0.40% (p=0.002 n=6)      49.88n ±  0%   -0.51% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/has-8                   46.62n ± 0%    47.44n ± 0%   +1.77% (p=0.002 n=6)      46.57n ±  0%   -0.11% (p=0.032 n=6)
Labels_Get/with_30_labels/last_label/get-8                     93.72n ± 0%    93.41n ± 0%   -0.33% (p=0.002 n=6)      93.48n ±  0%   -0.26% (p=0.002 n=6)
Labels_Get/with_30_labels/last_label/has-8                     90.26n ± 0%    90.89n ± 0%   +0.70% (p=0.002 n=6)      90.17n ±  0%   -0.09% (p=0.024 n=6)
Labels_Get/with_30_labels/not-found_label/get-8                6.716n ± 1%    7.377n ± 4%   +9.83% (p=0.002 n=6)      7.411n ±  1%  +10.34% (p=0.002 n=6)
Labels_Get/with_30_labels/not-found_label/has-8                6.800n ± 1%    7.721n ± 0%  +13.54% (p=0.002 n=6)      6.699n ±  1%   -1.49% (p=0.002 n=6)
Labels_Equals/equal-8                                          5.267n ± 0%    5.247n ± 0%        ~ (p=0.065 n=6)      5.255n ±  0%        ~ (p=0.167 n=6)
Labels_Equals/not_equal-8                                     0.6199n ± 0%   0.6215n ± 1%   +0.26% (p=0.039 n=6)     0.6200n ±  1%        ~ (p=1.000 n=6)
Labels_Equals/different_sizes-8                               0.6205n ± 1%   0.6193n ± 1%        ~ (p=0.589 n=6)     0.6179n ±  1%   -0.42% (p=0.035 n=6)
Labels_Equals/lots-8                                           5.244n ± 0%    5.255n ± 0%   +0.22% (p=0.037 n=6)      5.240n ±  0%        ~ (p=0.329 n=6)
Labels_Equals/real_long_equal-8                                12.49n ± 1%    12.67n ± 2%   +1.44% (p=0.026 n=6)      12.55n ±  1%        ~ (p=0.368 n=6)
Labels_Equals/real_long_different_end-8                        9.410n ± 2%    9.460n ± 3%        ~ (p=0.240 n=6)      9.413n ±  0%        ~ (p=0.416 n=6)
Labels_Compare/equal-8                                         9.245n ± 4%    9.312n ± 2%        ~ (p=0.485 n=6)      9.008n ±  3%        ~ (p=0.065 n=6)
Labels_Compare/not_equal-8                                     20.88n ± 9%    18.47n ± 0%  -11.56% (p=0.002 n=6)      19.35n ±  5%   -7.35% (p=0.002 n=6)
Labels_Compare/different_sizes-8                               6.993n ± 3%    7.044n ± 0%        ~ (p=0.372 n=6)      7.019n ±  1%        ~ (p=0.589 n=6)
Labels_Compare/lots-8                                          40.14n ± 1%    50.34n ± 0%  +25.38% (p=0.002 n=6)      39.43n ±  0%   -1.79% (p=0.002 n=6)
Labels_Compare/real_long_equal-8                               22.41n ± 3%    22.37n ± 2%        ~ (p=0.699 n=6)      22.75n ±  2%   +1.56% (p=0.039 n=6)
Labels_Compare/real_long_different_end-8                       48.12n ± 0%    46.35n ± 0%   -3.69% (p=0.002 n=6)      46.29n ±  0%   -3.81% (p=0.002 n=6)
Labels_Hash/typical_labels_under_1KB-8                         72.49n ± 0%    72.33n ± 0%        ~ (p=0.056 n=6)      72.59n ±  0%   +0.14% (p=0.009 n=6)
Labels_Hash/bigger_labels_over_1KB-8                           88.73n ± 0%    88.44n ± 0%   -0.34% (p=0.041 n=6)      88.81n ±  0%        ~ (p=0.076 n=6)
Labels_Hash/extremely_large_label_value_10MB-8                 837.7µ ± 0%    827.8µ ± 0%   -1.18% (p=0.002 n=6)      831.3µ ±  0%   -0.76% (p=0.002 n=6)
Builder-8                                                      565.2n ± 0%    500.3n ± 0%  -11.47% (p=0.002 n=6)      527.4n ±  0%   -6.70% (p=0.002 n=6)
Labels_Copy-8                                                  69.42n ± 1%    67.78n ± 2%   -2.36% (p=0.015 n=6)      68.11n ±  1%   -1.89% (p=0.009 n=6)
geomean

The not-found_label/get case is still intriguing; I'll look at that.

Note that only the 1-byte case in decodeSize is exercised in the benchmarks above, we should cover the rest.

I also need to run this on ARM.
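
A self-contained way to cover the multi-byte path could look like this (decodeSize is re-sketched here rather than using the real model/labels internals, so the benchmark runs standalone):

```go
package main

import (
	"fmt"
	"testing"
)

// decodeSize re-sketches the PR's scheme: one byte for sizes 0-254,
// flag byte 255 plus 3 bytes little-endian otherwise.
func decodeSize(data string, index int) (int, int) {
	b := data[index]
	index++
	if b < 255 {
		return int(b), index
	}
	size := int(data[index]) | int(data[index+1])<<8 | int(data[index+2])<<16
	return size, index + 3
}

func main() {
	// A size that needs the 4-byte form: 0x011170 = 70000.
	data := string([]byte{255, 0x70, 0x11, 0x01})
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			size, _ := decodeSize(data, 0)
			if size != 70000 {
				b.Fatal("bad decode")
			}
		}
	})
	fmt.Println(res)
}
```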

@bboreham
Member Author

Note that only the 1-byte case in decodeSize is exercised in the benchmarks above, we should cover the rest.

Meh, I don't much care how fast it goes if you abuse the Prometheus data model.

Goes noticeably faster on AMD64 architecture.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Also add test that shows the problem. Credit to @dgl.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Slightly more user-friendly than encoding bad data and finding out when
we decode.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham force-pushed the simpler-stringlabels branch from b835d3b to 78ab02a on April 22, 2025 18:34
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham
Member Author

bboreham commented Apr 24, 2025

Added @machine424's change which makes a decent improvement on amd64.
I found that reversing the conditions in decodeSize stops arm64 benchmarking slower.

Latest benchmark comparison against main (8487ed8).
There are a couple in Labels_Equals that are so fast they're probably not doing anything.

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Apple M2
                                                │  before3.txt  │              after3.txt              │
                                                │    sec/op     │    sec/op      vs base               │
Labels_Get/with_5_labels/first_label/get-8         5.317n ±  1%    4.445n ±  2%  -16.41% (p=0.002 n=6)
Labels_Get/with_5_labels/first_label/has-8         4.423n ±  0%    4.214n ±  2%   -4.73% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/get-8        8.264n ±  1%    6.269n ±  1%  -24.14% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/has-8        7.381n ±  0%    5.971n ±  1%  -19.10% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/get-8         10.925n ±  4%    8.883n ±  5%  -18.70% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/has-8         10.050n ±  1%    8.050n ±  4%  -19.90% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/get-8     6.275n ±  1%    5.181n ±  1%  -17.43% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/has-8     5.364n ± 12%    5.429n ±  0%        ~ (p=0.394 n=6)
Labels_Get/with_10_labels/first_label/get-8        5.314n ±  2%    4.429n ±  4%  -16.64% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/has-8        4.447n ±  1%    4.141n ±  1%   -6.89% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/get-8       12.72n ±  2%    10.82n ±  4%  -15.01% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/has-8      11.885n ±  7%    9.804n ±  1%  -17.51% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/get-8         20.00n ±  7%    18.41n ±  3%   -8.00% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/has-8         18.62n ±  4%    17.16n ±  1%   -7.82% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/get-8    6.306n ±  7%    5.129n ± 13%  -18.66% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/has-8    5.341n ±  4%    5.468n ±  4%   +2.39% (p=0.041 n=6)
Labels_Get/with_30_labels/first_label/get-8        5.500n ±  9%    4.431n ±  0%  -19.44% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/has-8        4.491n ± 10%    4.136n ±  0%   -7.89% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/get-8       37.44n ±  3%    36.41n ±  0%   -2.72% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/has-8       35.48n ±  4%    34.65n ±  4%   -2.33% (p=0.035 n=6)
Labels_Get/with_30_labels/last_label/get-8         77.73n ±  1%    76.51n ±  1%   -1.58% (p=0.002 n=6)
Labels_Get/with_30_labels/last_label/has-8         75.49n ±  3%    74.13n ± 14%        ~ (p=0.061 n=6)
Labels_Get/with_30_labels/not-found_label/get-8    6.272n ±  3%    5.148n ±  2%  -17.91% (p=0.002 n=6)
Labels_Get/with_30_labels/not-found_label/has-8    5.323n ±  0%    5.399n ±  2%   +1.41% (p=0.002 n=6)
Labels_Equals/equal-8                              2.953n ±  2%    2.956n ±  1%        ~ (p=0.446 n=6)
Labels_Equals/not_equal-8                         0.3141n ±  1%   0.3136n ±  0%        ~ (p=0.154 n=6)
Labels_Equals/different_sizes-8                   0.3136n ±  0%   0.3136n ±  0%        ~ (p=0.652 n=6)
Labels_Equals/lots-8                               2.661n ±  1%    2.659n ±  1%        ~ (p=0.675 n=6)
Labels_Equals/real_long_equal-8                    7.973n ±  0%    8.015n ±  7%        ~ (p=0.180 n=6)
Labels_Equals/real_long_different_end-8            6.800n ±  3%    6.794n ±  4%        ~ (p=0.974 n=6)
Labels_Compare/equal-8                             6.498n ±  1%    6.506n ±  1%        ~ (p=0.589 n=6)
Labels_Compare/not_equal-8                         13.59n ±  4%    11.76n ±  4%  -13.46% (p=0.002 n=6)
Labels_Compare/different_sizes-8                   5.484n ±  1%    5.676n ±  1%   +3.52% (p=0.002 n=6)
Labels_Compare/lots-8                              26.52n ±  1%    23.21n ±  1%  -12.52% (p=0.002 n=6)
Labels_Compare/real_long_equal-8                   16.78n ±  1%    16.79n ± 10%        ~ (p=0.193 n=6)
Labels_Compare/real_long_different_end-8           28.31n ±  1%    25.40n ±  0%  -10.29% (p=0.002 n=6)
Labels_Hash/typical_labels_under_1KB-8             54.87n ±  2%    54.55n ±  4%        ~ (p=0.132 n=6)
Labels_Hash/bigger_labels_over_1KB-8               66.97n ±  1%    66.75n ±  0%        ~ (p=0.102 n=6)
Labels_Hash/extremely_large_label_value_10MB-8     676.3µ ±  0%    675.6µ ±  1%        ~ (p=0.818 n=6)
Builder-8                                          251.8n ±  4%    235.9n ±  0%   -6.33% (p=0.002 n=6)
Labels_Copy-8                                      27.91n ±  3%    27.76n ±  2%        ~ (p=0.104 n=6)
geomean                                            13.12n          12.11n         -7.68%
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/model/labels
cpu: Intel(R) Core(TM) i7-14700K
                                                 │  before.txt  │             after3.txt              │
                                                 │    sec/op    │    sec/op     vs base               │
Labels_Get/with_5_labels/first_label/get-28         3.353n ± 0%    3.161n ± 1%   -5.73% (p=0.002 n=6)
Labels_Get/with_5_labels/first_label/has-28         2.419n ± 0%    2.414n ± 1%        ~ (p=0.227 n=6)
Labels_Get/with_5_labels/middle_label/get-28        6.647n ± 1%    4.085n ± 1%  -38.53% (p=0.002 n=6)
Labels_Get/with_5_labels/middle_label/has-28        3.632n ± 1%    3.623n ± 1%   -0.26% (p=0.050 n=6)
Labels_Get/with_5_labels/last_label/get-28          8.486n ± 1%    6.436n ± 0%  -24.16% (p=0.002 n=6)
Labels_Get/with_5_labels/last_label/has-28          7.330n ± 1%    6.942n ± 2%   -5.30% (p=0.002 n=6)
Labels_Get/with_5_labels/not-found_label/get-28     2.788n ± 0%    2.780n ± 2%        ~ (p=0.818 n=6)
Labels_Get/with_5_labels/not-found_label/has-28     2.796n ± 1%    2.778n ± 0%   -0.64% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/get-28        3.353n ± 0%    3.162n ± 1%   -5.70% (p=0.002 n=6)
Labels_Get/with_10_labels/first_label/has-28        2.427n ± 1%    2.410n ± 0%   -0.72% (p=0.019 n=6)
Labels_Get/with_10_labels/middle_label/get-28       9.269n ± 1%    7.255n ± 0%  -21.73% (p=0.002 n=6)
Labels_Get/with_10_labels/middle_label/has-28       8.111n ± 1%    7.967n ± 1%   -1.77% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/get-28         12.70n ± 0%    12.47n ± 0%   -1.77% (p=0.002 n=6)
Labels_Get/with_10_labels/last_label/has-28         11.57n ± 0%    11.69n ± 0%   +1.08% (p=0.002 n=6)
Labels_Get/with_10_labels/not-found_label/get-28    2.795n ± 0%    2.774n ± 1%        ~ (p=0.065 n=6)
Labels_Get/with_10_labels/not-found_label/has-28    2.813n ± 1%    2.780n ± 0%   -1.19% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/get-28        3.356n ± 0%    3.162n ± 0%   -5.77% (p=0.002 n=6)
Labels_Get/with_30_labels/first_label/has-28        2.422n ± 1%    2.413n ± 0%   -0.39% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/get-28       40.18n ± 2%    22.18n ± 0%  -44.81% (p=0.002 n=6)
Labels_Get/with_30_labels/middle_label/has-28       38.39n ± 2%    37.70n ± 3%        ~ (p=0.132 n=6)
Labels_Get/with_30_labels/last_label/get-28         74.80n ± 0%    73.95n ± 0%   -1.12% (p=0.002 n=6)
Labels_Get/with_30_labels/last_label/has-28         73.58n ± 0%    73.97n ± 0%   +0.53% (p=0.004 n=6)
Labels_Get/with_30_labels/not-found_label/get-28    2.796n ± 1%    2.780n ± 1%   -0.57% (p=0.041 n=6)
Labels_Get/with_30_labels/not-found_label/has-28    2.817n ± 1%    2.784n ± 0%   -1.17% (p=0.002 n=6)
Labels_Equals/equal-28                              1.673n ± 0%    1.663n ± 0%   -0.60% (p=0.002 n=6)
Labels_Equals/not_equal-28                         0.1861n ± 1%   0.2479n ± 7%  +33.21% (p=0.002 n=6)
Labels_Equals/different_sizes-28                   0.1860n ± 0%   0.2497n ± 7%  +34.25% (p=0.002 n=6)
Labels_Equals/lots-28                               1.673n ± 0%    1.667n ± 0%   -0.33% (p=0.032 n=6)
Labels_Equals/real_long_equal-28                    4.295n ± 0%    4.278n ± 1%   -0.40% (p=0.009 n=6)
Labels_Equals/real_long_different_end-28            3.440n ± 1%    3.527n ± 1%   +2.53% (p=0.002 n=6)
Labels_Compare/equal-28                             3.377n ± 0%    3.365n ± 1%        ~ (p=0.074 n=6)
Labels_Compare/not_equal-28                         12.38n ± 1%    10.34n ± 1%  -16.51% (p=0.002 n=6)
Labels_Compare/different_sizes-28                   2.605n ± 1%    2.598n ± 0%        ~ (p=0.061 n=6)
Labels_Compare/lots-28                              19.13n ± 0%    17.64n ± 1%   -7.79% (p=0.002 n=6)
Labels_Compare/real_long_equal-28                   19.46n ± 1%    19.06n ± 1%   -2.08% (p=0.002 n=6)
Labels_Compare/real_long_different_end-28           22.58n ± 1%    23.14n ± 1%   +2.50% (p=0.002 n=6)
Labels_Hash/typical_labels_under_1KB-28             41.74n ± 0%    41.69n ± 0%        ~ (p=0.366 n=6)
Labels_Hash/bigger_labels_over_1KB-28               51.23n ± 0%    51.09n ± 0%   -0.26% (p=0.015 n=6)
Labels_Hash/extremely_large_label_value_10MB-28     506.6µ ± 1%    506.7µ ± 2%        ~ (p=0.589 n=6)
Builder-28                                          219.8n ± 1%    191.2n ± 0%  -13.05% (p=0.002 n=6)
Labels_Copy-28                                      34.95n ± 2%    33.36n ± 2%   -4.55% (p=0.002 n=6)
geomean                                             8.739n         8.362n        -4.31%

@bboreham bboreham marked this pull request as ready for review April 24, 2025 08:58
Member

@bwplotka bwplotka left a comment

Looks good to me, thanks!

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham bboreham merged commit b2c2146 into prometheus:main Apr 30, 2025
27 checks passed
@machine424
Member

There are a couple in Labels_Equals that are so fast they're probably not doing anything.

Yes, some of the cases' results are intriguing; I even got completely different results on different AMD CPUs (lost the diff unfortunately).
Given that all the benchmarks except Labels_Hash only exercise the same < 0x80 branch, which should now be identical for both versions, the change in code layout (changes in branches that aren't exercised) has a really considerable impact here.

@bboreham bboreham deleted the simpler-stringlabels branch May 2, 2025 14:18
4 participants