[OptionsResolver] Optimize splitOutsideParenthesis() - 2.91x faster #61239

Open
wants to merge 1 commit into
base: 7.4
from

Conversation

bendavies
Contributor

@bendavies bendavies commented Jul 25, 2025

Q A
Branch? 7.4
Bug fix? no
New feature? no
Deprecations? no
Issues Fix #59354
License MIT

This PR optimises the splitOutsideParenthesis method in OptionsResolver.php, achieving a 2.91x performance improvement.

I discovered this method as a performance hotspot while benchmarking a large Symfony form with many fields. Profiling revealed that splitOutsideParenthesis was consuming a significant portion of the form processing time.

The splitOutsideParenthesis method (introduced in PR #59354) is called frequently during options resolution and has several performance bottlenecks:

  1. Character-by-character string concatenation performs one append per input character, and each append may reallocate and copy the buffer
  2. All input strings are processed the same way, regardless of complexity - no fast path for simple types
  3. Multiple conditional checks per character

Test Methodology

Here's how all performance measurements were conducted:

  • Benchmark tool: hyperfine (10 runs with 1 warmup run)
  • Test iterations: 100,000 iterations per test case
  • Test data: 16 different type patterns:
    • Simple types: string, int, bool, array
    • Union types: string|int, string|int|bool, string|int|bool|array
    • Parentheses types: string|(int|bool), (string|int)|bool
    • Nested types: string|(int|(bool|float)), (string|int)|(bool|float)
    • Array types: string[], int[]
    • Class types: MyClass, \\Namespace\\Class
    • Complex union: string|int|bool|array|object|resource|callable

Each optimisation was tested in isolation to measure its individual impact, then all optimisations were combined for the final benchmark.

Optimisations

1. Fast Path for Simple Types (No Pipes)

Most type declarations are simple types like string, int, MyClass, etc. without any union types.

Implementation:

if (!\str_contains($type, '|')) {
    return [$type];
}

2. Fast Path for Union Types (No Parentheses)

Common union types like string|int|bool don't need complex parsing - PHP's explode() is much faster.

Implementation:

if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
    return \explode('|', $type);
}
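
The two fast paths are not strictly identical to the original loop on degenerate inputs. The following is a minimal sketch, not code from this PR: `splitLoop()` and `splitFast()` are names invented here for illustration, with `splitLoop()` mirroring the original character loop. It compares the two on an empty string and on a trailing pipe:

```php
<?php

// Hypothetical name: mirrors the original character-by-character loop.
function splitLoop(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    for ($i = 0, $length = \strlen($type); $i < $length; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

// Hypothetical name: the two fast paths, falling back to the loop.
function splitFast(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }

    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }

    return splitLoop($type);
}

// Print both results side by side for regular and degenerate inputs.
foreach (['string', 'string|int', 'string|(int|bool)', '', 'string|'] as $type) {
    \printf("%-22s loop=%s fast=%s\n", \var_export($type, true), \json_encode(splitLoop($type)), \json_encode(splitFast($type)));
}
```

For well-formed types the two agree. For `''` the loop returns an empty array while the fast path returns `['']`, and for `'string|'` the loop drops the trailing empty part while `explode()` keeps it. Whether such inputs can reach this method in practice is not established here.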

3. Eliminate String Concatenation

Appending to a buffer one character at a time adds overhead on every iteration. Tracking the start offset of the current part and calling substr() once per part avoids building intermediate strings.

Implementation:

// Instead of: $currentPart .= $char;
// Use: $parts[] = \substr($type, $start, $i - $start);
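
As a rough illustration of the offset-tracking idea, the sketch below keeps only a `$start` index and slices each part out once. The function name `splitByOffsets()` is an assumption made for this example:

```php
<?php

// Hypothetical helper: same splitting rule as the loop version, but it
// remembers where the current part starts instead of appending characters.
function splitByOffsets(string $type): array
{
    $parts = [];
    $start = 0; // offset at which the current part begins
    $parenthesisLevel = 0;
    $length = \strlen($type);

    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        } elseif ('|' === $char && 0 === $parenthesisLevel) {
            // One substr() call per part, not one append per character.
            $parts[] = \substr($type, $start, $i - $start);
            $start = $i + 1;
        }
    }

    // Emit whatever follows the last top-level pipe.
    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }

    return $parts;
}

\var_dump(splitByOffsets('(string|int)|bool'));
```

For `'(string|int)|bool'` the only top-level pipe sits at offset 12, so the first part is `substr($type, 0, 12)` and the second is everything from offset 13 onward.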

4. Switch Statement Optimisation

Replaces the chain of if/elseif comparisons evaluated for every character with a single switch on the character.

Implementation:

switch ($char) {
    case '(':
        ++$parenthesisLevel;
        break;
    case ')':
        --$parenthesisLevel;
        break;
    case '|':
        // ...
}

Benchmark Results

Individual Optimisation Impact

Testing each optimisation in isolation:

hyperfine --warmup 1 --runs 10 \
  --sort=command \
  --reference 'php test_original.php' \
  'php test_opt1_fast_path_simple.php' \
  'php test_opt2_fast_path_union.php' \
  'php test_opt3_no_string_concat.php'  \
  'php test_opt4_switch_statement.php'

Relative speed comparison
        1.00          php test_original.php
        1.23 ±  0.02  php test_opt1_fast_path_simple.php
        1.95 ±  0.04  php test_opt2_fast_path_union.php
        1.13 ±  0.03  php test_opt3_no_string_concat.php
        1.35 ±  0.03  php test_opt4_switch_statement.php

Combined Optimisation Impact

Combining all optimisations:

hyperfine --warmup 1 --runs 10 \
  --sort=command \
  --reference 'php test_original.php' \
  'php test_optimised.php'

Relative speed comparison
        1.00          php test_original.php
        2.91 ±  0.03  php test_optimised.php

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 1312319 to e01b177 Compare July 25, 2025 16:10
@bendavies bendavies changed the title [OptionsResolver] Optimize splitOutsideParenthesis() - 2.90x faster [OptionsResolver] Optimize splitOutsideParenthesis() - 2.91x faster Jul 25, 2025
@bendavies bendavies changed the base branch from 7.4 to 7.3 July 25, 2025 16:11
Member

@alexandre-daubois alexandre-daubois left a comment

Performance improvements are refactors, which means this should target 7.4. 7.3 being already released, the branch receives bug fixes only.

Also, would you mind sharing the test set you used to run benchmarks? In addition to the methodology used as explained in the description, having the opportunity to see the benchmark code would be a plus!

@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from e01b177 to 1591e54 Compare July 25, 2025 16:20
@yceruto
Member

yceruto commented Jul 25, 2025

Hey @bendavies, thanks for this optimization proposal! For performance refactoring PRs, we usually target the latest dev branch, in this case 7.4.

 - Fast path for simple types (no pipes)
 - Fast path for unions without parentheses
 - Eliminate string concatenation overhead
 - Switch statement for character matching

  Reduces form processing time significantly for large forms.
@bendavies bendavies force-pushed the optionresolver-splitOutsideParenthesis-performance branch from 1591e54 to c135a89 Compare July 25, 2025 16:26
@bendavies bendavies changed the base branch from 7.3 to 7.4 July 25, 2025 16:26
@bendavies
Contributor Author

bendavies commented Jul 25, 2025

Performance improvements are refactors, which means this should target 7.4. 7.3 being already released, the branch receives bug fixes only.

Also, would you mind sharing the test set you used to run benchmarks? In addition to the methodology used as explained in the description, having the opportunity to see the benchmark code would be a plus!

# Test individual optimizations
hyperfine --warmup 1 --runs 10 \
  --reference 'php test_original.php' \
  'php test_opt1_fast_path_simple.php' \
  'php test_opt2_fast_path_union.php' \
  'php test_opt3_no_string_concat.php' \
  'php test_opt4_switch_statement.php'

# Test combined optimization
hyperfine --warmup 1 --runs 10 \
  --reference 'php test_original.php' \
  'php test_optimized.php'

test_original.php - Original implementation
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Original implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt1_fast_path_simple.php - Optimization 1: Fast path for simple types
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 1 (fast path simple): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt2_fast_path_union.php - Optimization 2: Fast path for union types
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }

    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        }

        if ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = $currentPart;
            $currentPart = '';
        } else {
            $currentPart .= $char;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 2 (fast path union): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt3_no_string_concat.php - Optimization 3: Eliminate string concatenation
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];

        if ('(' === $char) {
            ++$parenthesisLevel;
        } elseif (')' === $char) {
            --$parenthesisLevel;
        } elseif ('|' === $char && 0 === $parenthesisLevel) {
            $parts[] = \substr($type, $start, $i - $start);
            $start = $i + 1;
        }
    }

    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    
    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 3 (no string concat): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_opt4_switch_statement.php - Optimization 4: Switch statement
<?php

function splitOutsideParenthesis(string $type): array
{
    $parts = [];
    $currentPart = '';
    $parenthesisLevel = 0;

    $typeLength = \strlen($type);
    for ($i = 0; $i < $typeLength; ++$i) {
        $char = $type[$i];

        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                $currentPart .= $char;
                break;
            case ')':
                --$parenthesisLevel;
                $currentPart .= $char;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = $currentPart;
                    $currentPart = '';
                } else {
                    $currentPart .= $char;
                }
                break;
            default:
                $currentPart .= $char;
                break;
        }
    }

    if ('' !== $currentPart) {
        $parts[] = $currentPart;
    }

    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimization 4 (switch statement): " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

test_optimized.php - Final optimized implementation (all optimizations combined)
<?php

function splitOutsideParenthesis(string $type): array
{
    if (!\str_contains($type, '|')) {
        return [$type];
    }
    
    if (!\str_contains($type, '(') && !\str_contains($type, ')')) {
        return \explode('|', $type);
    }
    
    $parts = [];
    $start = 0;
    $parenthesisLevel = 0;
    $length = \strlen($type);
    
    for ($i = 0; $i < $length; ++$i) {
        $char = $type[$i];
        
        switch ($char) {
            case '(':
                ++$parenthesisLevel;
                break;
            case ')':
                --$parenthesisLevel;
                break;
            case '|':
                if (0 === $parenthesisLevel) {
                    $parts[] = \substr($type, $start, $i - $start);
                    $start = $i + 1;
                }
                break;
        }
    }
    
    if ($start < $length) {
        $parts[] = \substr($type, $start);
    }
    
    return $parts;
}

$testCases = [
    'string',
    'int',
    'bool',
    'array',
    'string|int',
    'string|int|bool',
    'string|int|bool|array',
    'string|(int|bool)',
    '(string|int)|bool',
    'string|(int|(bool|float))',
    '(string|int)|(bool|float)',
    'MyClass',
    'string[]',
    'int[]',
    '\\Namespace\\Class',
    'string|int|bool|array|object|resource|callable',
];

$iterations = 100000;

$start = microtime(true);

for ($i = 0; $i < $iterations; $i++) {
    foreach ($testCases as $testCase) {
        splitOutsideParenthesis($testCase);
    }
}

$end = microtime(true);
$duration = ($end - $start) * 1000;

$peakMemory = memory_get_peak_usage(true) / 1024 / 1024;

echo "Optimized implementation: " . number_format($duration, 2) . "ms for " . ($iterations * count($testCases)) . " operations\n";
echo "Peak memory usage: " . number_format($peakMemory, 2) . " MB\n";

@@ -1215,30 +1215,40 @@ private function verifyTypes(string $type, mixed $value, ?array &$invalidTypes =
*/
private function splitOutsideParenthesis(string $type): array
{
if (!str_contains($type, '|')) {
Member

@dunglas dunglas Jul 26, 2025

Maybe we could even store a hardcoded list of known simple types somewhere to avoid the repeated calls to str_contains() in these cases.

Here is a list: https://www.php.net/manual/en/function.gettype.php
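
One possible shape for such a lookup, sketched under assumptions: the function name `splitWithLookup()` and the entries in the table are illustrative only, and whether a hash lookup actually beats `str_contains()` on short strings would need to be measured.

```php
<?php

// Hypothetical helper: checks a small table of known simple types before
// doing any string scanning. Parenthesised types are left out of this sketch.
function splitWithLookup(string $type): array
{
    // Illustrative entries only, not a list taken from Symfony or this PR.
    static $simpleTypes = [
        'string' => true,
        'int' => true,
        'bool' => true,
        'float' => true,
        'array' => true,
        'object' => true,
        'null' => true,
    ];

    // isset() on an array key is a single hash lookup.
    if (isset($simpleTypes[$type])) {
        return [$type];
    }

    // Unknown names (class names, string[], ...) still take the scan.
    if (!\str_contains($type, '|')) {
        return [$type];
    }

    return \explode('|', $type);
}

\var_dump(splitWithLookup('int'), splitWithLookup('MyClass'), splitWithLookup('string|int'));
```

Class names and array types such as `string[]` miss the table and fall through to the existing check, so the table only helps for the entries it lists.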

return [$type];
}

if (!str_contains($type, '(') && !str_contains($type, ')')) {
Member

@dunglas dunglas Jul 26, 2025

We could likely optimize that too by looping over the characters once and checking for both parentheses. This avoids scanning the string twice.

A regexp to avoid the two calls to str_contains() may also be faster than the current implementation (but slower than the loop I propose).
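
Both ideas can be sketched without a userland loop. The helper names below are assumptions for illustration. `strpbrk()` scans the string once for any character in the given set, and the `preg_match()` variant does the same with a character class. Neither has been benchmarked here.

```php
<?php

// Hypothetical helper: one pass over $type looking for "(" or ")".
// strpbrk() returns false when none of the listed characters occurs.
function hasParenthesis(string $type): bool
{
    return false !== \strpbrk($type, '()');
}

// Hypothetical helper: regex variant of the same check.
function hasParenthesisRegex(string $type): bool
{
    return 1 === \preg_match('/[()]/', $type);
}

foreach (['string|int', 'string|(int|bool)'] as $type) {
    \printf("%s: strpbrk=%s regex=%s\n", $type, \var_export(hasParenthesis($type), true), \var_export(hasParenthesisRegex($type), true));
}
```

Either helper could replace the pair of `str_contains()` calls in the second fast path.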

@bendavies
Contributor Author

bendavies commented Jul 26, 2025

I investigated @dunglas's suggestion to use a regex.
A pure regex solution appears to be almost twice as fast as my current solution.
I'll update this PR on Monday.

Summary
  php test_original.php ran
    5.53 ± 0.15 times slower than php test_optimized_v2.php <-- pure regex, no loops
    3.12 ± 0.08 times slower than php test_optimized.php
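
The benchmarked test_optimized_v2.php is not shown in this thread. As a rough idea of what a loop-free version could look like, here is a sketch using `preg_split()` with a recursive pattern. The function name and the pattern are assumptions and may differ from what was actually measured.

```php
<?php

// Hypothetical helper: splits on "|" only when it is outside parentheses.
function splitWithRegex(string $type): array
{
    // Group 1 matches a balanced "(...)" run, recursing via (?1) for nesting.
    // (*SKIP)(*FAIL) then discards that run, so the only thing left to match
    // as a delimiter is a "|" that is not inside any parentheses.
    $parts = \preg_split('/(\((?:[^()]++|(?1))*+\))(*SKIP)(*FAIL)|\|/', $type);

    // preg_split() returns false on a PCRE error; fall back to the input.
    return false === $parts ? [$type] : $parts;
}

\var_dump(splitWithRegex('string|(int|(bool|float))'));
```

Inputs with unbalanced parentheses are handled differently from the loop version, because the balanced-group branch cannot match them. That case would need its own check if it matters.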
