[Yaml] Remove escaping regex #19782

Closed
wants to merge 2 commits into from
51 changes: 32 additions & 19 deletions src/Symfony/Component/Yaml/Escaper.php
@@ -21,25 +21,33 @@
*/
class Escaper
{
// Characters that would cause a dumped string to require double quoting.
const REGEX_CHARACTER_TO_ESCAPE = "[\\x00-\\x1f]|\xc2\x85|\xc2\xa0|\xe2\x80\xa8|\xe2\x80\xa9";
/**
* Characters that would cause a dumped string to require double quoting.
*
* @internal
*/
const CHARACTER_TO_ESCAPE_MASK = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\xc2\xe2";

// Mapping arrays for escaping a double quoted string. The backslash is
// first to ensure proper escaping because str_replace operates iteratively
// on the input arrays. This ordering of the characters avoids the use of strtr,
// which performs more slowly.
private static $escapees = array('\\', '\\\\', '\\"', '"',
"\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07",
"\x08", "\x09", "\x0a", "\x0b", "\x0c", "\x0d", "\x0e", "\x0f",
"\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17",
"\x18", "\x19", "\x1a", "\x1b", "\x1c", "\x1d", "\x1e", "\x1f",
"\xc2\x85", "\xc2\xa0", "\xe2\x80\xa8", "\xe2\x80\xa9");
private static $escaped = array('\\\\', '\\"', '\\\\', '\\"',
'\\0', '\\x01', '\\x02', '\\x03', '\\x04', '\\x05', '\\x06', '\\a',
'\\b', '\\t', '\\n', '\\v', '\\f', '\\r', '\\x0e', '\\x0f',
'\\x10', '\\x11', '\\x12', '\\x13', '\\x14', '\\x15', '\\x16', '\\x17',
'\\x18', '\\x19', '\\x1a', '\\e', '\\x1c', '\\x1d', '\\x1e', '\\x1f',
'\\N', '\\_', '\\L', '\\P');
private static $escapees = array(
'\\', '\\\\', '\\"', '"', '/',
Contributor Author

BTW this fixes a "bug", / was missing.

Member

Shouldn't this be done in 2.7 then?

Contributor Author

@GuilhemN, Sep 2, 2016

It works in 2.7: the slash should be escaped, but that's not mandatory.

Member

So why do it then?

Contributor Author

@GuilhemN, Dec 7, 2016

Actually, it's not clear what we should do here. Non-printable characters must be escaped, but / is a printable character (the spec says \/ is available for JSON compatibility), so I guess it's up to us to decide.

For readability, / is indeed better, while \/ is more compliant with JSON. If you prefer /, I'll revert this change.
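
As a point of reference for the JSON side of this trade-off, the sketch below uses only PHP's built-in `json_encode` and `json_decode` (not the Yaml component) to show that `\/` and `/` are two spellings of the same string in JSON. It says nothing about which form the dumper should emit; the variable names are made up for the example.

```php
<?php
$path = 'a/b';

// By default json_encode() escapes the slash; the flag turns that off.
$escaped = json_encode($path);
$plain = json_encode($path, JSON_UNESCAPED_SLASHES);

// Both spellings decode back to the same PHP string.
var_dump($escaped, $plain, json_decode($escaped) === json_decode($plain));
```

Since both forms round-trip to the same value, the choice appears to be purely about how the output reads.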

"\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07",
"\x08", "\x09", "\x0a", "\x0b", "\x0c", "\x0d", "\x0e", "\x0f",
"\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17",
"\x18", "\x19", "\x1a", "\x1b", "\x1c", "\x1d", "\x1e", "\x1f",
"\xc2\x85", "\xc2\xa0", "\xe2\x80\xa8", "\xe2\x80\xa9",
);
private static $escaped = array(
'\\\\', '\\"', '\\\\', '\\"', '\\/',
'\\0', '\\x01', '\\x02', '\\x03', '\\x04', '\\x05', '\\x06', '\\a',
'\\b', '\\t', '\\n', '\\v', '\\f', '\\r', '\\x0e', '\\x0f',
'\\x10', '\\x11', '\\x12', '\\x13', '\\x14', '\\x15', '\\x16', '\\x17',
'\\x18', '\\x19', '\\x1a', '\\e', '\\x1c', '\\x1d', '\\x1e', '\\x1f',
'\\N', '\\_', '\\L', '\\P',
);

/**
* Determines if a PHP value would require double quoting in YAML.
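
Reviewer note: the ordering comment on the mapping arrays can be illustrated with a reduced, two-entry table. This is a simplified sketch, not the component's actual `$escapees`/`$escaped` arrays, and the variable names are made up for the example.

```php
<?php
// Input contains both a double quote and a backslash: say "hi" \ bye
$input = 'say "hi" \\ bye';

// Backslash first: backslashes added later for the quotes are not rescanned.
$backslashFirst = str_replace(array('\\', '"'), array('\\\\', '\\"'), $input);

// Backslash last: the backslashes added for the quotes get doubled as well.
$backslashLast = str_replace(array('"', '\\'), array('\\"', '\\\\'), $input);

// strtr() with a map never rescans its own output, so order does not matter.
$viaStrtr = strtr($input, array('"' => '\\"', '\\' => '\\\\'));

var_dump($backslashFirst, $backslashLast, $viaStrtr);
```

With `str_replace`, each search/replace pair runs over the result of the previous one, which is why the position of the backslash entry matters there and not with `strtr`.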
@@ -50,7 +58,7 @@ class Escaper
*/
public static function requiresDoubleQuoting($value)
{
return preg_match('/'.self::REGEX_CHARACTER_TO_ESCAPE.'/u', $value);
return strlen($value) !== strcspn($value, self::CHARACTER_TO_ESCAPE_MASK);
}

/**
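
Reviewer note: `strcspn` compares single bytes, so the new mask and the old regex may not flag exactly the same strings. The sketch below copies the two values from the diff and wraps the old and new bodies in hypothetical helper functions (`flaggedByRegex` and `flaggedByMask` are not part of the component). It only demonstrates the byte-level behavior and makes no claim about what the maintainers intended.

```php
<?php
// Values copied from the diff: the old regex body and the new byte mask.
$oldRegex = "/[\\x00-\\x1f]|\xc2\x85|\xc2\xa0|\xe2\x80\xa8|\xe2\x80\xa9/u";
$newMask = implode('', array_map('chr', range(0, 31)))."\xc2\xe2";

// Hypothetical helper mirroring the old requiresDoubleQuoting() body.
function flaggedByRegex(string $value, string $regex): bool
{
    return 1 === preg_match($regex, $value);
}

// Hypothetical helper mirroring the new requiresDoubleQuoting() body.
function flaggedByMask(string $value, string $mask): bool
{
    // strcspn() counts the leading bytes that are NOT in the mask.
    return strlen($value) !== strcspn($value, $mask);
}

$samples = array(
    'plain' => 'plain',
    'nbsp' => "\xc2\xa0",      // U+00A0, listed in the old regex
    'pound' => "\xc2\xa3",     // U+00A3, shares the 0xC2 lead byte
    'euro' => "\xe2\x82\xac",  // U+20AC, shares the 0xE2 lead byte
    'e-acute' => "\xc3\xa9",   // U+00E9, lead byte 0xC3 is not in the mask
);

foreach ($samples as $name => $value) {
    printf("%-8s regex=%d mask=%d\n", $name, flaggedByRegex($value, $oldRegex), flaggedByMask($value, $newMask));
}
```

Any string containing a character whose UTF-8 encoding includes a 0xC2 or 0xE2 byte is flagged by the mask. Extra double quoting should still produce valid YAML, but it would change the dumped output for such strings.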
@@ -76,13 +84,18 @@ public static function requiresSingleQuoting($value)
{
// Determines if a PHP value is entirely composed of a value that would
// require single quoting in YAML.
if (in_array(strtolower($value), array('null', '~', 'true', 'false', 'y', 'n', 'yes', 'no', 'on', 'off'))) {
if (in_array(strtolower($value), array('', 'null', '~', 'true', 'false', 'y', 'n', 'yes', 'no', 'on', 'off'))) {
return true;
}

// First character is an indicator
// @see http://yaml.org/spec/1.2/spec.html#c-indicator
if (1 === strspn($value[0], '-?&*!|<>\'"=%@`')) {
return true;
}

// Determines if the PHP value contains any single characters that would
// cause it to require single quoting in YAML.
return preg_match('/[ \s \' " \: \{ \} \[ \] , & \* \# \?] | \A[ \- ? | < > = ! % @ ` ]/x', $value);
// Contains spaces or ambiguous characters
return strlen($value) !== strcspn($value, "\011\n\013\014\r :{}[],#");
}

/**
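
Reviewer note: the new `requiresSingleQuoting()` body treats indicator characters as significant only in the first position, which would be consistent with the `foo'bar` expectations in the updated DumperTest. The sketch below is a standalone copy of that logic under a hypothetical name (`needsSingleQuotes`), so it can be run without the component. The strict flag passed to `in_array` is an addition for the example.

```php
<?php
// Hypothetical standalone copy of the new check, for illustration only.
function needsSingleQuotes(string $value): bool
{
    $reserved = array('', 'null', '~', 'true', 'false', 'y', 'n', 'yes', 'no', 'on', 'off');

    // Empty string and reserved words (case-insensitive) are always quoted.
    if (in_array(strtolower($value), $reserved, true)) {
        return true;
    }

    // First character is an indicator. Safe to index: '' was handled above.
    if (1 === strspn($value[0], '-?&*!|<>\'"=%@`')) {
        return true;
    }

    // Whitespace or one of : { } [ ] , # anywhere in the string.
    return strlen($value) !== strcspn($value, "\011\n\013\014\r :{}[],#");
}

foreach (array("foo'bar", "'foo", '#bar', 'a b', 'Yes', '', 'foo') as $sample) {
    printf("%-8s => %s\n", var_export($sample, true), needsSingleQuotes($sample) ? 'quote' : 'plain');
}
```

Adding the empty string to the reserved list is what makes the `$value[0]` access safe in this version.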
2 changes: 1 addition & 1 deletion src/Symfony/Component/Yaml/Inline.php
@@ -684,7 +684,7 @@ public static function evaluateBinaryScalar($scalar)

private static function isBinaryString($value)
{
return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
return !preg_match('//u', $value) || preg_match('/[^\x00-\x1f\x20-\xff]/', $value);
Member

What's the reason for this change?

Contributor Author

Because these characters are escapable in double-quoted strings.

Member

Ah okay, maybe we should add a comment with a reference to an external source, or something similar, to help readers understand the code.

Contributor Author

@GuilhemN, Sep 27, 2016

In fact, I'm not even sure we should use !!binary when dumping; all characters are supported in double-quoted scalars anyway.
I'll try to find out what the spec says about that.

}

/**
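
Reviewer note: read literally, the new character class `[^\x00-\x1f\x20-\xff]` excludes every byte value, so without the `u` modifier the second `preg_match` cannot match and only the UTF-8 validity check remains. The sketch below wraps the old and new bodies from the diff in hypothetical helpers (`isBinaryOld` and `isBinaryNew` are not part of the component) to compare them on a few inputs.

```php
<?php
// Hypothetical helper wrapping the old isBinaryString() body.
function isBinaryOld(string $value): bool
{
    return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
}

// Hypothetical helper wrapping the new isBinaryString() body.
function isBinaryNew(string $value): bool
{
    return !preg_match('//u', $value) || preg_match('/[^\x00-\x1f\x20-\xff]/', $value);
}

$samples = array(
    'text with tab' => "plain\ttext",
    'NUL byte' => "a\x00b",
    'escape byte' => "\x1b[0m",
    'invalid UTF-8' => "\xff",
);

foreach ($samples as $name => $value) {
    printf("%-14s old=%d new=%d\n", $name, isBinaryOld($value), isBinaryNew($value));
}
```

Under this reading, control characters such as NUL no longer trigger `!!binary` on their own, which matches the stated rationale that they can be escaped in double-quoted strings.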
50 changes: 26 additions & 24 deletions src/Symfony/Component/Yaml/Tests/DumperTest.php
@@ -57,7 +57,7 @@ public function testIndentationInConstructor()
$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar:
- 1
- foo
@@ -86,7 +86,7 @@ public function testSetIndentation()
$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar:
- 1
- foo
@@ -133,15 +133,15 @@ public function testSpecifications()
public function testInlineLevel()
{
$expected = <<<'EOF'
{ '': bar, foo: '#bar', 'foo''bar': { }, bar: [1, foo], foobar: { foo: bar, bar: [1, foo], foobar: { foo: bar, bar: [1, foo] } } }
{ '': bar, foo: '#bar', foo'bar: { }, bar: [1, foo], foobar: { foo: bar, bar: [1, foo], foobar: { foo: bar, bar: [1, foo] } } }
EOF;
$this->assertEquals($expected, $this->dumper->dump($this->array, -10), '->dump() takes an inline level argument');
$this->assertEquals($expected, $this->dumper->dump($this->array, 0), '->dump() takes an inline level argument');

$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar: [1, foo]
foobar: { foo: bar, bar: [1, foo], foobar: { foo: bar, bar: [1, foo] } }

@@ -151,7 +151,7 @@ public function testInlineLevel()
$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar:
- 1
- foo
@@ -166,7 +166,7 @@ public function testInlineLevel()
$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar:
- 1
- foo
@@ -185,7 +185,7 @@ public function testInlineLevel()
$expected = <<<'EOF'
'': bar
foo: '#bar'
'foo''bar': { }
foo'bar: { }
bar:
- 1
- foo
@@ -257,23 +257,25 @@ public function testEscapedEscapeSequencesInQuotedScalar($input, $expected)
public function getEscapeSequences()
{
return array(
'null' => array("\t\\0", '"\t\\\\0"'),
'bell' => array("\t\\a", '"\t\\\\a"'),
'backspace' => array("\t\\b", '"\t\\\\b"'),
'horizontal-tab' => array("\t\\t", '"\t\\\\t"'),
'line-feed' => array("\t\\n", '"\t\\\\n"'),
'vertical-tab' => array("\t\\v", '"\t\\\\v"'),
'form-feed' => array("\t\\f", '"\t\\\\f"'),
'carriage-return' => array("\t\\r", '"\t\\\\r"'),
'escape' => array("\t\\e", '"\t\\\\e"'),
'space' => array("\t\\ ", '"\t\\\\ "'),
'double-quote' => array("\t\\\"", '"\t\\\\\\""'),
'slash' => array("\t\\/", '"\t\\\\/"'),
'backslash' => array("\t\\\\", '"\t\\\\\\\\"'),
'next-line' => array("\t\\N", '"\t\\\\N"'),
'non-breaking-space' => array("\t\\�", '"\t\\\\�"'),
'line-separator' => array("\t\\L", '"\t\\\\L"'),
'paragraph-separator' => array("\t\\P", '"\t\\\\P"'),
'empty string' => array('', "''"),
'null' => array("\x0", '"\\0"'),
'bell' => array("\x7", '"\\a"'),
'backspace' => array("\x8", '"\\b"'),
'horizontal-tab' => array("\t", '"\\t"'),
'line-feed' => array("\n", '"\\n"'),
'vertical-tab' => array("\v", '"\\v"'),
'form-feed' => array("\xC", '"\\f"'),
'carriage-return' => array("\r", '"\\r"'),
'escape' => array("\x1B", '"\\e"'),
'space' => array(' ', "' '"),
'double-quote' => array('"', "'\"'"),
'slash' => array('/', '/'),
'backslash' => array('\\', '\\'),
'next-line' => array("\xC2\x85", '"\\N"'),
'non-breaking-space' => array("\xc2\xa0", '"\\_"'),
'line-separator' => array("\xE2\x80\xA8", '"\\L"'),
'paragraph-separator' => array("\xE2\x80\xA9", '"\\P"'),
'colon' => array(':', "':'"),
);
}
