Closed
Labels
Bug, Good first issue, Status: Needs Review
Symfony version(s) affected
5.4.37
Description
The Crawler encodes HTML entities before parsing the content, in
private function convertToHtmlEntities(string $htmlContent, string $charset = 'UTF-8'): string
This causes problems wherever the parser does not decode entities (for example between <script> and </script>).
How to reproduce
use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler();
$crawler->addContent('<!doctype html><html><script>var foo = "bär";</script></html>', 'text/html; charset=UTF-8');
echo $crawler->filterXPath('//script')->text();
// output: var foo = "b&#228;r";
// expected: var foo = "bär";
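To illustrate why the script text comes back altered, here is a minimal sketch of what the encoding step appears to do to the reproduction input. It assumes the conversion is equivalent to a single mb_encode_numericentity() call over all non-ASCII code points; the helper name encodeNonAscii() is hypothetical and not part of the Crawler API. It needs the mbstring extension.

```php
<?php
// Hypothetical stand-in for the Crawler's private conversion step.
// Assumption: every code point >= 0x80 becomes a decimal numeric entity.
function encodeNonAscii(string $html, string $charset = 'UTF-8'): string
{
    // Map format: [first code point, last code point, offset, mask]
    return mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], $charset);
}

$html = '<!doctype html><html><script>var foo = "bär";</script></html>';
$encoded = encodeNonAscii($html);

// The "ä" inside <script> is now "&#228;". An HTML5 parser treats script
// content as raw text and does not decode character references there,
// so the entity survives into the text returned by the crawler.
echo $encoded, PHP_EOL;
```

Outside of raw-text elements such as script, the parser decodes the entity again, which is why the problem only shows up in places like this.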
Possible Solution
I’m not sure what the best way to fix this is, since convertToHtmlEntities() cannot distinguish between content outside and inside <script>. But as convertToHtmlEntities() seems to exist only to work around issues with the libxml parser, maybe it can be skipped when the Masterminds\HTML5 parser is used?
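The idea could be sketched roughly as follows. This is not the Crawler's actual code: prepareHtmlForParser() and its $useHtml5Parser flag are hypothetical names used only to show the conditional skip, and the encoding call is an assumption about what the conversion does.

```php
<?php
// Hypothetical helper: only entity-encode the content when it is going
// to the libxml-based parser; pass it through unchanged for HTML5.
function prepareHtmlForParser(string $content, string $charset, bool $useHtml5Parser): string
{
    if ($useHtml5Parser) {
        // Assumption: the HTML5 parser handles the charset itself,
        // so no libxml workaround is needed.
        return $content;
    }

    // Assumed libxml workaround: encode non-ASCII code points as entities.
    return mb_encode_numericentity($content, [0x80, 0x10FFFF, 0, 0x1FFFFF], $charset);
}

$script = '<script>var foo = "bär";</script>';

$forHtml5 = prepareHtmlForParser($script, 'UTF-8', true);
$forLibxml = prepareHtmlForParser($script, 'UTF-8', false);

echo $forHtml5, PHP_EOL;
echo $forLibxml, PHP_EOL;
```

With this shape the script content reaches the HTML5 parser untouched, while the libxml path keeps its current behaviour. Non-UTF-8 input would still need to be converted before it reaches the HTML5 parser, which this sketch does not cover.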
Additional Context
No response