
Remaining feature parity issues between the two Cite parsers
Open, Needs Triage, Public

Description

Umbrella ticket for all (remaining) parity issues between the two Cite implementations.

Note that we are intentionally working in both directions. Sometimes it makes much more sense to clean up odd, one-off behaviors in the classic parser instead of being forced to re-implement irrelevant details in Parsoid. Still, the general idea is to make Parsoid behave as closely as possible to the classic parser so that users don't experience breaking changes that negatively affect the content or the established community processes.

Some examples to illustrate that point:

  • Parsoid never bothered re-implementing rarely used customization options. We ended up dropping a lot of that from the classic parser instead, see T321217: [Epic] Get rid of Cite backlink formatting i18n messages that are not actually localized.
  • In practice, it doesn't make much of a difference where an error message is rendered. Very early on we established that Parsoid should feel free to render most errors at the very bottom of the document as part of the reference list, regardless of where the classic parser renders (or rendered) the same message.
  • There is a cite_error message that looks harmless but became critical when the communities started to put templates in this message. We have been forced to re-implement this in Parsoid, see T372709: Missing cite error message and category.

Known issues (also see the list of subtickets):

Event Timeline

Change #1165905 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Fix re-serialization of incomplete follow in <ref> tags

https://gerrit.wikimedia.org/r/1165905

Change #1165905 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Fix re-serialization of incomplete follow in <ref> tags

https://gerrit.wikimedia.org/r/1165905

It appears that Parsoid allows much more complex nesting of <ref> tags. This was intentionally blocked in the classic parser.

That's news to me. Nesting of <ref> tags is a syntactic issue only, and is routinely worked around w/ {{#tag:ref}} and other means. I am not aware of any "intentional block" here, although I'd readily believe that various wikis may have editorial policies against reference nesting. More information here would be welcome.

In any case, there is no syntactic/semantic prohibition on extensions containing extension content. If Cite wants to enforce a restriction, it can do so as part of its own extension code by tracking nesting level or some such. I strongly suspect that any attempt to do so will meet with pushback from real-world usage, however.
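To make "tracking nesting level" concrete, here is a minimal sketch in Python. It is not Cite's or Parsoid's code; the function name `max_ref_depth` and the regex are hypothetical, and it only looks at literal <ref> tags in a string, so nesting created via {{#tag:ref}} is not covered.

```python
import re

# Hypothetical tokenizer: matches <ref ...>, <ref ... /> and </ref>.
# The \b keeps it from matching <references> or </references>.
REF_TOKEN = re.compile(r"<(/?)ref\b([^>]*?)(/?)>", re.IGNORECASE)


def max_ref_depth(wikitext: str) -> int:
    """Return the deepest level of literal <ref> nesting found in the text."""
    depth = 0
    deepest = 0
    for match in REF_TOKEN.finditer(wikitext):
        closing, _attrs, self_closing = match.groups()
        if closing:
            # </ref>: leave one level, never dropping below zero.
            depth = max(depth - 1, 0)
        elif self_closing:
            # <ref ... />: sits one level below the current depth,
            # but does not open a new level.
            deepest = max(deepest, depth + 1)
        else:
            # <ref ...>: enter a new level.
            depth += 1
            deepest = max(deepest, depth)
    return deepest
```

An extension that wanted to restrict nesting could, under these assumptions, flag any input where the result is greater than 1.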

@cscott Sorry for the confusion. This was just a general observation I made while working on Sub-referencing (product board) over the past months. I didn't have a specific issue in mind. We planned to discuss this with the Content-Transform-Team later, but since it's not strictly related to our current project and not blocking us, it had to wait.

Let me find some examples to illustrate what I mean.

<ref name="a" />
<references>
{{#tag:ref|This is A<ref name="b">This is B<ref name="c">This is C<ref name="c">This is D</ref></ref></ref>|name=a}}
</references>

The main difference here is that the classic parser cannot even understand this because it doesn't work with a syntax tree, but Parsoid does. The classic parser stops at the first closing </ref> and ends up refusing everything. Parsoid renders all 4 nested ref tags without any error.
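The "stops at the first closing </ref>" behavior can be modelled with a small Python sketch. This is a simplified illustration, not the classic parser's actual implementation; the name `first_close_split` and the regex are assumptions made for the example.

```python
import re

# Simplified model: an opening <ref ...> followed by the shortest possible
# body up to the FIRST closing </ref>, with no awareness of nesting.
FIRST_CLOSE = re.compile(r"<ref\b[^>]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def first_close_split(wikitext: str):
    """Return (body, leftover) for the first <ref>, or None if there is none."""
    match = FIRST_CLOSE.search(wikitext)
    if match is None:
        return None
    return match.group(1), wikitext[match.end():]


nested = (
    '<ref name="b">This is B<ref name="c">This is C'
    '<ref name="c">This is D</ref></ref></ref>'
)
body, leftover = first_close_split(nested)
# The body still contains two unclosed inner <ref> openers,
# and two closing tags are left dangling after the match.
print(body)
print(leftover)
```

Without a tree, the matcher pairs the outermost opening tag with the innermost closing tag, which leaves unbalanced markup on both sides. A tree-based parser pairs each opening tag with its own closing tag instead.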

This might just be an oversight.

<ref name="a">This is A<ref name="b" /></ref>
<references>
<ref name="b">But this is B</ref>
</references>

Here the classic parser refuses to render both refs and tracks both as errors. Parsoid renders both in the list without any visible error, but misses the footnote marker for one of the two.

<ref name="a">This is A<ref name="b">But this is B</ref></ref>
<references />

Here the classic parser also refuses to render the refs and tracks them as errors. Parsoid renders something that looks a little mangled, without tracking it as an error.

<ref name="a">This is A{{#tag:ref|This is B|name=b}}</ref>
<references />

Here Parsoid appears to ignore the inner ref while the classic parser shows an error.

When I say "intentionally blocked" I'm mostly referring to the cite_error_included_ref error message and this syntax check that has been in the code since 2008. This actively blocks recursive parsing of <ref> tags.