PageHTMLHandler: add support for getting fully annotated parsoid HTML.
Closed, Resolved (Public)

Description

The simplest way to do HTML-based editing is for the client to receive fully annotated Parsoid HTML, modify it, and then send back the modified HTML (plus, optionally, the unmodified HTML, for selser). This is much simpler than the approach used by Visual Editor on WMF sites, which relies on server-side stashing of content to save bandwidth.

It would be simple enough to allow the PageHTMLHandler to return fully annotated HTML for editing.

Parsoid can generate three flavors of HTML:

  • HTML suitable only for page views, with no data attributes or element IDs,
  • HTML with data-parsoid attributes inlined,
  • HTML with only element IDs inlined, with data-parsoid (page bundle) stashed.

Currently, PageHTMLHandler effectively supports two flavors:

  • by default, it returns HTML with Parsoid element IDs
  • if the stash parameter is set, it will also stash the page bundle in the background.

To support output of fully annotated HTML, we will need to:

  • Introduce a flavor parameter: flavor=view is the default, flavor=edit is the new mode. We may want to support additional flavors in the future (e.g. a low-bandwidth flavor).
  • If stash is set, that implies flavor=edit. If both are set and flavor is not "edit", an error should be returned.
  • The ETag of the fully annotated HTML must be different from the ETag of the stripped HTML.
  • Currently, the "view" flavor returns the same HTML as "edit" combined with "stash" (no data-parsoid attributes), so these two may have the same ETag.
  • There is a bit of confusion here: by intent, "stash" implies "edit", but in effect "stash" produces the same return value as "view".
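
The parameter rules in the list above can be sketched as follows. This is only an illustration in Python, not the handler's actual code: the names resolve_flavor, make_etag, BadRequest and render_key are hypothetical, and the ETag format is an assumption. The only point is that the ETag depends on whether data-parsoid attributes end up inlined in the response.

```python
from __future__ import annotations

import hashlib

# Flavors named in the task description; future flavors would be added here.
VALID_FLAVORS = {"view", "edit"}


class BadRequest(ValueError):
    """Hypothetical error type standing in for an HTTP 400 response."""


def resolve_flavor(flavor: str | None, stash: bool) -> str:
    """Apply the rules: view is the default, stash implies edit."""
    if flavor is not None and flavor not in VALID_FLAVORS:
        raise BadRequest(f"unknown flavor: {flavor!r}")
    if stash:
        # stash together with an explicit non-edit flavor is an error
        if flavor is not None and flavor != "edit":
            raise BadRequest("stash requires flavor=edit")
        return "edit"
    return flavor or "view"


def make_etag(render_key: str, flavor: str, stash: bool) -> str:
    """Build an ETag that differs between annotated and stripped HTML.

    render_key is a hypothetical identifier of the rendering
    (for example, something derived from the revision).
    """
    # Only edit without stash inlines data-parsoid attributes.
    annotated = flavor == "edit" and not stash
    suffix = "annotated" if annotated else "stripped"
    digest = hashlib.sha1(f"{render_key}/{suffix}".encode()).hexdigest()
    return f'"{digest}"'
```

With this sketch, a request with stash set and no flavor resolves to "edit", while stash combined with flavor=view is rejected. The "view" response and the "edit" plus "stash" response share an ETag because both carry the stripped HTML.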
NOTE: In the future, the "view" flavor may support language variants via the Accept-Language header (T262593). Since Parsoid currently doesn't support language variants, we'd have to fall back to the old parser when variant conversion is needed.
NOTE: PageHTMLHandler should also be able to return HTML for content models not supported by Parsoid, at least for the "view" flavor (T311728). It should refuse to return output when asked for the "edit" flavor for non-wikitext content models.
NOTE: Viewing is done per page, but editing is typically done per slot. Also, HTML-based editing is not supported for all kinds of content, and language variants are supported for viewing but not for editing. Because of these differences, we may introduce a dedicated endpoint for loading HTML for editing in the future. It should, however, share most of its code with PageHTMLHandler internally.
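
The first two notes together suggest a small decision about which renderer serves a request. The following Python sketch is one possible reading, not the actual implementation: choose_renderer, UnsupportedFlavor, PARSOID_MODELS and the returned labels are all hypothetical names, and treating wikitext as the only Parsoid-supported content model is an assumption taken from the wording of the second note.

```python
# Assumption: wikitext is the only content model Parsoid can handle.
PARSOID_MODELS = frozenset({"wikitext"})


class UnsupportedFlavor(Exception):
    """Hypothetical error for a flavor that the content cannot provide."""


def choose_renderer(flavor: str, content_model: str,
                    needs_variant: bool = False) -> str:
    """Return a label for the renderer that would serve the request."""
    supported = content_model in PARSOID_MODELS
    if flavor == "edit":
        # Editing is refused for content models Parsoid cannot handle.
        if not supported:
            raise UnsupportedFlavor(
                f"flavor=edit is not available for {content_model!r}")
        # Language variants are not supported for editing, so
        # needs_variant is ignored here.
        return "parsoid"
    if not supported:
        # View flavor for other content models (T311728).
        return "content_handler"
    if needs_variant:
        # Fall back to the old parser for variant conversion (T262593).
        return "legacy_parser"
    return "parsoid"
```

For example, choose_renderer("view", "wikitext", needs_variant=True) falls back to the old parser, while choose_renderer("edit", "json") raises UnsupportedFlavor.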

Event Timeline

daniel triaged this task as Medium priority. (May 24 2022, 7:59 AM)
daniel renamed this task from "PageHTMLHandler: support multiple flavors of HTML" to "PageHTMLHandler: add support for getting fully parsoid HTML.". (Aug 4 2022, 10:15 AM)
daniel renamed this task from "PageHTMLHandler: add support for getting fully parsoid HTML." to "PageHTMLHandler: add support for getting fully annotated parsoid HTML.".
daniel updated the task description.

Change 838901 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/core@master] ParsoidOutputAccess: Add support for `fragment` flavor

https://gerrit.wikimedia.org/r/838901

Change 838901 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputAccess: Add support for `fragment` flavor

https://gerrit.wikimedia.org/r/838901