Remove "<a href" from licensing messages in WikimediaMessages
Closed, Resolved | Public | BUG REPORT

Description

In the WikimediaMessages extension, I've found seven messages that deal with licenses and include <a href:

  1. wikimedia-copyright
  2. wikimedia-commons-copyright
  3. mediawiki.org-copyright
  4. wikidata-copyright
  5. wikifunctions-site-footer-copyright
  6. wikifunctions-edit-copyrightwarning-function
  7. wikifunctions-edit-copyrightwarning-implementation

(There's also readinglists-import-app-with-link, but there's a separate task for it: T360394.)

Translations of messages with <a href have to be manually reviewed when they are imported from translatewiki to Gerrit, which is time-consuming for the translatewiki maintainers and error-prone, too. It would be nice to get rid of them.
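As a rough illustration of how such messages could be spotted automatically, here is a minimal sketch that scans an i18n JSON file (a flat key-to-text mapping with an "@metadata" entry) for raw anchor tags. The function name, the regular expression, and the sample message text are assumptions made for illustration; they are not taken from WikimediaMessages or from the translatewiki import tooling.

```python
import json
import re

# Matches an opening anchor tag that carries an href attribute.
RAW_ANCHOR = re.compile(r"<a\s+[^>]*href\s*=", re.IGNORECASE)


def find_raw_anchor_messages(i18n_json: str) -> list[str]:
    """Return the sorted keys of messages whose text contains a raw <a href ...> tag."""
    messages = json.loads(i18n_json)
    return sorted(
        key
        for key, text in messages.items()
        if key != "@metadata" and isinstance(text, str) and RAW_ANCHOR.search(text)
    )


# Hypothetical sample data, not the real message text.
sample = json.dumps({
    "@metadata": {"authors": []},
    "wikimedia-copyright": 'Text is available under the '
                           '<a href="https://example.org/license">Example License</a>.',
    "some-other-message": "See [https://example.org/terms the terms].",
})

print(find_raw_anchor_messages(sample))
```

Only the first sample message is reported, because the second one already uses wikitext link syntax.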

I don't know the reason for using <a href there instead of the usual MediaWiki external or internal link syntax. I guess it has something to do with the special way in which copyright messages are parsed, but I don't know the details. With my naïve git blame skills, I found this patch from 2009 by @bvibber in WikimediaMessages, but the actual reason for needing <a href is probably even older.

I am tagging a few people who appear to have been involved with maintaining those things in WikimediaMessages over the years, but perhaps you have a better idea of someone to ask. It would be really nice to get rid of those and switch to using something more standard and secure. Thanks to anyone who can help with this. <3

Event Timeline

I am not familiar with the origin story, and my hunch is that the capabilities of system messages have evolved over the years. My experience with system messages is far less technical than others' and is more admin-based. With that in mind, I have certainly seen (as recently as today) interwiki and external link syntax used pretty interchangeably across (I think) each of these messages. I would look to engineers for a more official technical answer, but observationally it appears that either route is supported in system messages today, so updates would be useful. You can see a smattering of how local wikis have modified these messages (and used a mix of link types) with Global Search, for example: Wikimedia-copyright.

Also, thank you for your work on this. I empathize with the need to do some historical digging to sort out capabilities of these messages, and appreciate your efforts to get things more synced up and "modernized".

T45646: "MediaWiki:Copyright" message allows raw HTML seems relevant here (still open from Jan 2013), at least as the underlying issue. It resulted in https://www.mediawiki.org/wiki/Manual:$wgRawHtmlMessages being added in 1.32.

For this specific issue, Skin::getCopyright does send the message through the parser, so I think the only reason not to use wikitext is the rel="license" attributes on the links, right? That shouldn't be too difficult to solve.
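To make the trade-off concrete, here is a minimal sketch of rewriting a raw anchor as a wikitext external link, `[URL label]`. It is an illustration under assumptions, not an existing MediaWiki or translatewiki tool: the function name and the sample text are hypothetical, and the regular expression only handles simple double-quoted href attributes.

```python
import re

# Simple anchors only: a double-quoted href and a label without nested anchors.
ANCHOR = re.compile(
    r'<a\s+[^>]*?href\s*=\s*"(?P<href>[^"]*)"[^>]*>(?P<label>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def anchors_to_wikitext(message: str) -> str:
    """Rewrite each raw anchor as a wikitext external link; other attributes are dropped."""
    return ANCHOR.sub(lambda m: f"[{m.group('href')} {m.group('label')}]", message)


# Hypothetical sample text, not the real message.
raw = 'Available under <a rel="license" href="https://example.org/license">Example License</a>.'
print(anchors_to_wikitext(raw))
```

Note that the rel="license" attribute does not survive the conversion, which is exactly the gap that would have to be filled some other way.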

Quite a lot of history in there!

And https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689 was an attempt at a fix for this; it just never got reviewed or merged.

Um... is rel="license" really the only reason?

I see that the English Wikipedia uses it locally. Is it actually useful there?

If it's useful there, why don't all the wikis and languages use it?

And if it's not useful, can we remove it, and then remove the raw HTML? :)

I don't think there's a singular voice who could unilaterally make that call. What most likely needs to happen is a public Gerrit patch that removes the HTML and solicits concerns via code review.

I think the better solution would be to just make MW generate the rel link. HTML should not be in messages, but that doesn't mean we shouldn't use it at all.
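One way to picture that idea: the message keeps plain wikitext link syntax, and the rendering code adds rel="license" for URLs it knows to be license links. The sketch below is a toy renderer written under that assumption. It is not how MediaWiki's parser or Skin::getCopyright works, and the function name, the `license_urls` parameter, and the sample text are all hypothetical.

```python
import html
import re

# Wikitext external link of the form [https://example.org label text].
EXTERNAL_LINK = re.compile(r"\[(?P<url>https?://[^\s\]]+) (?P<label>[^\]]+)\]")


def render_with_license_rel(message: str, license_urls: set[str]) -> str:
    """Render external links to HTML, adding rel="license" for known license URLs."""
    parts = []
    pos = 0
    for match in EXTERNAL_LINK.finditer(message):
        # Escape the plain text between links so the message itself carries no HTML.
        parts.append(html.escape(message[pos:match.start()]))
        url = match.group("url")
        rel = ' rel="license"' if url in license_urls else ""
        parts.append(
            f'<a href="{html.escape(url)}"{rel}>{html.escape(match.group("label"))}</a>'
        )
        pos = match.end()
    parts.append(html.escape(message[pos:]))
    return "".join(parts)


# Hypothetical sample text, not the real message.
text = "Available under [https://example.org/license Example License]."
print(render_with_license_rel(text, {"https://example.org/license"}))
```

With this split, translators would only ever see and edit the wikitext form, and the attribute would be decided in code rather than in each translation.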

I'm fairly sure that this message is raw HTML for historical reasons and the microformat stuff was added later.

That said, is the review really that problematic? The text of these messages has legal effect over how the site is licensed. It seems like changes to them would be something we would want to review carefully for legal reasons.

The review is a time-consuming step of a process that should be mostly automatic. And the considerations of security and correctness are not really working here: in the current situation, it's practically always a false positive, which makes the review not only time-consuming, but also pointless.

If there were actually a serious chance that the review could catch real problems with security or functionality, it would be justified. But currently, the only reason seems to be that these messages allow <a href in the first place. And if the only reason for allowing <a href is the need to use rel=, then that requires fixing: either by providing a different way to add rel= if it's truly needed, or by removing the raw HTML support entirely.

All subtasks resolved.

❤️ ❤️ ❤️ ❤️ ❤️