Extend #time parser function to display time in format specific to each language
Closed, ResolvedPublic

Description

The #time parser function can show dates in different languages by translating month names and similar elements. However, each language prefers dates to be displayed in a different format and order, and the function does not currently handle that part. On Commons, that niche is filled by Module:DateI18n, which is used on 50M pages and has been transplanted to many other projects (see Q56528363). The capabilities of that module should be moved to the #time parser function (or to a new parser function, if more convenient), since this is basic functionality that MediaWiki should handle uniformly on all wikis instead of through local clones of a module.

Event Timeline

Pols12 subscribed.

Added I18n since this is an issue when localizing global messages (central notices, tech news, etc.), but I don’t think it is a real code internationalization issue since there is Language::userTimeAndDate() which solves it in code.

tstarling subscribed.

Goals:

  • Reasonably compact
  • Encourage the use of the page language
  • Give access to the user language, for the convenience of Commons where user and content language are conflated
  • Give access to date, time and "both" formats

For performance, it's not ideal to give access to all languages, although I suppose it could be done if there were a use case for it.

Options:

  1. A family of parser functions supply formats
    • {{#time:{{#dateformat:}}|now}}, {{#time:{{#timeformat:}}|now}}, {{#time:{{#timeanddateformat:}}|now}}, {{#time:{{#timeanddateformat:user}}|now}}
  2. A single parser function to supply formats
    • {{#time:{{#dateformat:time}}|now}}, {{#time:{{#dateformat:date}}|now}}, {{#time:{{#dateformat:both|user}}|now}}
    • Need localisation for special parameter values
  3. A family of magic words supply formats
    • {{#time:{{DATEFORMAT}}|now}}, {{#time:{{USERDATEFORMAT}}|now}}, {{#time:{{USERTIMEANDDATEFORMAT}}|now}}
  4. Extend date formats to provide a symbolic replacement syntax
    • {{#time:%date%|now}}, {{#time:%both%|now}}, {{#time:%user-both%|now}}
    • Need localisation for the symbolic strings?
  5. A family of parser functions which format dates without a format parameter
    • {{#date:now}}, {{#timeanddate:now}}, {{#timeonly:now}}
  6. A single parser function to format dates with an optional special parameter, like option 2 but composed
    • {{#timef:now}}, {{#timef:now|both}}, {{#timef:now|time|user}}

Let's explore option 6 in code.

Change #1053532 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ParserFunctions@master] Add parser functions giving access to standard date/time formats

https://gerrit.wikimedia.org/r/1053532

user language

Although not really a blocker, please note Parsoid currently does not have the concept of user language, and will always use page language when parsing. See T85581, T4085, T322206 and comments on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1032542.

#time allows the user to specify an arbitrary language code, so I may as well extend that feature to the new function. The target keyword is not really necessary, since omitting the parameter would have the same effect. I can split the user keyword out to a separate commit.

T85581 was filed before the port to PHP and doesn't really reflect the current state of the codebase. Parsoid is given a ParserOptions which includes user language, and that same ParserOptions will be given to the preprocessor. We have ParsoidOutputAccess::getCachedParserOutput() which splits the cache for Parsoid using the same code that we use for the old parser. As always, there are aspirations to reduce the number of options that split the cache, but that's not linked to the migration to Parsoid anymore.

I'll let the Content Transform team comment on whether they'd rather have users doing {{#timef:now||{{int:lang}}}} or {{#timef:now||user}}.

{{int:lang}} is a hack, and requires wikis creating some MediaWiki messages. See T4085: Add a {{USERLANGUAGE}} magic word

Exactly, that's the point.

Change #1053820 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] ParserTestRunner: add timezone and user language options

https://gerrit.wikimedia.org/r/1053820

Change #1053817 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ParserFunctions@master] In #time, specify user language with special keyword

https://gerrit.wikimedia.org/r/1053817

Change #1053821 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ParserFunctions@master] Proper timezone and user language tests

https://gerrit.wikimedia.org/r/1053821

Note: any magic word that behaves differently depending on the user language or interface language needs to be checked for parser cache pollution. A simple check: view and purge a page using English, then view the page in a different language (with uselang). That said, the parser cache will be split if certain functions are called.

It's possible to split the parser cache by user language. All you have to do is call getUserLangObj() on the ParserOptions object. -- daniel
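
To illustrate the cache-splitting idea in the two comments above, here is a toy model in Python. It is not MediaWiki code: the class, the function names and the key layout are all made up for illustration. The only behaviour it mirrors is that reading the user language from the options object is what makes the cached output vary by user language.

```python
class ParserOptionsModel:
    """Toy stand-in (hypothetical) for an options object that records reads."""

    def __init__(self, user_lang, page_lang):
        self.user_lang = user_lang
        self.page_lang = page_lang
        self.used = set()  # names of options the render actually consulted

    def get_user_lang(self):
        # Reading the user language marks the output as depending on it.
        self.used.add("userlang")
        return self.user_lang


def render(page_uses_user_lang, options):
    """Pretend render: only some pages ask for the user language."""
    if page_uses_user_lang:
        return "date in " + options.get_user_lang()
    return "date in " + options.page_lang


def cache_key(page, options):
    """Build a key that varies only by the options the render used."""
    parts = [page]
    if "userlang" in options.used:
        parts.append("userlang=" + options.user_lang)
    return "!".join(parts)
```

In this model, a page that never reads the user language keeps a single cache entry for all users, while a page that does read it gets one entry per user language, so an English render cannot leak to a reader using another language.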

Change #1053532 merged by jenkins-bot:

[mediawiki/extensions/ParserFunctions@master] Add parser functions giving access to standard date/time formats

https://gerrit.wikimedia.org/r/1053532

Change #1053820 merged by jenkins-bot:

[mediawiki/core@master] ParserTestRunner: add timezone and user language options

https://gerrit.wikimedia.org/r/1053820

On Commons, the main use case is that you are given a date in YYYY-MM-DD, YYYY-MM, YYYY or one of a few other formats, plus a language code, and need to display that date in that language. That is how it is used by c:Template:Information and other infoboxes. For the last 11 years, c:Module:DateI18n (a rewrite of an even older Commons template) has done that, with the preferred formats for each language stored at c:Data:DateI18n.tab. The code has to handle cases where the format changes depending on the day (a different format for the 1st of each month, or sometimes the 1st, 11th, 21st and 31st of each month), and some languages add extra letters and punctuation to the date. The code uses English formatting as the default (like YYYY for the year) and only has to store formats for languages that deviate from it.
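
A minimal Python sketch of the lookup described above, under stated assumptions: the function names are hypothetical, and the format strings are illustrative placeholders rather than values copied from c:Data:DateI18n.tab. It shows two ideas only: detecting the precision of a partial ISO-style date, and storing overrides just for languages that deviate from the English default.

```python
import re

# Illustrative placeholders, NOT the real contents of Data:DateI18n.tab.
DEFAULT_FORMATS = {"Y": "Y", "YM": "F Y", "YMD": "j F Y"}
OVERRIDES = {
    "de": {"YMD": "j. F Y"},  # assumed example of a deviating language
}

_PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def detect_precision(text):
    """Classify 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' as 'Y', 'YM' or 'YMD'."""
    match = _PARTIAL_DATE.match(text)
    if match is None:
        raise ValueError("unsupported date string: " + text)
    _year, month, day = match.groups()
    if month is None:
        return "Y"
    if day is None:
        return "YM"
    return "YMD"


def format_for(lang, precision):
    """Use the language's own format if it has one, else the English default."""
    return OVERRIDES.get(lang, {}).get(precision, DEFAULT_FORMATS[precision])
```

For example, `format_for("de", detect_precision("2024-09-01"))` picks the override, while `format_for("de", detect_precision("2024"))` falls back to the English default. Day-dependent formats and grammatical cases, which the module also handles, are left out of this sketch.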

The second use case, needed by Module:Complex_date, is to put the date in different grammatical cases for some languages, for example the locative or instrumental case for Slavic languages or the partitive case for Finnish. The translations needed by Complex_date are stored in MonthCases.tab. The #time parser function can handle the genitive case, but other cases are also needed.

On Commons, there is little need for "now" date or figuring out user's language as those are inputs.

I learnt that the update is productive now, but user language did not work on BETA ever.

From the code I understood that a magic word is required, but even simulating MediaWiki:pfunc-user did not help.

I learnt that the update is productive now,

If you mean, the base functionality is now live in Wikimedia production, that is correct (as of this week).

but user language did not work on BETA ever.

Yes, that part of the new functionality is not yet merged, as listed on this task.

From the code I understood that a magic word is required, but even simulating MediaWiki:pfunc-user did not help.

It's not yet announced in Tech/News, and the documentation will be added first before the announcement.

The user magic might be just taken as a reminder.

Okay, then another issue:

  • pretty does not fully react on language.
  • Apparently it is always English day number (digits) plus adapted full month name: 2 August
  • That does not follow project/page language, as date both do.
  • Germans would expect 2. August with a dot after day number.
  • Month name is following supposed language.
  • Same story for Portuguese:
    • {{#timef:now|date|pt}}2 de agosto de 2024
    • {{#timef:now|pretty|pt}}2 agosto
    • Expecting a de after day number (digits).
  • Not pretty.

Change #1053817 abandoned by Tim Starling:

[mediawiki/extensions/ParserFunctions@master] In #time, specify user language with special keyword

Reason:

Superseded by Iab7fda272ec81af88c74612727ff6bed014d4a81

https://gerrit.wikimedia.org/r/1053817

  • pretty does not fully react on language.
  • Apparently it is always English day number (digits) plus adapted full month name: 2 August
  • That does not follow project/page language, as date both do.
  • Germans would expect 2. August with a dot after day number.
  • Month name is following supposed language.
  • Same story for Portuguese:
    • {{#timef:now|date|pt}}2 de agosto de 2024
    • {{#timef:now|pretty|pt}}2 agosto
    • Expecting a de after day number (digits).
  • Not pretty.

This sort of thing is out of scope. You can file a bug against core if you want changes to the formats themselves.

It's not yet announced in Tech/News, and the documentation will be added first before the announcement.

From the task resolution, I guess this is ready to be announced now/soon (when?).
Please could someone suggest some wording for the Tech News entry, and specify which documentation it should link to? (And create that documentation if it's still to-be-done.)

I imagine it might be similar to the "{{#dir}} and {{#bcp47}}" entry that we had in (top-entry) https://meta.wikimedia.org/wiki/Tech/News/2024/32 - but I'm uncertain about how to tweak that for accuracy. Thanks.

I am sorry if this was already explained above, but I am trying to understand what the new function does. Is there documentation for it somewhere?

The description of the request was to move work done by Module:DateI18n lua code to Mediawiki software. Module:DateI18n is ported to 97 different projects, so ideally new interface would be compatible with the existing one. Module:DateI18n main interface is "{{#invoke:DateI18n|Date|year=...|month=...|day=...|hour=...|minute=...|second=...|tzhour=...|tzmin=...|case=...|lang=...}}" function where most of the parameters are optional. On Commons (where it was developed) the frontend is the Template:Date template where there is more documentation. It would be nice to test it with the similar testcases as used in Module:DateI18n/testcases. So how do we interact with the new function?

I began testing and, by trial and error, figured out some rules:

  1. {{#invoke:DateI18n|Date|year=2009|month=12|day=09|hour=13|minute=20|second=17|lang=en}} is equivalent to {{#timef:2009-12-09T13:20:17|both|en}} - we call it "YMDhms" format
  2. {{#invoke:DateI18n|Date|year=2024|month=09|day=01|lang=en}} is equivalent to {{#timef:2024-09-01|date|en}} - "YMD" format
  3. {{#invoke:DateI18n|Date|month=09|day=01|lang=en}} is equivalent to {{#timef:0000-09-01|pretty|en}} - "MD" format

I did not see any way to ask for localized date in the following formats which are supported by DateI18n:

  • year only format - we call it "Y" format
  • year-month format - "YM" format
  • date + hour:minute format - or "YMDhm" format
  • month-day plus time - or "MDhms" format

I wrote some unit testing comparing 3 available formats at c:Module:DateI18n/timef_test and got 42 out of 139 tests correct (30%). The formats we use on Commons are being tweaked, adjusted and argued over since 2009 when Template:Date was introduced with the purpose of localizing dates displayed by c:Template:Information template. They are all vetted by the native speakers and were quite stable last decade or so. Any chance of synching MediaWiki version with Commons and adding those 4 other formats, before we roll it out?

  1. {{#invoke:DateI18n|Date|year=2009|month=12|day=09|hour=13|minute=20|second=17|lang=en}} is equivalent to {{#timef:2009-12-09T13:20:17|both|en}} - we call it "YMDhms" format

Actually, both is equivalent to the YMDhm format, i.e. without seconds, in almost all languages – the only exception I could find is Finnish, where users can choose a date format with seconds as well (but the default format is still one with a minute-level precision, and the parser function always uses the default format).

You are right, I corrected my unit testing page. So I guess we are missing support for "Y", "YM", "YMDhms" and "MDhms" date formats.

I forgot to mention that DateI18n also supports different grammatical cases for a handful of languages (as described in the Template:Date documentation). Without the "case" parameter, DateI18n returns the date in whichever case the given language dictates; with the "case" parameter, it returns the date in the specified case. Most of the time that is the nominative or genitive case, but for some Slavic languages and Finnish other cases are allowed. That functionality would also be needed if we are going to replace DateI18n.

Change #1075031 had a related patch set uploaded (by Ejegg; author: Ejegg):

[integration/config@master] Remove tests for ParserFunctions master against core LTS

https://gerrit.wikimedia.org/r/1075031

Change #1075031 merged by jenkins-bot:

[integration/config@master] Zuul, jjb: [mediawiki/extensions/ParserFunctions] Remove master tests for LTS

https://gerrit.wikimedia.org/r/1075031

Change #1053821 merged by jenkins-bot:

[mediawiki/extensions/ParserFunctions@master] Proper timezone tests

https://gerrit.wikimedia.org/r/1053821

I just saw it in the Tech News. Thank you. Could you please write a manual? Because I've tried

{{#timef:2000-09-01|F|en}}

and got an error message. Actually, 49 of the 53 standard tests failed, and the remaining four showed the wrong date.
UPD: I've tried to understand it by myself, and this is what I get:

  1. The first parameter is a regular datetime parameter from #time and #timel. The default value is "now"
  2. The second parameter is "both", "date", "time" or "pretty". The default value is "both".
  3. The third parameter is a language code or "user". The default value is "user".
  4. For the local time #timefl should be used instead.

I do not know if this is right, and if there is something else. Also, pretty does not work in a lot of languages.
UPD: And also, there is a total directionality problem. Until it is fixed, every usage of #timef should be wrapped by

<span dir="{{#dir:<third parameter>}}">...</span>

Until the bug is fixed, I've created this wrapper template.

I just saw it in the Tech News. Thank you. Could you please write a manual?

Done.

  1. The third parameter is a language code or "user". The default value is "user".

The "user" keyword was not implemented. Instead we added {{USERLANGUAGE}} (T4085). The default value is the page language, not the user language.
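
Putting the corrected rules together, the parameter handling discussed in this thread can be modelled roughly as follows. This is a Python sketch, not the extension's code: `resolve_timef_args` is a hypothetical name, and treating an empty parameter like an omitted one is an assumption based on the `{{#timef:now||{{int:lang}}}}` example earlier in the thread.

```python
VALID_FORMATS = ("both", "date", "time", "pretty")


def resolve_timef_args(args, page_lang):
    """Return (datetime, format, language) after applying the defaults.

    args is the list of positional parameters as written in wikitext;
    page_lang is the language of the page being parsed.
    """
    padded = list(args) + [""] * (3 - len(args))
    when = padded[0] or "now"      # default listed in the thread
    fmt = padded[1] or "both"      # default listed in the thread
    lang = padded[2] or page_lang  # page language, not user language
    if fmt not in VALID_FORMATS:
        # A raw format string such as "F" is not a valid keyword here.
        raise ValueError("unknown format keyword: " + fmt)
    return when, fmt, lang
```

Under this model, `resolve_timef_args([], "de")` yields the current time in the "both" format in German, and passing `"F"` as the second parameter is rejected, which matches the error reported above for `{{#timef:2000-09-01|F|en}}`.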

Great. Thanks. What about "pretty"? There are still enough languages that show not "pretty" and not "date", but just weird irrelevant text.
And what about the directionality?

What about "pretty"? There are still enough languages that show not "pretty" and not "date", but just weird irrelevant text.

I think it’s because those languages have no pretty localization, so they fall back to the US English order (but with translated month – and year and day, if applicable – names), although mentioning a few specific examples could help better understanding the situation. (Of course, if it’s really the missing localization, the solution is adding it. Maybe a new task could be created in which people speaking different languages could collect the correct formats – I’d be happy to fix them as a developer, but I have no idea what the correct format is in Hebrew, Chinese or Czech.)

And what about the directionality?

I’d assume most of the time this parser function will be used in contexts that have the given language already set (a table that has table-level lang and dir attributes, running text in that language etc.), so the extra markup would just clutter the HTML code without any benefit. Where do you expect to use a free-standing date, without, for example, a label that tells what on that date happens?

What about "pretty"? There are still enough languages that show not "pretty" and not "date", but just weird irrelevant text.

Yes, the English format is exactly the problem. A lot of languages have different formats, and the English one makes no sense for them. Is there a way to use the "date" format as a fallback for any language that has no "pretty" format? I could even suggest removing the year from it, but I won't, because that can be dangerous: it would require locating and removing extra spaces, commas, semicolons and so on. In any case, creating such a subtask and using Tech News to invite people who know different languages to update the formats could be a good thing to do.

And what about the directionality?

I see. For example, if this date is everything that exists in some table cell.

Please see my conversation above at Sep 10 2024, 03:46.

If there is a table cell that requires a single date in foreign representation,

  1. the directionality inside a table cell does not matter,
  2. those who put some content into a table cell do know that the output is supposed to use a different scripting than the general page context, and it might be wrapped by templates or cell attributes.

I can understand your 2., but I don't agree with 1., because wrong directionality shows something like "2024 in October, year 14".

If the entire content of a table cell is that parser function, the UBA does not influence anything.

  • dir or <bdi> is important if a text fragment is embedded within inline fluent text in other directionality, especially if symbols or digits are direct neighbours of the insertion.
  • In this case the symbols might be thrown to the wrong side, since they do not bear any directionality (letters do have a knowledge of themselves whether they are LTR or RTL). UBA might think that they should precede since they are following RTL.
  • If the column width is greater than the parser function result, you might want to start an RTL date at the right cell border within a LTR page. However, such style="text-align:right" is not a property of the date, but needs to be applied to the entire table cell as block element.
  • If inside the cell the parser function result is a mixture of letters with directionality and things with no directionality, UBA is made to arrange that in the appropriate order (as is).
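
The point about letters carrying their own direction while digits do not can be checked against the Unicode character database. The short Python sketch below uses the standard `unicodedata` module; the helper names are made up for illustration, and it only inspects per-character bidi classes, it does not run the bidirectional algorithm itself.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_strong_direction(ch):
    """True for characters that are strongly LTR or RTL by themselves."""
    return unicodedata.bidirectional(ch) in ("L", "R", "AL")
```

Latin letters report the strong class `L` and Hebrew letters the strong class `R`, while ASCII digits report `EN` and a space reports `WS`. Digits and spaces therefore take their placement from the surrounding context, which is why a date made of digits, letters and digits can be reordered when embedded in text of the other direction.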

I think we're talking about different things. Here you are, on an RTL page:

[Attached screenshot: Screenshot_20241015_122122_Samsung Internet.png]

I see. Digits before and after, UBA is confused. Perhaps improved one day; an entire block of three groups only (digits letters digits) must not adjust order. There are older and newer versions of UBA, and they might depend on browsers.

I guess it would be sufficient in such cases to add the dir to the table cell:

before ||dir="ltr"| {{#timef:now|date|en}} || next

This is much shorter than introducing a <span> element.

Remark: UBA shall modify if there are LettersLTR Digits LettersRTL Digits LettersLTR or the other way around LettersLTR Digits LettersLTR Digits LettersLTR as in your example. Then it is ambiguous to which fragment the digits belong and shall be rendered before or after. However, if there is no change of directionality since all letters in block are in the same direction no modification by UBA shall happen.

Sure, I've started with it and changed to make it more visual to you.

Is there a way to take instead for any language a "date" format as a fallback if there is no "pretty"?

I fear the only way to do so is copying and pasting the format in all languages. The format comes from rMW languages/messages/MessagesEn.php:157-189 (at 73bb50edb4ee), and the comment there says all subclasses, i.e. languages, automatically inherit it.

I see. But how about checking whether there is such a field in the corresponding PHP file?
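
The difference between plain inheritance and the check suggested here can be sketched in Python. Everything below is hypothetical: the tables, the language code "xx" and the function names are invented, and the format strings are illustrative (only "j F" is quoted in this thread). It is not how MediaWiki stores or resolves its formats.

```python
# English base table; every language inherits from it in this toy model.
ENGLISH = {"date": "j F Y", "pretty": "j F"}

# Per-language tables hold only what the language itself defines.
# "xx" is a made-up language that defines "date" but not "pretty".
OWN = {"xx": {"date": "j. F Y"}}


def inherited_format(lang, kind):
    """Plain inheritance: missing entries silently come from English."""
    return {**ENGLISH, **OWN.get(lang, {})}[kind]


def format_with_date_fallback(lang, kind):
    """Use the language's 'date' format when it defines no 'pretty' itself."""
    own = OWN.get(lang, {})
    if kind == "pretty" and lang != "en" and "pretty" not in own:
        return inherited_format(lang, "date")
    return inherited_format(lang, kind)
```

With plain inheritance, "xx" ends up with the English day-month order for "pretty"; with the explicit check, it uses its own "date" format instead. The check only works if the lookup can still tell a locally defined entry from an inherited one, which is the difficulty raised in the reply above.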

I'm arriving from this conversation, where pretty is discussed.

The documentation says:

Not all languages support this; if it is not supported, the "date" format is used.

My questions are:

  • As this limitation is documented, shouldn't the case of "pretty" internationalization be taken as a separate task?
  • What is the process to have the "pretty" option taken into account by a language?

if it is not supported, the "date" format is used.

This is wrong. Many languages that have no "pretty" format defined use "j F", which makes no sense for them.

Do you mean that date is not used when pretty is not defined? Or something else?

Exactly.

In T223772#10239463, Tacsipacsi wrote:

Is there a way to take instead for any language a "date" format as a fallback if there is no "pretty"?

I fear the only way to do so is copying and pasting the format in all languages. The format comes from rMW languages/messages/MessagesEn.php:157-189 (at 73bb50edb4ee), and the comment there says all subclasses, i.e. languages, automatically inherit it.

Thanks.

I'll let @tstarling reply on the pretty fallback issue. This wasn't documented for no reason.

Can we have a different name for this nondescript pretty?
If that's day+month call it daymonth or dm.
If people want month+year, introduce monthyear or my.
With pretty, you'll have one thing in a template for one language, and another thing in the template for another language (on Commons, for example), depending on what people decide is pretty in their language.

Would words other than date/time/both/pretty work? MessagesPl.php defines a monthonly option, which is month+year.

Also, since it's so hard (time consuming) to get anything changed in MessagesXX.php, I suggest you ask our most active communities to propose what these should be, and have someone upload and +2-approve the patches on gerrit.