Extension:Chart/Transforms

From mediawiki.org

To maximize the ability of editors to work with complex data sets, we have extended Extension:Chart so that tabular datasets can be transformed, modified, clipped, or generated from Lua code, using the same underlying Extension:Scribunto engine that drives complex templates with the {{#invoke:}} parser function.

Chart format descriptions can include a transform which refers to a Lua Module: page and function, and passes some key-value parameters as an associative array.

The source data set is translated to Lua objects and passed in, along with the arguments, and the transform may modify or replace the entire tabular data set as it wishes.

Loading of additional data modules is recorded for cache invalidation purposes, so a change to the referenced module or any data pages will cause re-rendering of pages with the chart.

Developer internals

See Extension:JsonConfig/Transforms for internals documentation or to add related support to another data type / output method.

Editor usage

Invocation

Currently, transforms for Charts must be set up in the Data:.chart format description page, like so:

{
    "license": "CC0-1.0",
    "version": 1,
    "type": "bar",
    "xAxis": {
        "title": { "en": "Day" }
    },
    "yAxis": {
        "title": { "en": "Temperature" }
    },
    "transform": {
        "module": "Weekly average temperature chart",
        "function": "convert_temps",
        "args": {
            "units": "C"
        }
    },
    "source": "Sample weekly temperature dataset.tab"
}

The module and function parameters are equivalent to the first two parameters of Scribunto's {{#invoke:}}: they refer to the Module: page containing the source code and the specific function to invoke.

Important: like the Data: pages, the Lua Module: code is loaded from, and runs in the context of, the centralized data store wiki (e.g., Wikimedia Commons).

This means that whichever wiki you render on, the code runs in one central place, so modules can be shared across projects and languages and should follow good practices for reuse and localization.

Arguments are name-value string pairs. If you pass numbers, convert them explicitly (for example with tonumber()) before comparing or computing with them.
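Because arguments arrive as strings, an explicit conversion also gives you a place to reject bad input. The following is a minimal sketch, assuming a hypothetical scale function and a hypothetical factor argument; neither is part of Extension:Chart.

```lua
local p = {}

-- Hypothetical transform: multiply the second column by args.factor.
-- "factor" is an illustrative argument name, not a built-in one.
function p.scale(tab, args)
	-- Arguments are strings such as "1.5", so convert explicitly
	local factor = tonumber(args.factor)
	if not factor then
		error("factor must be a number, got: " .. tostring(args.factor))
	end
	for _, row in ipairs(tab.data) do
		-- Leave empty or non-numeric cells untouched
		if type(row[2]) == "number" then
			row[2] = row[2] * factor
		end
	end
	return tab
end

-- In a real Module: page, finish with: return p
```

With "args": { "factor": "1.5" } in the format description, a cell holding 2 would become 3.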

Arguments can be overridden on the {{#chart:}} invocation by prefixing the argument name with "arg:", as in arg:name=value. This allows each chart invocation to use different parameters with the same format, which we expect to be useful in many cases involving complex datasets.

<!-- Default Celsius -->
{{#chart:Weekly average temperature chart.chart}}
<!-- Override units to Fahrenheit -->
{{#chart:Weekly average temperature chart.chart|arg:units=F}}

Renderings of the non-transformed Celsius chart and the transformed Fahrenheit chart, using the same sample temperature data set:

[Figure: chart in Celsius, followed by the same chart in Fahrenheit]

Code layout

You can name your functions anything you like, and can include related functions in the same module or even use a module to provide both template functions and data transform functions. We're flexible!

Your transform function should take two arguments: a tabular data object converted from JSON, and an arguments list taken from the invocation which will carry key-value pairs. Return a modified tabular data object (it may be the input object after modification, or a new object in the same layout).

local p = {}

local function celsius_to_fahrenheit(val)
	return val * 1.8 + 32
end

--
-- input data:
-- * tabular JSON structure with 1 label column and 2 temperature columns stored in C
--
-- arguments:
-- * units: "F" or "C"
--
function p.convert_temps(tab, args)
	if args.units == "C" then
		-- Stored data is in Celsius
		return tab
	elseif args.units == "F" then
		-- Have to convert if asked for Fahrenheit
		for _, row in ipairs(tab.data) do
			-- first column is the label; columns 2 and 3 hold temperatures
			row[2] = celsius_to_fahrenheit(row[2])
			row[3] = celsius_to_fahrenheit(row[3])
		end
		return tab
	else
		error("Units must be either 'C' or 'F'")
	end
end

return p

Data formatting

See Help:Tabular data for general documentation on the tabular data format. The transform works with a Lua representation of the data: JSON objects and 0-based arrays appear as Lua tables with either string keys or numeric 1-based indexes. They are converted back to JSON on output for handling by the renderer.

Example small dataset:

{
    "license": "CC0-1.0",
    "description": {
        "en": "Sample monthly temperature data in degrees C"
    },
    "schema": {
        "fields": [
            {
                "name": "month",
                "type": "localized",
                "title": { "en": "Month" }
            },
            {
                "name": "low",
                "type": "number",
                "title": { "en": "Low temp" }
            },
            {
                "name": "high",
                "type": "number",
                "title": { "en": "High temp" }
            }
        ]
    },
    "data": [
        [ { "en": "January" }, 5, 20 ],
        [ { "en": "July" }, 15, 30 ]
    ]
}
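As an illustration of the JSON-to-Lua mapping, here is roughly what a transform would see for the dataset above, written out as a Lua table literal. This is a sketch: the license and description keys are omitted for brevity, and it assumes localized strings arrive as multilingual tables.

```lua
-- Approximate Lua view of the sample dataset
local tab = {
	schema = {
		fields = {
			{ name = "month", type = "localized", title = { en = "Month" } },
			{ name = "low", type = "number", title = { en = "Low temp" } },
			{ name = "high", type = "number", title = { en = "High temp" } },
		},
	},
	data = {
		{ { en = "January" }, 5, 20 },
		{ { en = "July" }, 15, 30 },
	},
}

-- JSON index 0 becomes Lua index 1
local first_field = tab.schema.fields[1].name -- the "month" field
local july_high = tab.data[2][3]              -- second row, third column
local july_label = tab.data[2][1].en          -- localized label, English text
```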

Beware that data format validation is applied after your transform, so any invalid output should be reported as an error when you run through the actual transform pipeline.

Localized strings

Localizable strings in the { "lang": "string" } format are preserved in this handling, and if you return a multilingual string it will be localized as well as possible for the context of the output on the rendering end.

However, if it is expensive to fetch all conceivable strings, it should be acceptable to look up only the current page view language and include it.
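For example, a transform that changes units might also replace a column title with a multilingual string. This is a minimal sketch; the relabel_high function, the "high" column name, and the title texts are illustrative assumptions.

```lua
local p = {}

-- Hypothetical transform: relabel the "high" column after a unit change
function p.relabel_high(tab, args)
	for _, field in ipairs(tab.schema.fields) do
		if field.name == "high" then
			-- A multilingual string: one entry per language code
			field.title = {
				en = "High temp (°F)",
				fr = "Température maximale (°F)",
			}
		end
	end
	return tab
end

-- In a real Module: page, finish with: return p
```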

Null/nil

Note that Lua does not allow storing a nil value in a table; this means that JSON tables containing null values may require careful handling by Lua transform code.

If your data set requires working with empty data cells for which null sounds appropriate in the JSON, be careful in iterating over rows.
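The pitfall can be seen in plain Lua. This sketch assumes that a null cell in the JSON arrives as a missing (nil) entry in the row table:

```lua
-- A row whose middle cell was null in the JSON
local row = { 5, nil, 30 }

-- ipairs stops at the first nil, so the third cell is never visited
local visited = 0
for _ in ipairs(row) do
	visited = visited + 1
end
-- visited is 1 here, not 3

-- The length operator (#row) is also unreliable for tables with holes,
-- but indexing by a known column number still reaches the cell
local third = row[3]
```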

For instance you can iterate using the tab.schema.fields list, which always contains every column, and use the array indexes as you got them rather than appending with table.insert:

local p = {}

function p.sum(tab, args)
    -- Append a sum field
    table.insert(tab.schema.fields, {
        ["name"] = "sum",
        ["type"] = "number",
        ["title"] = {
            ["en"] = "Sum"
        }
    })
    local sum_index = #tab.schema.fields

    for _, row in ipairs(tab.data) do
        local total = 0
        -- iterate over schema.fields, not row which
        -- may have "holes" in it
        for j, field in ipairs(tab.schema.fields) do
            -- skip empty cells and non-numeric cells such as labels
            if type(row[j]) == "number" then
                total = total + row[j]
            end
        end

        -- Do not use table.insert here!
        -- It could go in the wrong column due to adjacent nils.
        row[sum_index] = total
    end

    -- A transform must return the tabular data object
    return tab
end

return p

Loading additional data sets

Additional Data: pages may be loaded via mw.ext.data.get(). Note that if you pass the optional language parameter as _ you'll get the full multilingual strings; otherwise they are pared down to just the rendering language.

Try not to load excessive additional data, as it may increase runtime and resource usage.

You may also load additional Lua code or data modules, and they will be recorded for cache invalidation handling.
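As a minimal sketch of loading a second data set inside a transform, the function below appends the rows of another tabular page. The append_rows name and the extra argument are hypothetical, both pages are assumed to share the same schema, and the code only runs inside Scribunto, where the mw library is available.

```lua
local p = {}

-- Hypothetical transform: append the rows of a second tabular page.
-- args.extra is an illustrative argument holding a Data: page title.
function p.append_rows(tab, args)
	-- "_" requests the full multilingual strings
	local extra = mw.ext.data.get(args.extra, "_")
	if not extra then
		error("Could not load data page: " .. tostring(args.extra))
	end
	for _, row in ipairs(extra.data) do
		-- Whole rows are never nil, so table.insert is safe at this level
		table.insert(tab.data, row)
	end
	return tab
end

-- In a real Module: page, finish with: return p
```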

External data sources

Anything you can run from a Lua module invoked from a template, you can run here. However, be aware that this does not currently include interfaces to fetch from Wikidata Query Service or from RESTBase, so older Graph usages that require them are not yet ready to be ported.

There is a limited interface for fetching data from Wikidata, but it is likely of limited use for charts for now.

Future Lua-facing APIs are possible for other types of lookups, such as into RESTBase or via SPARQL queries, and we hope to be able to tackle multiple such features in the future through Community Wishlist projects.

Performance

Currently the transform and the chart render are run fresh on every page parse, and the resulting output is saved into the parser cache. If transforms are found to be unexpectedly expensive, we may need to put more aggressive caching and/or resource limits in place.

Assume that speed and memory are constrained, and aim to conserve them for the best reader and editor experience alike. There are hard limits on memory usage and processing time, which are enforced in the same way as for template Lua code.

Remember: limits for memory and CPU time are the same as for Lua modules used in templates. Using too much memory or running too long will result in the script being canceled and an error message being shown on the page.

Note that input Data: pages are restricted in production to 2 megabytes, but the chart renderer may limit input data size further. Final size limits are to be determined, but if you find yourself hitting the limits, consider decimating a large data set to fewer data points.

Remember: there's a strict limit on chart input data, as data sets must not only render fast on the server, they are sent to the client for interactive rendering.

For example, if a tabular data set has 24k rows and will be rendered to a chart for mobile and desktop computers, you have many more data points than pixels. Decimating to 1 out of 10 data points, either in the input data set or through a transform, should reduce processing time. (See the example under #Decimating input points.)
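Decimating through a transform could look roughly like the sketch below, which keeps every Nth row starting with the first. The decimate function and its step argument are hypothetical names chosen for illustration; this is not the implementation used by any existing module.

```lua
local p = {}

-- Hypothetical transform: keep every Nth row, starting with the first.
-- args.step is an illustrative argument name; it arrives as a string.
function p.decimate(tab, args)
	local step = tonumber(args.step) or 10
	if step < 1 then
		error("step must be at least 1")
	end
	step = math.floor(step)

	local kept = {}
	for i = 1, #tab.data, step do
		kept[#kept + 1] = tab.data[i]
	end
	tab.data = kept
	return tab
end

-- In a real Module: page, finish with: return p
```

With a step of 10, a 24,000-row data set is reduced to 2,400 rows before it reaches the renderer.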

Testing hints

Lua runtime errors should be returned through the pipeline and reported when rendering is attempted. To iterate more quickly you can test your code at the transform or bare Lua level.

Special:ApiSandbox

You can test the actual JSON-transform pipeline with Commons:Special:ApiSandbox pointed at action=jsontransform, which will produce nicely formatted output from actual input pages.

Debug console

You can wrap a test harness around your module to call it from the Lua editor debugger console during preview, in a pinch:

local p = {}

-- Your fancy transform code here
function p.transform(tab, args)
    return tab
end

-- Enter =p.test() in the debug console while you're editing the code to test
-- with a specific sample data set / args (the leading "=" prints the result)
function p.test(func, args, tabname, lang)
    -- pass "_" for lang to get the raw multilingual source data
    local tab = mw.ext.data.get(tabname or "Chart Example Data.tab", lang or "_")
    local args = args or {
       ["key"] = "value"
    }
    tab = p[func or 'transform'](tab, args)
    return mw.dumpObject(tab)
end

return p
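To test at the bare Lua level without loading a Data: page at all, a transform can also be called on a small hand-built table. The sketch below uses a stand-in transform (double_values) and a hypothetical selftest function purely for illustration.

```lua
local p = {}

-- Stand-in transform for illustration: doubles the second column
function p.double_values(tab, args)
	for _, row in ipairs(tab.data) do
		row[2] = row[2] * 2
	end
	return tab
end

-- Hand-built fixture in the tabular data layout; no Data: page needed
local function make_fixture()
	return {
		schema = { fields = {
			{ name = "label", type = "string" },
			{ name = "value", type = "number" },
		} },
		data = { { "a", 1 }, { "b", 2 } },
	}
end

-- Returns true when the transform produces the expected values
function p.selftest()
	local out = p.double_values(make_fixture(), {})
	return out.data[1][2] == 2 and out.data[2][2] == 4
end

-- In a real Module: page, finish with: return p
```

Because the fixture is built fresh on each call, repeated runs do not see data already modified by an earlier run.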

Real-world examples

Chart transform modules can be found at commons:Category:Chart transform modules.

Chart pages that include transforms can be added to commons:Category:Transformed charts.

Climate example

commons:Data:Climate_Paris.tab has a lot of interesting data in it, but unfortunately, if we render it as a chart directly, the chart itself is not useful:

[Chart: "A whole bunch of things that shouldn't be on same Y axis (mm, days, celcius)"; x-axis: Month of the year; series: Record high (°C), Mean daily maximum (°C), Mean daily (°C), Mean daily minimum (°C), Record low (°C), Average precipitation (mm), Average precipitation days (> 1mm), Average snowy days, Average relative humidity (%), Mean monthly sunshine hours, Percentage possible sunshine, Average ultraviolet index]

We can use transforms to focus on certain columns of data and to make conversions such as Celsius to Fahrenheit or mm to inches:

Selecting curves and columns

commons:Module:TabUtils exports a simple filter function, "select". In commons:Data:Climate_Paris/transformed.chart it is called with the argument "cols": "month,averageprecip", which selects the first column (labeled month) of commons:Data:Climate_Paris.tab as the x series and column 5 (averageprecip) as a y series:

    "transform": {
        "module": "TabUtils",
        "function": "select",
        "args": {
            "cols": "month,averageprecip"
        }
    },

Calling {{#chart:Climate_Paris/transformed.chart}} on a wiki page results in:

[Chart: y-axis: Average precipitation (mm); x-axis: Month of the year; series: Average precipitation (mm)]

Other columns can be chosen in the wikicode call by overriding the args of a .chart page that includes a TabUtils transform. Here, the two temperature columns recordhigh and meandaily are chosen as y series. Note that the y-axis title cannot be changed this way, so in this case the y-axis title is wrong:

{{#chart:Climate Paris/transformed.chart
|arg:cols=month,recordhigh,meandaily
}}

This override results in the following chart:

[Chart: y-axis: Average precipitation (mm); x-axis: Month of the year; series: Record high (°C), Mean daily (°C)]

If the .chart page should show all curves by default but still allow curves to be selected in the wikicode, as in the example above, it is sufficient for the .chart JSON to include the following:

    "transform": {
        "module": "TabUtils",
        "function": "select"
    }
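To give an idea of how a column-selecting transform can work, here is a simplified sketch. It is not the TabUtils implementation; the select_cols name is hypothetical, and column names in cols are assumed to be comma-separated without surrounding spaces.

```lua
local p = {}

-- Illustrative sketch only; this is not the TabUtils implementation.
-- Keeps the columns named in args.cols, in the order given.
function p.select_cols(tab, args)
	if not args.cols then
		-- No selection requested: pass all columns through
		return tab
	end
	-- Map column names to their positions in the schema
	local index_of = {}
	for j, field in ipairs(tab.schema.fields) do
		index_of[field.name] = j
	end
	-- Resolve the requested names to positions
	local wanted = {}
	for name in string.gmatch(args.cols, "[^,]+") do
		local j = index_of[name]
		if not j then
			error("Unknown column: " .. name)
		end
		wanted[#wanted + 1] = j
	end
	-- Rebuild the schema, then each row; explicit indexes keep nil cells aligned
	local fields = {}
	for k, j in ipairs(wanted) do
		fields[k] = tab.schema.fields[j]
	end
	tab.schema.fields = fields
	for i, row in ipairs(tab.data) do
		local new_row = {}
		for k, j in ipairs(wanted) do
			new_row[k] = row[j]
		end
		tab.data[i] = new_row
	end
	return tab
end

-- In a real Module: page, finish with: return p
```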

Decimating input points

The select transform in Commons:Module:TabUtils can also decimate the input data set if you have more points than you need to produce a graph (or more than can be fed into the chart renderer successfully!)

Here's a sample that uses a 10x decimation with the select transform to make a too-large data set render:

    "transform": {
        "module": "TabUtils",
        "function": "select",
        "args": {
            "decimate": "10"
        }
    },

[Chart: Markov constant chart; y-axis: y(k); x-axis: log(k); series: y(k)]