Large excels - optimize performance of writing file by excelJS + optimize generated file (MS excel opens it much faster) #1018

Merged: 3 commits, merged on Nov 12, 2019.
lib/xlsx/xform/sheet/data-validations-xform.js: 54 changes (51 additions, 3 deletions)
@@ -28,17 +28,65 @@ function assignBool(definedName, attributes, name, defaultValue) {
}
}

+function mergeDataValidations(model) {
Collaborator

Could you add a comment describing how the model is being optimised here?

Author

@pzawadzki82, Nov 5, 2019

Currently the ExcelJS model stores a dataValidation for each cell, so when many cells are validated, one dataValidation node is written per cell. For example, my project has 5k rows x 20 columns, about 100k cells. MS Excel takes a very long time (minutes) to open a file like that, and past some size it reports that the file is corrupted.
In many cases the validation is the same across a larger range (in our case, the whole column except the header). My optimization replaces multiple dataValidation XML nodes that have the same formula and cover contiguous rows in the same column with a single node whose address is a range instead of one cell. On my project this turns about 100k dataValidation nodes into 20, and MS Excel opens the file quickly and without errors. The change does not modify the ExcelJS internal model; the optimization is applied just before saving.
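The core of the merge described above is reducing, for one validation in one column, a list of row numbers to runs of consecutive rows. A minimal standalone sketch of just that step, assuming row numbers are positive integers and that the column letters are already split off the address; collapseRowsToRanges is a hypothetical helper name, not part of ExcelJS, and it mirrors the loop in mergeDataValidations rather than replacing it.

```javascript
// Hypothetical helper, not part of ExcelJS: turns the row numbers that share one
// validation in one column into A1-style addresses, one per run of consecutive rows.
function collapseRowsToRanges(columnLetters, rows) {
  // Sort a copy numerically (the default sort compares as strings, putting 10 before 9).
  const sorted = [...rows].sort((a, b) => a - b);
  const ranges = [];
  for (const row of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && last.max === row - 1) {
      last.max = row; // row continues the current run
    } else {
      ranges.push({min: row, max: row}); // gap found: start a new run
    }
  }
  return ranges.map(({min, max}) =>
    min === max ? `${columnLetters}${min}` : `${columnLetters}${min}:${columnLetters}${max}`
  );
}

// Rows 2-5 share a validation; row 9 is separated by a gap.
console.log(collapseRowsToRanges('B', [5, 3, 2, 4, 9])); // → [ 'B2:B5', 'B9' ]

// The author's scenario for one column: rows 2..5000 (header row excluded).
const rows = Array.from({length: 4999}, (_, i) => i + 2);
console.log(collapseRowsToRanges('A', rows)); // → [ 'A2:A5000' ]
```

Applied to 20 columns of that shape, this yields one range per column, which matches the 20 nodes reported above. Unlike the PR code, the sketch sorts a copy so the caller's array is not mutated.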

+  const valueAddressesMap = new Map();
+  _.each(model, (value, address) => {
+    const key = JSON.stringify(value);
+    let mapPerColumn = valueAddressesMap.get(key);
+    if (!mapPerColumn) {
+      mapPerColumn = new Map();
+      valueAddressesMap.set(key, mapPerColumn);
+    }
+
+    const columnNo = address.match(/[A-Z]+/)[0];
+    const rowNo = parseInt(address.match(/\d+/)[0], 10);
+
+    let rowArray = mapPerColumn.get(columnNo);
+    if (!rowArray) {
+      rowArray = [];
+      mapPerColumn.set(columnNo, rowArray);
+    }
+    rowArray.push(rowNo);
+  });
+
+  const mergedResults = {};
+  valueAddressesMap.forEach((columnsMap, formulaeValue) => {
+    columnsMap.forEach((rows, columnNo) => {
+      const sortedRowNumbers = rows.sort((a, b) => a - b);
+      const arrayOfRanges = [];
+      sortedRowNumbers.forEach(rowNum => {
+        const previousNumber = arrayOfRanges.length > 0 ? arrayOfRanges[arrayOfRanges.length - 1].max : null;
+        if (previousNumber && previousNumber === rowNum - 1) {
+          arrayOfRanges[arrayOfRanges.length - 1].max = rowNum;
+        } else {
+          arrayOfRanges.push({min: rowNum, max: rowNum});
+        }
+      });
+      arrayOfRanges.forEach(range => {
+        if (range.min === range.max) {
+          mergedResults[`${columnNo}${range.min}`] = JSON.parse(formulaeValue);
+        } else {
+          mergedResults[`${columnNo}${range.min}:${columnNo}${range.max}`] = JSON.parse(formulaeValue);
+        }
+      });
+    });
+  });
+
+  return mergedResults;
+}

 class DataValidationsXform extends BaseXform {
   get tag() {
     return 'dataValidations';
   }
 
   render(xmlStream, model) {
-    const count = model && Object.keys(model).length;
+    const optimizedModel = mergeDataValidations(model);
+    const count = optimizedModel && Object.keys(optimizedModel).length;
     if (count) {
       xmlStream.openNode('dataValidations', {count});
 
-      _.each(model, (value, address) => {
+      _.each(optimizedModel, (value, address) => {
         xmlStream.openNode('dataValidation');
         if (value.type !== 'any') {
           xmlStream.addAttribute('type', value.type);
@@ -75,7 +123,7 @@ class DataValidationsXform extends BaseXform {
         (value.formulae || []).forEach((formula, index) => {
           xmlStream.openNode(`formula${index + 1}`);
           if (value.type === 'date') {
-            xmlStream.writeText(utils.dateToExcel(formula));
+            xmlStream.writeText(utils.dateToExcel(new Date(formula)));
           } else {
             xmlStream.writeText(formula);
           }
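The second hunk changes utils.dateToExcel(formula) to utils.dateToExcel(new Date(formula)) without comment. A plausible reason, stated here as an assumption: mergeDataValidations rebuilds every value with JSON.parse of a JSON.stringify key, and JSON serialization turns a Date into an ISO 8601 string, so date formulae would reach render as strings. The snippet below shows only that plain-JavaScript round trip; the validation object is an illustrative stand-in, not taken from the PR.

```javascript
// Illustrative stand-in for a date data-validation entry (not from the PR).
const validation = {
  type: 'date',
  operator: 'greaterThan',
  formulae: [new Date(Date.UTC(2019, 10, 12))], // months are 0-based: 10 = November
};

// Same round trip as mergeDataValidations: stringify to build the grouping key, parse back.
const roundTripped = JSON.parse(JSON.stringify(validation));

// The Date has become a string, because Date.prototype.toJSON emits ISO 8601 text.
console.log(typeof roundTripped.formulae[0]); // → string
console.log(roundTripped.formulae[0]); // → 2019-11-12T00:00:00.000Z

// Rebuilding it with new Date() restores the same instant before any date conversion.
const restored = new Date(roundTripped.formulae[0]);
console.log(restored.getTime() === validation.formulae[0].getTime()); // → true
```

Because new Date() also accepts an existing Date instance, the wrapper is harmless if a value ever reaches render without the round trip.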
lib/xlsx/xform/style/styles-xform.js: 2 changes (2 additions, 0 deletions)
@@ -82,6 +82,8 @@ class StylesXform extends BaseXform {
     // add default fills
     this._addFill({type: 'pattern', pattern: 'none'});
     this._addFill({type: 'pattern', pattern: 'gray125'});
+
+    this.weakMap = new WeakMap();
   }
 
   render(xmlStream, model) {