[BUG] Streaming WorkbookReader _parseSharedStrings doesn't handle rich text within shared string nodes #1431

@rheidari

🐛 Bug Report

Lib version: 4.1.1

When a shared string in the internal sharedStrings.xml file contains rich text, the streaming reader emits / pushes each text node within that string into the cached sharedStrings array separately. As a result, every shared string index after it is incorrect.
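
To make the index shift concrete, here is a minimal sketch that does not use ExcelJS at all. The XML fragment is hand-written to resemble what a rich text entry looks like in sharedStrings.xml, and `naiveCache` is a hypothetical stand-in for the behaviour described above (one cache entry per `<t>` node), not the library's actual code.

```javascript
// Hand-written fragment resembling xl/sharedStrings.xml (illustrative only).
const sharedStringsXml =
  '<sst>' +
  '<si><r><rPr><b/></rPr><t>This should </t></r>' +
  '<r><rPr><i/></rPr><t>be one shared string value</t></r></si>' +
  '<si><t>this should be the second shared string</t></si>' +
  '</sst>';

// Hypothetical stand-in for the buggy behaviour: one cache entry per <t> node.
function naiveCache(source) {
  return [...source.matchAll(/<t>([^<]*)<\/t>/g)].map((match) => match[1]);
}

const cached = naiveCache(sharedStringsXml);
console.log(cached.length); // one entry per <t> node, not per <si> item
console.log(cached[1]); // a cell referencing shared string 1 now gets the second run
```

A cell that references shared string index 1 expects the second `<si>` item, but in this sketch index 1 holds the second run of the first item.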

Steps To Reproduce

const Excel = require("exceljs");

const workbook = new Excel.stream.xlsx.WorkbookWriter({
    filename: "./test.xlsx",
    useSharedStrings: true,
});

const sheet = workbook.addWorksheet("data");

const rowData = [
    {
        richText: [
            { font: { bold: true }, text: "This should " },
            { font: { italic: true }, text: "be one shared string value" },
        ],
    },
    "this should be the second shared string",
];

sheet.addRow(rowData);

await workbook.commit();

const workbookReader = new Excel.stream.xlsx.WorkbookReader("./test.xlsx", {
    entries: "emit",
    hyperlinks: "cache",
    sharedStrings: "cache",
    styles: "cache",
    worksheets: "emit",
});

for await (const worksheetReader of workbookReader) {
    for await (const row of worksheetReader) {
        // actual: row.values[1] is 'This should ' (first run only)
        expect(row.values[1]).toEqual(rowData[0]);
        // actual: row.values[2] is 'be one shared string value' (second run of the first string)
        expect(row.values[2]).toEqual(rowData[1]);
    }
}

The expected behaviour:

The sharedString value should be the entire rich text object rather than being split into separate pieces.
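
As a rough illustration of the expected shape, the sketch below builds one cache entry per `<si>` item and groups its runs into a `richText` array. It is a simplified, regex-based stand-in: `groupedCache` and the sample XML are hypothetical, font properties are ignored, and a real implementation would use the streaming SAX parser instead.

```javascript
// Hand-written fragment resembling xl/sharedStrings.xml (illustrative only).
const sampleXml =
  '<sst>' +
  '<si><r><rPr><b/></rPr><t>This should </t></r>' +
  '<r><rPr><i/></rPr><t>be one shared string value</t></r></si>' +
  '<si><t>this should be the second shared string</t></si>' +
  '</sst>';

// Extracts the text of the first <t> node in a fragment, or '' if there is none.
function firstText(fragment) {
  const match = fragment.match(/<t[^>]*>([^<]*)<\/t>/);
  return match ? match[1] : '';
}

// Hypothetical helper: one entry per <si>, with <r> runs grouped into richText.
function groupedCache(source) {
  const entries = [];
  for (const si of source.matchAll(/<si>(.*?)<\/si>/gs)) {
    const runs = [...si[1].matchAll(/<r>(.*?)<\/r>/gs)];
    if (runs.length === 0) {
      entries.push(firstText(si[1])); // plain string
    } else {
      entries.push({ richText: runs.map((run) => ({ text: firstText(run[1]) })) });
    }
  }
  return entries;
}

const entries = groupedCache(sampleXml);
console.log(entries.length); // one entry per <si> item
console.log(JSON.stringify(entries[0]));
```

With this grouping, index 1 resolves to the plain second string, which is what the worksheet cells expect.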

Possible solution (optional, but very helpful):

async *_parseSharedStrings(entry) {
  this._emitEntry({type: 'shared-strings'});
  switch (this.options.sharedStrings) {
    case 'cache':
      this.sharedStrings = [];
      break;
    case 'emit':
      break;
    default:
      return;
  }

  let text = null;
  let richText = [];
  let index = 0;
  let font = null;
  for await (const events of parseSax(iterateStream(entry))) {
    for (const {eventType, value} of events) {
      if (eventType === 'opentag') {
        const node = value;
        switch (node.name) {
          case 'b':
            font = font || {};
            font.bold = true;
            break;
          case 'charset':
            font = font || {};
            // The value is carried in the `val` attribute
            font.charset = parseInt(node.attributes.val, 10);
            break;
          case 'color':
            font = font || {};
            font.color = {};
            if (node.attributes.rgb) {
              // Read the same attribute that was tested
              font.color.argb = node.attributes.rgb;
            }
            if (node.attributes.val) {
              font.color.argb = node.attributes.val;
            }
            if (node.attributes.theme) {
              // Theme is a numeric index
              font.color.theme = parseInt(node.attributes.theme, 10);
            }
            break;
          case 'family':
            font = font || {};
            font.family = parseInt(node.attributes.val, 10);
            break;
          case 'i':
            font = font || {};
            font.italic = true;
            break;
          case 'outline':
            font = font || {};
            font.outline = true;
            break;
          case 'rFont':
            font = font || {};
            // The font name is carried in the `val` attribute
            font.name = node.attributes.val;
            break;
          case 'si':
            font = null;
            richText = [];
            text = null;
            break;
          case 'sz':
            font = font || {};
            font.size = parseInt(node.attributes.val, 10);
            break;
          case 'strike':
            font = font || {};
            font.strike = true;
            break;
          case 't':
            text = null;
            break;
          case 'u':
            font = font || {};
            font.underline = true;
            break;
          case 'vertAlign':
            font = font || {};
            font.vertAlign = node.attributes.val;
            break;
        }
      } else if (eventType === 'text') {
        text = text ? text + value : value;
      } else if (eventType === 'closetag') {
        const node = value;
        switch (node.name) {
          case 'r':
            richText.push({
              font,
              text
            });

            font = null;
            text = null;
            break;
          case 'si': {
            // A plain string keeps its text; a string with runs becomes a rich text object
            let data = text;
            if (richText.length) {
              data = { richText };
            }
            if (this.options.sharedStrings === 'cache') {
              this.sharedStrings.push(data);
            } else if (this.options.sharedStrings === 'emit') {
              yield { index: index++, text: data };
            }

            richText = [];
            font = null;
            text = null;
            break;
          }
        }
      }
    }
  }
}
