Jinja upgrade to v3 #1490

Closed
25 commits
630ecbf
rfctr(lint): tune in ruff settings
scanny Apr 29, 2024
5a22c52
rfctr: improve typing for tables
scanny Nov 6, 2023
cf5286c
rfctr: modernize table tests
scanny Apr 27, 2024
d8a3289
rfctr: improve expression
scanny Apr 27, 2024
6c34f12
rfctr: modernize opc.shared.lazyproperty
scanny Apr 29, 2024
4e5dd91
feat(table): add _Row.grid_cols_before
scanny Apr 27, 2024
1cfcee7
feat(table): add _Row.grid_cols_after
scanny Apr 27, 2024
5a1d614
docs: update Table docs
scanny Apr 28, 2024
6d49a69
rfctr(table): reimplement CT_Tc._tr_above
scanny Apr 29, 2024
382d43e
feat(table): add _Cell.grid_span
scanny Apr 28, 2024
7508051
feat(table): add CT_Tc.grid_offset
scanny Apr 28, 2024
512f269
rfctr(table): reimplement CT_Tc.tc_at_grid_offset
scanny Apr 28, 2024
f4a48b5
fix(table): fix _Row.cells can raise IndexError
scanny Apr 28, 2024
89b399b
feat(typing): add py.typed, improve public types
scanny Apr 29, 2024
94802e4
fix: fix some shortlist issues
scanny Apr 29, 2024
5a80006
fix(packaging): small packaging and doc tweaks
scanny Apr 30, 2024
0a09474
rfctr: resolve some import cycles
scanny Apr 30, 2024
e531576
release: prepare v1.1.1 release
scanny Apr 30, 2024
3f56b7d
rfctr(dev): use more performant `fd` for clean
scanny May 1, 2024
e493474
fix: XmlPart._rel_ref_count
scanny May 1, 2024
f246fde
rfctr: improve typing
scanny Apr 30, 2024
0ec5dcd
fix(pkg): pull lxml pin
scanny Apr 30, 2024
4cbbdab
fix: accommodate docxtpl use of Part._rels
scanny Apr 30, 2024
0a8e9c4
fix: Python 3.12 fixes
scanny May 1, 2024
0cf6d71
release: prepare v1.1.2 release
scanny May 1, 2024
15 changes: 15 additions & 0 deletions HISTORY.rst
@@ -3,6 +3,21 @@
Release History
---------------

1.1.2 (2024-05-01)
++++++++++++++++++

- Fix #1383 Revert lxml<=4.9.2 pin that breaks Python 3.12 install
- Fix #1385 Support use of Part._rels by python-docx-template
- Add support and testing for Python 3.12

1.1.1 (2024-04-29)
++++++++++++++++++

- Fix #531, #1146 Index error on table with misaligned borders
- Fix #1335 Tolerate invalid float value in bottom-margin
- Fix #1337 Do not require typing-extensions at runtime


1.1.0 (2023-11-03)
++++++++++++++++++

3 changes: 2 additions & 1 deletion Makefile
@@ -30,7 +30,8 @@ build:
$(BUILD)

clean:
find . -type f -name \*.pyc -exec rm {} \;
# find . -type f -name \*.pyc -exec rm {} \;
fd -e pyc -I -x rm
rm -rf dist *.egg-info .coverage .DS_Store

cleandocs:
1 change: 1 addition & 0 deletions docs/index.rst
@@ -74,6 +74,7 @@ User Guide
user/install
user/quickstart
user/documents
user/tables
user/text
user/sections
user/hdrftr
202 changes: 202 additions & 0 deletions docs/user/tables.rst
@@ -0,0 +1,202 @@
.. _tables:

Working with Tables
===================

Word provides sophisticated capabilities to create tables. As usual, this power comes with
additional conceptual complexity.

This complexity becomes most apparent when *reading* tables, in particular from documents drawn from
the wild where there is limited or no prior knowledge as to what the tables might contain or how
they might be structured.

These are some of the important concepts you'll need to understand.


Concept: Simple (uniform) tables
--------------------------------

::

    +---+---+---+
    | a | b | c |
    +---+---+---+
    | d | e | f |
    +---+---+---+
    | g | h | i |
    +---+---+---+

The basic concept of a table is intuitive enough. You have *rows* and *columns*, and at each (row,
column) position is a different *cell*. It can be described as a *grid* or a *matrix*. Let's call
this concept a *uniform table*. A relational database table and a Pandas dataframe are both examples
of a uniform table.

The following invariants apply to uniform tables:

* Each row has the same number of cells, one for each column.
* Each column has the same number of cells, one for each row.
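
As a minimal sketch of the first invariant, assuming the table has already been read into a plain
list of lists, a uniformity check only needs to compare row lengths. The name ``is_uniform`` is
hypothetical and is not part of `python-docx`:

```python
from typing import Sequence


def is_uniform(rows: Sequence[Sequence[object]]) -> bool:
    """Return True when every row has the same number of cells."""
    # Collect the distinct row lengths; a uniform table has at most one.
    widths = {len(row) for row in rows}
    return len(widths) <= 1


uniform = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
ragged = [["a", "b"], ["c", "d", "e"], ["f", "g", "h"]]

uniform_ok = is_uniform(uniform)
ragged_ok = is_uniform(ragged)
```

In a list-of-lists representation the second invariant follows from the first: when every row has
the same length, every column has one cell per row.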


Complication 1: Merged Cells
----------------------------

::

    +---+---+---+      +---+---+---+
    |   a   | b |      |   | b | c |
    +---+---+---+      + a +---+---+
    | c | d | e |      |   | d | e |
    +---+---+---+      +---+---+---+
    | f | g | h |      | f | g | h |
    +---+---+---+      +---+---+---+

While very suitable for data processing, a uniform table lacks expressive power desirable for
tables intended for a human reader.

Perhaps the most important characteristic a uniform table lacks is *merged cells*. It is very common
to want to group multiple cells into one, for example to form a column-group heading or provide the
same value for a sequence of cells rather than repeat it for each cell. These make a rendered table
more *readable* by reducing the cognitive load on the human reader and make certain relationships
explicit that might easily be missed otherwise.

Unfortunately, accommodating merged cells breaks both the invariants of a uniform table:

* Each row can have a different number of cells.
* Each column can have a different number of cells.

This makes reading table contents programmatically a challenge. One might naturally want to read the
table into a uniform matrix data structure like a 3 x 3 "2D array" (list of lists perhaps), but this
is not directly possible when the table is not known to be uniform.


Concept: The layout grid
------------------------

::

    + - + - + - +
    |   |   |   |
    + - + - + - +
    |   |   |   |
    + - + - + - +
    |   |   |   |
    + - + - + - +

In Word, each table has a *layout grid*.

- The layout grid is *uniform*. There is a layout position for every (layout-row, layout-column)
  pair.
- The layout grid itself is not visible. However, it is represented and referenced by certain
  elements and attributes within the table XML.
- Each table cell is located at a layout-grid position; i.e. the top-left corner of each cell is the
  top-left corner of a layout-grid cell.
- Each table cell occupies one or more whole layout-grid cells. A merged cell will occupy multiple
  layout-grid cells. No table cell can occupy a partial layout-grid cell.
- Another way of saying this is that every vertical boundary (left and right) of a cell aligns with
  a layout-grid vertical boundary, and likewise for horizontal boundaries. But not all layout-grid
  boundaries need be occupied by a cell boundary of the table.
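
One consequence of these rules is that the layout-grid column where each cell in a row starts can be
derived from the number of layout-grid columns each cell spans. The sketch below is pure Python and
does not call `python-docx`; the function name ``grid_offsets`` and its parameters are hypothetical,
chosen only for illustration:

```python
from typing import List, Sequence


def grid_offsets(spans: Sequence[int], cols_before: int = 0) -> List[int]:
    """Return the starting layout-grid column of each cell in one row.

    `spans` holds the number of layout-grid columns each cell occupies, in
    left-to-right order. `cols_before` is the count of unoccupied layout-grid
    columns at the start of the row.
    """
    offsets = []
    position = cols_before
    for span in spans:
        # A cell starts where the previous cell (or the leading gap) ended.
        offsets.append(position)
        position += span
    return offsets


# A row whose first cell is merged across two layout-grid columns.
merged_row_offsets = grid_offsets([2, 1])
# A two-cell row that leaves the first layout-grid column unoccupied.
shifted_row_offsets = grid_offsets([1, 1], cols_before=1)
```

Here ``merged_row_offsets`` is ``[0, 2]`` and ``shifted_row_offsets`` is ``[1, 2]``.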


Complication 2: Omitted Cells
-----------------------------

::

        +---+---+      +---+---+---+
        | a | b |      | a | b | c |
    +---+---+---+      +---+---+---+
    | c | d |              | d |
    +---+---+          +---+---+---+
        | e |          | e | f | g |
        +---+          +---+---+---+

Word is unusual in that it allows cells to be omitted from the beginning or end (but not the middle)
of a row. A typical practical example is a table with both a row of column headings and a column of
row headings, but no top-left cell (position 0, 0), such as this XOR truth table.

::

        +---+---+
        | T | F |
    +---+---+---+
    | T | F | T |
    +---+---+---+
    | F | T | F |
    +---+---+---+

In `python-docx`, omitted cells in a |_Row| object are represented by the ``.grid_cols_before`` and
``.grid_cols_after`` properties. In the example above, for the first row, ``.grid_cols_before``
would equal ``1`` and ``.grid_cols_after`` would equal ``0``.

Note that omitted cells are not just "empty" cells. They represent layout-grid positions that are
unoccupied by a cell and they cannot be represented by a |_Cell| object. This distinction becomes
important when trying to produce a uniform representation (e.g. a 2D array) for an arbitrary Word
table.
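
A quick way to find out whether a table needs this extra care is to check each row for a non-zero
``.grid_cols_before`` or ``.grid_cols_after``. The following is a sketch only: ``has_omitted_cells``
is a hypothetical helper, and the ``SimpleNamespace`` objects are stand-ins for real table and row
objects so the example runs without a ``.docx`` file:

```python
from types import SimpleNamespace
from typing import Any


def has_omitted_cells(table: Any) -> bool:
    """Return True when any row leaves layout-grid columns unoccupied."""
    return any(
        row.grid_cols_before > 0 or row.grid_cols_after > 0 for row in table.rows
    )


# Stand-in for the XOR truth table above: only the first row omits a cell.
xor_table = SimpleNamespace(
    rows=[
        SimpleNamespace(grid_cols_before=1, grid_cols_after=0),
        SimpleNamespace(grid_cols_before=0, grid_cols_after=0),
        SimpleNamespace(grid_cols_before=0, grid_cols_after=0),
    ]
)
# Stand-in for a table in which every row fills the whole layout grid.
full_table = SimpleNamespace(
    rows=[SimpleNamespace(grid_cols_before=0, grid_cols_after=0) for _ in range(3)]
)

xor_has_gaps = has_omitted_cells(xor_table)
full_has_gaps = has_omitted_cells(full_table)
```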


Concept: `python-docx` approximates uniform tables by default
-------------------------------------------------------------

To accurately represent an arbitrary table would require a complex graph data structure. Navigating
this data structure would be at least as complex as navigating the `python-docx` object graph for a
table. When extracting content from a collection of arbitrary Word files, such as for indexing the
document, it is common to choose a simpler data structure and *approximate* the table in that
structure.

Reflecting on how a relational table or dataframe represents tabular information, a straightforward
approximation would simply repeat merged-cell values for each layout-grid cell occupied by the
merged cell::


    +---+---+---+      +---+---+---+
    |   a   | b |  ->  | a | a | b |
    +---+---+---+      +---+---+---+
    |   | d | e |  ->  | c | d | e |
    + c +---+---+      +---+---+---+
    |   | f | g |  ->  | c | f | g |
    +---+---+---+      +---+---+---+

This is what ``_Row.cells`` does by default. Conceptually::

    >>> [tuple(c.text for c in r.cells) for r in table.rows]
    [
        ("a", "a", "b"),
        ("c", "d", "e"),
        ("c", "f", "g"),
    ]
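
When the repeats are not wanted, for example when counting the distinct cells in a row, they can be
skipped. The sketch below assumes that a cell spanning *n* layout-grid columns appears *n* times in
``row.cells`` and that ``_Cell.grid_span`` reports *n*. The helper name ``iter_distinct_cells`` is
hypothetical, and the ``SimpleNamespace`` objects stand in for real rows and cells:

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_distinct_cells(row: Any) -> Iterator[Any]:
    """Yield each cell of `row` once, skipping horizontal-merge repeats."""
    cells = list(row.cells)
    idx = 0
    while idx < len(cells):
        cell = cells[idx]
        yield cell
        # Assumption: a cell spanning n grid columns occupies n consecutive
        # entries in `row.cells`, so advance past all of them.
        idx += max(1, cell.grid_span)


# Stand-in for the first row of the table above: "a" spans two grid columns.
a = SimpleNamespace(text="a", grid_span=2)
b = SimpleNamespace(text="b", grid_span=1)
first_row = SimpleNamespace(cells=[a, a, b])

distinct_texts = [cell.text for cell in iter_distinct_cells(first_row)]
```

This only collapses repeats within a single row. A vertically merged cell still appears once in
each row it spans.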

Note this only produces a uniform "matrix" of cells when there are no omitted cells. Dealing with
omitted cells requires a more sophisticated approach when maintaining column integrity is required::

    #     +---+---+
    #     | a | b |
    # +---+---+---+
    # | c | d |
    # +---+---+
    #     | e |
    #     +---+

    from typing import Iterator

    from docx.table import _Row

    def iter_row_cell_texts(row: _Row) -> Iterator[str]:
        """Yield one text per layout-grid column, using "" for omitted cells."""
        for _ in range(row.grid_cols_before):
            yield ""
        for c in row.cells:
            yield c.text
        for _ in range(row.grid_cols_after):
            yield ""

    >>> [tuple(iter_row_cell_texts(r)) for r in table.rows]
    [
        ("", "a", "b"),
        ("c", "d", ""),
        ("", "e", ""),
    ]
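
Because omitted cells are not the same thing as empty cells, it can be useful to mark them with a
value that cannot be confused with empty text. The variation below is a sketch, not part of
`python-docx`: it uses ``None`` for omitted positions, ``table_to_matrix`` and ``make_row`` are
hypothetical names, and the ``SimpleNamespace`` objects stand in for real table, row and cell
objects:

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def table_to_matrix(table: Any) -> List[List[Optional[str]]]:
    """Return one list per row with `None` at each omitted grid position."""
    matrix: List[List[Optional[str]]] = []
    for row in table.rows:
        texts: List[Optional[str]] = [None] * row.grid_cols_before
        texts.extend(cell.text for cell in row.cells)
        texts.extend([None] * row.grid_cols_after)
        matrix.append(texts)
    return matrix


def make_row(before: int, texts: List[str], after: int) -> SimpleNamespace:
    """Build a stand-in row exposing the three attributes used above."""
    return SimpleNamespace(
        grid_cols_before=before,
        cells=[SimpleNamespace(text=text) for text in texts],
        grid_cols_after=after,
    )


# Stand-in for the three-row table drawn in the comment above.
table = SimpleNamespace(
    rows=[make_row(1, ["a", "b"], 0), make_row(0, ["c", "d"], 1), make_row(1, ["e"], 1)]
)
matrix = table_to_matrix(table)
```

Every row in ``matrix`` has three entries, and a caller can tell an omitted position (``None``)
from a cell that exists but holds no text (``""``).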


Complication 3: Tables are Recursive
------------------------------------

Further complicating table processing is their recursive nature. In Word, as in HTML, a table cell
can itself include one or more tables.

These can be detected using ``_Cell.tables`` or ``_Cell.iter_inner_content()``. The latter preserves
the document order of the table with respect to paragraphs also in the cell.
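
A recursive walk over ``.tables`` is one way to visit nested tables. The sketch below assumes only
that the container (a document or a cell) exposes ``.tables``, that a table exposes ``.rows`` and
that a row exposes ``.cells``. ``iter_tables`` is a hypothetical helper, and the
``SimpleNamespace`` objects stand in for real objects so the example runs without a ``.docx`` file:

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_tables(container: Any) -> Iterator[Any]:
    """Yield every table in `container` depth-first, including nested ones."""
    for table in container.tables:
        yield table
        for row in table.rows:
            for cell in row.cells:
                # A cell is itself a container that may hold further tables.
                yield from iter_tables(cell)


# Stand-ins: one outer table whose first cell contains one inner table.
inner = SimpleNamespace(rows=[])
cell_with_table = SimpleNamespace(tables=[inner])
cell_without_table = SimpleNamespace(tables=[])
outer = SimpleNamespace(
    rows=[SimpleNamespace(cells=[cell_with_table, cell_without_table])]
)
document = SimpleNamespace(tables=[outer])

found = list(iter_tables(document))
```

Because ``_Row.cells`` repeats a merged cell, a table nested inside a merged cell would be yielded
more than once by this sketch. Real code may need to skip those repeats.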
50 changes: 0 additions & 50 deletions features/steps/cell.py

This file was deleted.
