Description
Bug report
Bug summary
Characters from beyond the Basic Multilingual Plane (BMP) in Unicode are not displayed in PDFs with Type 42 fonts. Characters beyond the BMP have a code point greater than 65535 and cannot be encoded in a fixed-size 2-byte encoding, such as the obsolete UCS-2. My understanding is that the CID maps still use such an encoding and cannot handle code points beyond that.
In the case of the `m` in STIX Sans, it's implemented via a virtual font embedded in the base font with chars shifted to higher code points, here code point 120366.
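To make the size limit concrete, here is a small sketch in plain Python (no matplotlib involved). The helper names `is_beyond_bmp` and `utf16_code_units` are made up for illustration. It shows that code point 120366 does not fit into a single 16-bit unit and needs two units in UTF-16, which a fixed-size 2-byte encoding cannot represent.

```python
import struct


def is_beyond_bmp(char: str) -> bool:
    """Return True if the character's code point is above U+FFFF (65535)."""
    return ord(char) > 0xFFFF


def utf16_code_units(char: str) -> list:
    """Return the 16-bit code units UTF-16 needs to encode the character."""
    data = char.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(f">{len(data) // 2}H", data))


math_m = chr(120366)  # the shifted "m" mentioned above
print(is_beyond_bmp("m"), utf16_code_units("m"))
print(is_beyond_bmp(math_m), [hex(u) for u in utf16_code_units(math_m)])
```

A plain `m` yields one code unit, while the shifted `m` yields a surrogate pair, so any table keyed by a single 2-byte value has no slot for it.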
Code for reproduction
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams["pdf.fonttype"] = 42
rcParams["mathtext.fontset"] = "stixsans"
plt.text(0.5, 0.5, "Mass $m$ \U00010308")
plt.savefig("beyond_bmp.pdf")
Actual outcome
Possible solutions
We could take a similar approach to the one we use for Type 3 fonts. There, any char with a code point above 255 is embedded via an XObject and not via the font directly. So here the solution would be to use XObjects for any char with a code point above 65535, in both text and math mode.
I have a local, modified version of matplotlib based on #20615 which implements this approach. It extends the use of `_font_supports_char` and restricts the supported range for Type 42 to <=65535. I can polish this local fix into a PR if people want to go this route.
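As a rough illustration of the idea, and not the code from my local branch or matplotlib's actual implementation, the range check could look like the sketch below. The names `font_supports_char_sketch` and `split_by_support` are hypothetical. The limits 255 and 65535 are the ones mentioned above.

```python
# Highest code point that can be addressed through the font itself,
# per PDF font type (values taken from the description above).
MAX_DIRECT_CODE_POINT = {3: 255, 42: 65535}


def font_supports_char_sketch(fonttype: int, char: str) -> bool:
    """Hypothetical check: can this char be emitted through the font directly?"""
    if fonttype not in MAX_DIRECT_CODE_POINT:
        raise ValueError(f"unsupported font type: {fonttype}")
    return ord(char) <= MAX_DIRECT_CODE_POINT[fonttype]


def split_by_support(fonttype: int, text: str):
    """Split text into chars drawn via the font and chars needing an XObject."""
    direct, via_xobject = [], []
    for char in text:
        if font_supports_char_sketch(fonttype, char):
            direct.append(char)
        else:
            via_xobject.append(char)
    return direct, via_xobject


direct, via_xobject = split_by_support(42, "Mass \U00010308")
print(direct, [hex(ord(c)) for c in via_xobject])
```

With font type 42, only the char beyond the BMP lands in the XObject list. With font type 3, the same function would already divert anything above 255.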
Matplotlib version
- Operating system: Debian 11
- Matplotlib version (`import matplotlib; print(matplotlib.__version__)`): 3.4.2
- Matplotlib backend (`print(matplotlib.get_backend())`): TkAgg (but I assume pdf)
- Python version: 3.9.2
- Matplotlib installed with pip in a virtual env