
Implement TeX's fraction and script alignment#31046

Merged
QuLogic merged 3 commits into matplotlib:text-overhaul from QuLogic:22852/mathtext-vertical-align
Feb 7, 2026

Conversation

Member

@QuLogic QuLogic commented Jan 28, 2026

PR summary

This is a rebase of #22852 by @tfpf. However, since we are planning to refresh the test images already, this reverts the change to move many mathtext images to SVG only (and I believe fixes some duplicated tests due to incorrect conflict resolution). Now, the only test change is to a nested-\frac test that is a duplicate and now is a nested-\dfrac test.

This is based on #30059 plus all the current test image changes, so that you can look at the second-last commit for this change and the last commit to review image changes from only this PR. I believe it does a fairly good job of fixing the fraction bar alignment issue that came up in #30059.

I have reviewed the fraction and sub/super script implementation with reference to the TeX book, but have not yet finished reviewing the font constants.

Fixes #18086
Fixes #18389
Fixes #22172

PR checklist

Member Author

QuLogic commented Jan 28, 2026

WRT the original #22852 (comment):

  • r'$\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+\sqrt{1+x}}}}}}}$' (There are neither fractions nor superscripts or subscripts. And yet, this image failed the image comparison test.)

Also unsure about this one; it appears that one square root sign is sized a bit differently?

  • r"$f'\quad f'''(x)\quad ''/\mathrm{yr}$" (Looks like apostrophes affected by the superscript positioning.)

Seems to be the case, but apostrophes are handled by the subsuper method, so this seems okay.

  • test_operator_space (In this check_figures_equal test, the 's' of 'cos' is displaced slightly compared to the reference image, even though 'co' is placed identically. This is strange, because operator kerning should have remained unchanged.)

This is no longer failing; I think this may be one of the rounding issues for the initial character found (and fixed) in #30059.

QuLogic requested a review from anntzer January 28, 2026 09:46

Member Author

QuLogic commented Jan 28, 2026

TODO: Since the height of fractions is a little bigger, I think I may need to tweak the AutoHeightChar tests to ensure that all sizes are correctly included.

Member Author

QuLogic commented Jan 29, 2026

I've gone through the sub/super script changes and they seem fine as well, other than a couple of known shortcuts.

I'm only uncertain about the constants; they may have only been chosen to minimize image changes instead of matching TeX. For CM, they were also calculated with our current hinting settings and not the defaults that have been switched to in the text-overhaul branch.

Member

@tacaswell tacaswell left a comment

Modulo rebasing and shortening the constants.

tacaswell moved this from Waiting for other PR to Ready for Review in Font and text overhaul Jan 29, 2026
QuLogic force-pushed the 22852/mathtext-vertical-align branch from b8b8063 to b6c10a0 January 29, 2026 23:21
QuLogic force-pushed the 22852/mathtext-vertical-align branch from b6c10a0 to 75c11c0 January 29, 2026 23:32

Member Author

QuLogic commented Jan 31, 2026

While looking at the TeX algorithms, I found out about tftopl which can show some metrics from TFM files. For Computer Modern, I looked at the output for cmsy10:

(FAMILY CMSY)
(FACE O 352)
(CODINGSCHEME TEX MATH SYMBOLS)
(DESIGNSIZE R 10.0)
(COMMENT DESIGNSIZE IS IN POINTS)
(COMMENT OTHER SIZES ARE MULTIPLES OF DESIGNSIZE)
(CHECKSUM O 4110426232)
(FONTDIMEN
   (SLANT R 0.25)
   (SPACE R 0.0)
   (STRETCH R 0.0)
   (SHRINK R 0.0)
   (XHEIGHT R 0.430555)
   (QUAD R 1.000003)
   (EXTRASPACE R 0.0)
   (NUM1 R 0.676508)
   (NUM2 R 0.393732)
   (NUM3 R 0.443731)
   (DENOM1 R 0.685951)
   (DENOM2 R 0.344841)
   (SUP1 R 0.412892)
   (SUP2 R 0.362892)
   (SUP3 R 0.288889)
   (SUB1 R 0.15)
   (SUB2 R 0.247217)
   (SUPDROP R 0.386108)
   (SUBDROP R 0.05)
   (DELIM1 R 2.389999)
   (DELIM2 R 1.01)
   (AXISHEIGHT R 0.25)
   )

Comparing the values, we have:

| metric | current | tfm | tfm/xheight |
|---|---|---|---|
| supdrop | 0.354296875 | 0.386108 | 0.896768125 |
| subdrop | 0.354296875 | 0.05 | 0.116129182 |
| sup1 | 0.79716796875 | 0.412892 | 0.958976205 |
| sub1 | 0.354296875 | 0.15 | 0.348387546 |
| sub2 | 0.5314453125 | 0.247217 | 0.57418216 |
| num1 | 1.5 | 0.676508 | 1.571246415 |
| num2 | 1.5 | 0.393732 | 0.914475503 |
| num3 | 1.5 | 0.443731 | 1.030602362 |
| denom1 | 1.6 | 0.685951 | 1.593178572 |
| denom2 | 1.2 | 0.344841 | 0.80092206 |

Remember that we multiply everything by x-height, so here I've divided that metric out in the last column. Also, we currently use subdrop for both sub/superscripts, so it appears as the current value for both.
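The last column of the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the tftopl output above, and the names `TFM_VALUES` and `per_xheight` are made up for illustration and are not part of Matplotlib.

```python
# Values copied from the tftopl output for cmsy10 above.
XHEIGHT = 0.430555
TFM_VALUES = {
    "supdrop": 0.386108,
    "subdrop": 0.05,
    "sup1": 0.412892,
    "sub1": 0.15,
    "sub2": 0.247217,
    "num1": 0.676508,
    "num2": 0.393732,
    "num3": 0.443731,
    "denom1": 0.685951,
    "denom2": 0.344841,
}


def per_xheight(values, xheight):
    """Express each metric as a multiple of the x-height (hypothetical helper)."""
    return {name: value / xheight for name, value in values.items()}


ratios = per_xheight(TFM_VALUES, XHEIGHT)
for name, ratio in ratios.items():
    print(f"{name:8s} {ratio:.9f}")
```

Dividing by the x-height here mirrors the fact that the layout code multiplies each constant by the x-height again when it is used.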

So, the numbers we have for sub1 and sub2 are fairly close. supdrop / subdrop are quite different from what we have, and sup1 is also a bit bigger, but if we change the code to apply each separately, then it's mostly that superscripts drop by another pixel or so. If we make this change, about 100 test images are affected.

For the fraction metrics (num1, num2, num3, denom1, denom2) there is quite a bit of difference in some of them. If we apply those, it affects about 70 test images, and mostly fractions "close up" a bit, but this is closer to how TeX does it.

Before we had this (with usetex=True on the right):
Figure_1
and with these updated constants we have:
Figure_2
It's still a tiny bit off, but I think that's because we aren't the ones rendering the usetex text; with #30039 it's even closer.

I haven't found any corresponding TFM files with the same metrics in them for the other fonts we have. I believe modern LaTeX can synthesize these directly from the font, and I do see some embedded MATH tables in there, but I haven't worked out the conversions for those yet.

I've pushed the changes to the Computer Modern constants as two commits for ease of review.

Contributor

tfpf commented Jan 31, 2026

> This extra +/- rule is odd compared to TeX, but it is explained in the original PR

Vlists containing Hrules don't render vertical spaces correctly. The code for reproduction given in #23763 still works!

> • test_operator_space (In this check_figures_equal test, the 's' of 'cos' is displaced slightly compared to the reference image, even though 'co' is placed identically. This is strange, because operator kerning should have remained unchanged.)
>
> This is no longer failing; I think this may be one of the rounding issues for the initial character found (and fixed) in

I reported this as a bug back then (#23474). Changing the script rendering logic is what exposed it. After merging a fix (#23482), the test passed even with the updated logic.

> I'm only uncertain about the constants; they may have only been chosen to minimize image changes instead of matching TeX.

🎯💯

> supdrop / subdrop are quite different from what we have, and sup1 is also a bit bigger, but if we change the code to apply each separately, then it's mostly that superscripts drop by another pixel or so. If we make this change, about 100 test images are affected.
>
> For the fraction metrics (num1, num2, num3, denom1, denom2) there is quite a bit of difference in some of them.

I remember being puzzled that using the same constants (from either Computer Modern or Latin Modern—I can't remember which one I had checked) didn't yield neatly aligned denominators. The multiplication by the x-height appears to be the key, which didn't strike me then.

sup1 = 0.79716796875
sub1 = 0.354296875
sub2 = 0.5314453125
supdrop = 0.386108 / 0.430555

Contributor

@anntzer anntzer Jan 31, 2026

You can get more accurate (effectively fixed-point) values from the TFM file, as tftopl prints values after scaling and drops some decimal places; e.g., for cmsy10 I get

slant=262144, space=0, space_stretch=0, space_shrink=0, x_height=451470, quad=1048579, extra_space=0, num1=709370, num2=412858, num3=465286, denom1=719272, denom2=361592, sup1=432949, sup2=380520, sup3=302922, sub1=157286, sub2=259226, supdrop=404864, subdrop=52429, delim1=2506096, delim2=1059062, axis_height=262144

(values need to be divided by 2**20).

I extended the Tfm class to read these values at anntzer@a360989 if you want to try your hand at it (we don't need to decide now whether we actually want to integrate this functionality into the Tfm class).

Note that this also allows reading the actual definition of 1em and 1ex ("quad" and "x_height"), which should be better than the current approach of trying to guess the values (see @tfpf's comment

> I reported this as a bug back then (#23474). Changing the script rendering logic is what exposed it. After merging a fix (#23482), the test passed even with the updated logic.

the linked threads (#23474 (comment)), and https://tex.stackexchange.com/a/98139).
Perhaps also worth fixing this properly?
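As a rough illustration of the conversion described above, the raw integers can be turned back into design-size units by dividing by 2**20, and then into multiples of the x-height. The integers are the ones quoted in this comment (only a subset is listed); `FIX_ONE`, `RAW`, and `fix_to_float` are hypothetical names for this sketch, not the `Tfm` class API.

```python
# Raw fixed-point parameters for cmsy10, as quoted in the comment above
# (subset only).
RAW = {
    "x_height": 451470,
    "quad": 1048579,
    "num1": 709370,
    "num2": 412858,
    "num3": 465286,
    "denom1": 719272,
    "denom2": 361592,
    "sup1": 432949,
    "sub1": 157286,
    "sub2": 259226,
    "supdrop": 404864,
    "subdrop": 52429,
}

# The quoted integers carry 20 fractional bits, so 1.0 is stored as 2**20.
FIX_ONE = 2**20


def fix_to_float(raw):
    """Convert a raw fixed-point integer to a float (hypothetical helper)."""
    return raw / FIX_ONE


params = {name: fix_to_float(raw) for name, raw in RAW.items()}

# Express the layout constants as multiples of the x-height.
x_height = params["x_height"]
constants = {
    name: value / x_height
    for name, value in params.items()
    if name not in ("x_height", "quad")
}

print(f"x_height = {params['x_height']:.6f}, quad = {params['quad']:.6f}")
for name, value in constants.items():
    print(f"{name:8s} {value:.9f}")
```

Because both numerator and denominator carry the same 2**20 factor, the ratio to the x-height could equally be computed directly from the raw integers.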

Member Author

Ah, that's very useful, and might be something to implement along with stuff needed for #31048. For now, I've just put the numbers you have read directly into the file. They're very close and it only affects one image.

For quad / x_height, I can take a look, but we can also make that change independently from this one, I think?

Contributor

Sure (though it'll probably again break all images, so it has to be done in the text-overhaul branch too).

Member Author

BTW, I just noticed that fontTools has a TFM parser; I'm not sure how extensive it is.

Contributor

From a very quick look it looks complete; you can run python -mfontTools.tfmLib foo.tfm and in particular the params above will be printed (again they are scaled by 2**20 but all the decimal points are given so you can just remultiply back).

Contributor

anntzer commented Jan 31, 2026

> Vlists containing Hrules don't render vertical spaces correctly. The code for reproduction given in #23763 still works!

It appears to me that this specific issue has been fixed by #30059, can you confirm? (I haven't looked carefully yet at how this interacts with the fraction rendering.)

Member Author

QuLogic commented Feb 3, 2026

Happy to see you're still around @tfpf and thanks for answering any questions we have.

re: the +/-rule issue. Empirically it looks like the +/-rule can be removed by changing how Rules are being rendered; i.e. the following patch appears to work:

If I make this change, it affects 65 test images. Most of them do not contain fractions, but \overline and square roots. In the case of heavily nested roots (image 53), it actually looks better aligned. There are several tips of the ⎷ that are about a pixel above the corresponding bar. I'd need to check the overlines. Since this almost always shifts up rects by one pixel (after snapping, presumably), I guess this is the origin of the incorrect fraction bar?

Contributor

anntzer commented Feb 3, 2026

> Happy to see you're still around @tfpf and thanks for answering any questions we have.

Yes, I didn't mention this, but: @tfpf: I'm sorry your (nice!) work didn't go in last time, turns out we needed an even bigger overhaul of many parts to fix all the issues; thanks for coming back to help. I'll also take that opportunity to thank @QuLogic for keeping all the moving parts together and taking care of the more-or-less complete patches I keep throwing around 😅

> re: the +/-rule issue. Empirically it looks like the +/-rule can be removed by changing how Rules are being rendered; i.e. the following patch appears to work:
>
> If I make this change, it affects 65 test images. Most of them do not contain fractions, but \overline and square roots. In the case of heavily nested roots (image 53), it actually looks better aligned. There are several tips of the ⎷ that are about a pixel above the corresponding bar. I'd need to check the overlines. Since this almost always shifts up rects by one pixel (after snapping, presumably), I guess this is the origin of the incorrect fraction bar?

I suspect so? Not that I looked.

QuLogic force-pushed the 22852/mathtext-vertical-align branch from b2afc68 to df479e9 February 3, 2026 10:21

Contributor

@anntzer anntzer left a comment

Various small points remain (reorder/rewrite the num/den clearance calculations to follow Knuth more closely, whether to get x-height before or after shrinking, x-height/axis-height detection) but I'll let @QuLogic decide whether he wants to do that in this PR (self-merge is fine I think?) or open separate issues to track them.

Contributor

anntzer commented Feb 3, 2026

By the way, https://www.tug.org/TUGboat/tb27-1/tb86jackowski.pdf has some nice figures explaining fraction layout, and https://www.tug.org/~vieth/papers/bachotex2008/math-font-paper.pdf some more discussion on inferring font parameters (though neither the x-height nor the axis-height...).

Member Author

QuLogic commented Feb 4, 2026

According to The LaTeX Font Catalogue, I don't think DejaVu Sans provides math support in LaTeX, so I'm a bit uncertain how to compare our math constants with it. Opening up the MATH table with fontTools and picking out the values from https://ntg.nl/maps/38/03.pdf it seems that pretty much all of them are 0 and thus useless.

Instead, I managed to find values in FontForge's TeX tables on a slightly hidden settings page; dividing them by the xheight from that same page (1120), we get:

| metric | current | TeX table | TeX / xheight |
|---|---|---|---|
| supdrop | 0.4 | 790.527 | 0.705827679 |
| subdrop | 0.4 | 102.4 | 0.091428571 |
| sup1 | 0.7 | 845.824 | 0.7552 |
| sub1 | 0.3 | 307.199 | 0.274284821 |
| sub2 | 0.5 | 632.832 | 0.565028571 |
| num1 | 1.4 | 1529.86 | 1.365946429 |
| num2 | 1.5 | 868.352 | 0.775314286 |
| num3 | 1.3 | 970.752 | 0.866742857 |
| denom1 | 1.3 | 1548.29 | 1.382401786 |
| denom2 | 1.1 | 768 | 0.685714286 |

supdrop/subdrop are quite different, but that's because they used to be one constant. num2/num3/denom2 all shrunk quite a bit, but the same thing happened with Computer Modern above.

Before, we had:
DejaVu Sans before
and now we get:
DejaVu Sans after

Note that the right side of the figure is usetex=True with \usepackage{arev} for the preamble. Both fonts come from the same Bitstream Vera Sans lineage, but I'm not sure they're comparable if DejaVu Sans never had math support, so take those with a grain of salt, especially since the x doesn't really look like the same style.

DejaVu Serif has the same metrics, except the xheight is 1063 instead, so I won't post the table over again. Instead, just before and after images:
DejaVuSerif before
DejaVuSerif after
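Since the Serif table is not repeated, here is a small sketch of how it could be derived. It assumes, as stated above, that DejaVu Serif shares the TeX-table values with DejaVu Sans and differs only in xheight (1063 instead of 1120); `TEX_TABLE` and `XHEIGHTS` are illustrative names, not Matplotlib identifiers.

```python
# TeX-table values reported above for DejaVu Sans (assumed identical for
# DejaVu Serif, per the comment).
TEX_TABLE = {
    "supdrop": 790.527,
    "subdrop": 102.4,
    "sup1": 845.824,
    "sub1": 307.199,
    "sub2": 632.832,
    "num1": 1529.86,
    "num2": 868.352,
    "num3": 970.752,
    "denom1": 1548.29,
    "denom2": 768,
}

# x-heights read from the same FontForge settings page.
XHEIGHTS = {"DejaVu Sans": 1120, "DejaVu Serif": 1063}

constants = {
    font: {name: value / xheight for name, value in TEX_TABLE.items()}
    for font, xheight in XHEIGHTS.items()
}

print(f"{'metric':8s} {'Sans':>12s} {'Serif':>12s}")
for name in TEX_TABLE:
    sans = constants["DejaVu Sans"][name]
    serif = constants["DejaVu Serif"][name]
    print(f"{name:8s} {sans:12.9f} {serif:12.9f}")
```

With a smaller xheight as the divisor, every Serif constant comes out slightly larger than its Sans counterpart.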

Contributor

anntzer commented Feb 4, 2026

There is a dejavu math font: https://www.gust.org.pl/projects/e-foundry/tex-gyre-dejavu-math / https://ctan.org/pkg/tex-gyre-math-dejavu?lang=en but it's only a serif font :/ not sure how well the constants would compare?

Member Author

QuLogic commented Feb 4, 2026

Well, we do have DejaVu Serif as well with almost the same metrics; only the xheight is different at 1063 instead of 1120 (updated above with those images as well.) But it looks like the TeX Gyre fonts are for unicode-math and would require the other LaTeX engines to plug in to our usetex setup, too.

llohse commented Feb 4, 2026

> Well, we do have DejaVu Serif as well with almost the same metrics; only the xheight is different at 1063 instead of 1120 (updated above with those images as well.) But it looks like the TeX Gyre fonts are for unicode-math and would require the other LaTeX engines to plug in to our usetex setup, too.

#31064 makes it possible to use the TeX Gyre fonts from within usetex. Currently, it does not set the "constants" dynamically and uses the defaults instead, but in principle one could read the OTF MATH table.

Member Author

QuLogic commented Feb 4, 2026

For STIX, I again took the TeX tables and divided by xheight (450):

| metric | current | TeX table | TeX / xheight |
|---|---|---|---|
| supdrop | 0.4 | 386 | 0.857777778 |
| subdrop | 0.4 | 50.0002 | 0.111111556 |
| sup1 | 0.8 | 413 | 0.917777778 |
| sub1 | 0.3 | 150 | 0.333333333 |
| sub2 | 0.6 | 309 | 0.686666667 |
| num1 | 1.6 | 747 | 1.66 |
| num2 | 1.6 | 424 | 0.942222222 |
| num3 | 1.6 | 474 | 1.053333333 |
| denom1 | 1.6 | 756 | 1.68 |
| denom2 | 1.1 | 375 | 0.833333333 |

It's largely different in the same places as before, I think.
Before:
STIX before
After:
STIX after
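To make "different in the same places" a little more concrete, the sketch below flags the metrics whose new constant differs from the current one by more than a threshold, for both Computer Modern and STIX. The numbers are copied from the tables in this thread; the 0.2 threshold and the names `big_changes`, `CM`, and `STIX` are arbitrary choices for this illustration.

```python
CM_XHEIGHT = 0.430555
STIX_XHEIGHT = 450

# metric: (current constant, raw value from the TFM / TeX table)
CM = {
    "supdrop": (0.354296875, 0.386108),
    "subdrop": (0.354296875, 0.05),
    "sup1": (0.79716796875, 0.412892),
    "sub1": (0.354296875, 0.15),
    "sub2": (0.5314453125, 0.247217),
    "num1": (1.5, 0.676508),
    "num2": (1.5, 0.393732),
    "num3": (1.5, 0.443731),
    "denom1": (1.6, 0.685951),
    "denom2": (1.2, 0.344841),
}
STIX = {
    "supdrop": (0.4, 386),
    "subdrop": (0.4, 50.0002),
    "sup1": (0.8, 413),
    "sub1": (0.3, 150),
    "sub2": (0.6, 309),
    "num1": (1.6, 747),
    "num2": (1.6, 424),
    "num3": (1.6, 474),
    "denom1": (1.6, 756),
    "denom2": (1.1, 375),
}

THRESHOLD = 0.2  # arbitrary cut-off, in units of the x-height


def big_changes(table, xheight, threshold=THRESHOLD):
    """Return the metrics whose new value moves by more than `threshold`."""
    return {
        name
        for name, (current, raw) in table.items()
        if abs(raw / xheight - current) > threshold
    }


cm_changed = big_changes(CM, CM_XHEIGHT)
stix_changed = big_changes(STIX, STIX_XHEIGHT)
print(sorted(cm_changed))
print(sorted(stix_changed))
```

With this threshold, both fonts flag the same five metrics: supdrop, subdrop, num2, num3, and denom2.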

And for STIX Sans, it's the same font, so it's the same final metrics, but the metrics were slightly different before:

| metric | current | TeX table | TeX / xheight |
|---|---|---|---|
| supdrop | 0.4 | 386 | 0.857777778 |
| subdrop | 0.4 | 50.0002 | 0.111111556 |
| sup1 | 0.8 | 413 | 0.917777778 |
| sub1 | 0.3 | 150 | 0.333333333 |
| sub2 | 0.5 | 309 | 0.686666667 |
| num1 | 1.5 | 747 | 1.66 |
| num2 | 1.5 | 424 | 0.942222222 |
| num3 | 1.5 | 474 | 1.053333333 |
| denom1 | 1.5 | 756 | 1.68 |
| denom2 | 1.1 | 375 | 0.833333333 |

Before:
STIX Sans before
After:
STIX Sans after

QuLogic force-pushed the 22852/mathtext-vertical-align branch 2 times, most recently from a77b250 to df76ad7 February 6, 2026 08:43

Member Author

QuLogic commented Feb 6, 2026

Squashed into fewer commits and rebased on the latest text-overhaul with just the test images that are relevant here. I'll see about merging tomorrow.

tfpf and others added 3 commits February 6, 2026 17:33
As described in *TeX: the Program* by Don Knuth.

New font constants are set to the nearest integral multiples of 0.1 for
which numerators and denominators containing normal text do not have to
be shifted beyond their default shift amounts at font size 30 in display
and text styles. To better process superscripts and subscripts, the
x-height is now always calculated instead of being retrieved from the
font table (which was the case for Computer Modern); the affected font
constants have been changed.

A duplicate test was also fixed in the process.

At the call site
https://github.com/matplotlib/matplotlib/blob/51fbfc4eb0e882ef7e95ceab9777c7047f4db819/lib/matplotlib/_mathtext.py#L1706-L1709
the box should clearly go from `cur_v + off_v` to `cur_v + off_v -
rule_height` (this is why `cur_v` is shifted by `+ rule_height` just
before; also at that point some print debugging indicates that y's go
*downwards*), so `Rule.render` should indeed call `render_rect_filled`
from `y - h` to `y`.
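The coordinate argument in this commit message can be illustrated with a standalone sketch. It is not Matplotlib's actual `Rule.render` or `render_rect_filled` code: `Rect`, `rule_rect`, and the sample numbers are hypothetical, and the only assumptions are the ones stated in the message (y grows downwards, and `cur_v` has already been advanced by the rule height before the rule is drawn).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with y growing downwards (illustrative only)."""
    x1: float
    y1: float
    x2: float
    y2: float


def rule_rect(x, y, w, h):
    """Rectangle covered by a rule whose reference point is (x, y).

    Because y grows downwards, a rule of height h that ends at y
    spans from y - h (top edge) to y (bottom edge).
    """
    return Rect(x, y - h, x + w, y)


# Hypothetical vertical-list state just before a rule is shipped out.
cur_v = 10.0
off_v = 0.0
rule_height = 2.0
rule_width = 30.0

# The vertical cursor is advanced past the rule first...
cur_v += rule_height
# ...so the rule must fill the space between the old and new cursor.
rect = rule_rect(0.0, cur_v + off_v, rule_width, rule_height)
print(rect)
```

In this sketch the rectangle spans from the old cursor position (10.0) to the new one (12.0), which is the `cur_v + off_v - rule_height` to `cur_v + off_v` range the message describes.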
Computer Modern values are taken from `cmsy10.tfm`, and divided by the
x-height in that output to match the scale used in Matplotlib.

DejaVu Sans/Serif and STIX constants are taken from the embedded TeX
table extracted with FontForge.
QuLogic force-pushed the 22852/mathtext-vertical-align branch 2 times, most recently from ce85a59 to 1cd8510 February 6, 2026 23:54

Member Author

QuLogic commented Feb 7, 2026

OK, going to merge, but the text-overhaul branch is still open if anything major comes up.

QuLogic merged commit afdd53a into matplotlib:text-overhaul Feb 7, 2026
33 of 36 checks passed
github-project-automation bot moved this from Ready for Review to Done in Font and text overhaul Feb 7, 2026
QuLogic deleted the 22852/mathtext-vertical-align branch February 7, 2026 01:33


class STIXSansFontConstants(FontConstantsBase):
    # These values are extracted from the TeX table of STIXGeneral.ttf using FreeType,

Member Author

This should have said FontForge, not FreeType; I'll fix it in a followup.


Projects

Status: Done

6 participants