
Debugging timeouts: make tests verbose in CI #31087

Draft
rcomer wants to merge 14 commits into matplotlib:main from rcomer:verbose-ci

Conversation

rcomer (Member) commented Feb 5, 2026

This PR is not for merging, just using the CI.

Made pytest verbose in the hope of generating clues for diagnosing the subprocess timeout problem #30851. Removed the standard Ubuntu tests because I don't think we ever see timeouts there.
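
For reference, a minimal sketch of one way to force verbose reporting from a root conftest.py; whether this matches the approach actually taken on this branch is an assumption on my part.

# conftest.py -- hypothetical sketch: bump pytest's verbosity so each test id
# is printed as it finishes, which helps bracket where a CI hang occurs.
def pytest_configure(config):
    # Equivalent to passing -v on the command line (keeps any higher value
    # already set there).
    config.option.verbose = max(config.option.verbose, 1)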

rcomer (Member, Author) commented Feb 5, 2026

Timeouts never come along when you want them!

@rcomer rcomer closed this Feb 5, 2026
@rcomer rcomer reopened this Feb 5, 2026
rcomer (Member, Author) commented Feb 5, 2026

Here we have a timeout for test_font_manager.py::test_fontcache_thread_safe. If I am correct that the tests report in the order they complete, then gw1 was running test_streamplot.py::test_integration_options[png] while gw0 was timing out on test_font_manager.py::test_fontcache_thread_safe.
https://dev.azure.com/matplotlib/matplotlib/_build/results?buildId=47001&view=logs&j=0e51c8a9-4119-5244-b946-ee585b1b50f1&t=e90dd38b-859f-5d2f-90bb-85e3d7473c7a&l=16349

That test is relatively new, so it could fit with when we started to see CI timeouts.

rcomer (Member, Author) commented Feb 5, 2026

Very aware that I may be clutching at straws / on a wild goose chase / insert idiom of choice here.

story645 (Member) commented Feb 5, 2026

Is there a way we could trigger verbose output via commit message? Basically I know this is a draft, but would it be useful to have a permanent flavor of this PR?

rcomer (Member, Author) commented Feb 5, 2026

At this point I don't even know if any of this is useful! 🫣

timhoffm (Member) commented Feb 5, 2026

Is there a way we could trigger verbose output via commit message?

What's the purpose? Either we want to generally collect information, in which case we just switch on verbose on main. Or we only want to do targeted experiments, in which case a branch in a PR is sufficient.

story645 (Member) commented Feb 5, 2026

What's the purpose?

For the reason I think Ruth opened this PR? Something is going wonky, it seems CI-specific, and it'd be helpful to have the verbose output here specifically for debugging this one thing.

timhoffm (Member) commented Feb 6, 2026

Yes, but what do you need the "trigger from commit message" for? We can configure this once and commit it. Either here if we want to limit changes to a specific experimental environment, or on main if we want to generally collect data.

story645 (Member) commented Feb 6, 2026

Either here if we want to limit changes to a specific experimental environment

I don't know what you mean by this

on main if we want to generally collect data.

I'm guessing we don't want this all the time, only when a PR is breaking in ways where the short messages are unhelpful.

timhoffm changed the title from "CI: make tests verbose" to "Debugging timeouts: make tests verbose in CI" on Feb 6, 2026
timhoffm (Member) commented Feb 6, 2026

Ah, I think I understand our mutual misunderstanding: I'm focussed on the flaky timeout issue (where toggling verbosity does not help). You are discussing whether toggling verbosity would be a generally desirable debugging tool.

Let's move that discussion out and focus here on the timeouts. You are welcome to open an issue for the general solution if you are interested.

timhoffm (Member) commented Feb 6, 2026

@rcomer I believe your ideas and approach are valuable. Obviously, removing that single test didn't cut it.

I noticed here, and in the run before, that before the timeout many (>10) tests from the other worker completed. So it's likely not a single long-running blocking task in the other worker. Ideas for further investigation:

  • Run in a single worker to check whether it's fundamentally tied to concurrency (see the sketch after this list).
  • Run the original configuration multiple times, collect logs of the failing runs, and look for patterns around the failing tests (such as many test completions in the other worker, recurring tests in one worker or the other, etc.).
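
A hedged sketch of the single-worker idea, assuming pytest-xdist is installed and the tests live under lib/matplotlib/tests as in this repo; the exact flags used on this branch may differ.

import pytest

# Hypothetical single-worker run: "-n 0" disables xdist distribution entirely,
# so a hang cannot come from worker interaction, while "-v" keeps the per-test
# reporting this PR relies on.
if __name__ == "__main__":
    raise SystemExit(pytest.main(["-n", "0", "-v", "lib/matplotlib/tests"]))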

rcomer (Member, Author) commented Feb 6, 2026

Running in one worker: the first attempt gave no timeouts but there is pretty big variation in how long tests take.

On Azure py312 and py313, test_backend_inline.py::test_ipynb was the slowest at around half a minute, whereas on py311 it only took 7 seconds.

On macOS 14, with both Python versions, test_backends_interactive.py::test_other_signal_before_sigint[show-kwargs0-MPLBACKEND=macosx-BACKEND_DEPS=matplotlib.backends._macosx] took slightly more than two minutes:
https://github.com/matplotlib/matplotlib/actions/runs/21749380505/job/62743194913?pr=31087#step:13:10338
https://github.com/matplotlib/matplotlib/actions/runs/21749380505/job/62743194872?pr=31087#step:13:10332

So maybe the concurrency is a red herring.

Trying again to see what turns up....

@rcomer rcomer closed this Feb 6, 2026
@rcomer rcomer reopened this Feb 6, 2026
rcomer (Member, Author) commented Feb 6, 2026

This time macOS 15, instead of macOS 14, has the long-running test_other_signal_before_sigint[show-kwargs0-MPLBACKEND=macosx-BACKEND_DEPS=matplotlib.backends._macosx]:
https://github.com/matplotlib/matplotlib/actions/runs/21750941199/job/62748452349?pr=31087#step:13:10336
https://github.com/matplotlib/matplotlib/actions/runs/21750941199/job/62748452338?pr=31087#step:13:10334

Edit: just realised this test is xfailed anyway, so maybe not one to focus on:

if sys.platform == "darwin" and target == "show":
    # We've not previously had these toolkits installed on CI, and so were never
    # aware that this was crashing. However, we've had little luck reproducing it
    # locally, so mark it xfail for now. For more information, see
    # https://github.com/matplotlib/matplotlib/issues/27984
    request.node.add_marker(
        pytest.mark.xfail(reason="Qt backend is buggy on macOS"))

@rcomer rcomer closed this Feb 6, 2026
@rcomer rcomer reopened this Feb 6, 2026
rcomer (Member, Author) commented Feb 6, 2026

macOS 14 and 15 runners have less capacity (CPU and memory) than any of the other runners:
https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories

In GitHub Actions we set the number of pytest workers to "auto". In this recent PR, pytest chose 3 workers for macOS and only 2 for Ubuntu:
https://github.com/matplotlib/matplotlib/actions/runs/21737535668?pr=31085

That seems the wrong way around! Maybe we should just fix it at 2?
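
If we do pin it, one hedged sketch, assuming a recent pytest-xdist: it provides a hook that overrides what "-n auto" resolves to, so the existing CI command lines would not need to change.

# conftest.py -- hypothetical: pin "-n auto" to 2 workers regardless of how
# many CPUs the runner advertises (the macOS images report more than they can
# effectively use here).
def pytest_xdist_auto_num_workers(config):
    return 2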

rcomer (Member, Author) commented Feb 6, 2026

On Ubuntu-arm at the top of this PR, pytest chose 4 workers. There was 1 timeout and 1 unraisable exception.
https://github.com/matplotlib/matplotlib/actions/runs/21710571566/job/62612794806#step:13:31

rcomer (Member, Author) commented Feb 6, 2026

Well that gave me a failure, but it wasn't a timeout. It does have something to do with subprocesses though.
https://github.com/matplotlib/matplotlib/actions/runs/21762914110/job/62790968375?pr=31087#step:13:20536

The successful runs completed in 16-19 minutes, which I think is pretty consistent with what we get in general. So I don't think fixing at two workers will lose us anything. Re-spinning...

rcomer (Member, Author) commented Feb 6, 2026

Gah! WebAgg timeout.

QuLogic (Member) commented Feb 6, 2026

Possibly related, but when I run locally in parallel, lib/matplotlib/tests/test_widgets.py::test_span_selector_animated_artists_callback (which uses Qt) often pauses until I mouse over it. I don't know why it seems to need some interaction to proceed, but if I run it by itself, it's fine and exits almost immediately.

There are also some knock-on effects of running this test with Qt, i.e., #31049; perhaps something is crashing, but not correctly raising due to it?
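
A minimal local-reproduction sketch, assuming the path from the comment above and that pytest-xdist is installed; "-p no:xdist" keeps the run serial, which is the case reported to exit almost immediately.

import pytest

# Hypothetical check: run just the widget test serially; per the observation
# above it finishes quickly on its own but may pause when the suite runs in
# parallel (-n auto).
pytest.main([
    "-v", "-p", "no:xdist",
    "lib/matplotlib/tests/test_widgets.py::test_span_selector_animated_artists_callback",
])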

rcomer (Member, Author) commented Feb 8, 2026

Regardless of the time issue, perhaps increasing the density of subprocess-calling tests might turn up more clues.
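
One hypothetical way to bias a run toward subprocess-heavy tests is keyword selection; whether this matches the filter actually used on this branch is an assumption.

import pytest

# Hypothetical filter: "-k subprocess" selects only tests whose id mentions
# "subprocess"; everything else is deselected. The real selection here may be
# broader (e.g. whole modules known to spawn subprocesses).
pytest.main(["-v", "-n", "2", "-k", "subprocess", "lib/matplotlib/tests"])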

rcomer (Member, Author) commented Feb 8, 2026

Huh. Azure only runs 28 tests this way with 165 skips.

rcomer (Member, Author) commented Feb 8, 2026

In a sample of 5 tries, I don't get any timeouts on Azure when limiting to the subprocess tests.

@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
rcomer (Member, Author) commented Feb 9, 2026

With the current HEAD, Azure py312 timed out twice in test_blitting_events[MPLBACKEND=tkagg-BACKEND_DEPS=tkinter]. No other failures on Azure.

Let's see if it comes up a third time.

@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026
@rcomer rcomer closed this Feb 9, 2026
@rcomer rcomer reopened this Feb 9, 2026