Description
Describe the bug
Hey, I experienced a significant performance drop in the connected components (cc) computation after updating from 0.9.4 to the latest version. The first cc iteration is only a little slower, but for consecutive iterations the time can increase from several minutes to around half an hour. In the Spark UI, CPU usage on each executor is almost 0 while it is very high on the driver. Within a single stage, all executors finish their tasks in seconds, yet the total time can be half an hour. This might be caused by the algorithm updates, or by the change from writing to Parquet to using checkpoints.
To Reproduce
Steps to reproduce the behavior:
- ...
- ...
- ...
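A minimal timing sketch that might help compare versions, not the reporter's actual workload. The helper names (`time_runs`, `build_chain_graph`, `time_connected_components`), the chain-shaped graph, and the graph size are all hypothetical; only `GraphFrame`, `connectedComponents()`, and `setCheckpointDir()` are existing APIs. It assumes that repeated calls on the same graph are enough to show the slowdown, which may not hold for every dataset.

```python
import time


def time_runs(fn, runs=3):
    """Call fn() `runs` times; return the wall-clock seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def build_chain_graph(spark, n=1000):
    """Build a hypothetical chain graph 0-1-2-...-(n-1) as a GraphFrame."""
    # Imported lazily so the timing helper works without GraphFrames installed.
    from graphframes import GraphFrame

    vertices = spark.createDataFrame([(i,) for i in range(n)], ["id"])
    edges = spark.createDataFrame(
        [(i, i + 1) for i in range(n - 1)], ["src", "dst"]
    )
    return GraphFrame(vertices, edges)


def time_connected_components(spark, checkpoint_dir, runs=3):
    """Time repeated connectedComponents() calls on the same graph."""
    # Assumption: the default algorithm needs a checkpoint directory.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    graph = build_chain_graph(spark)
    # count() forces the lazily evaluated result to be computed.
    return time_runs(lambda: graph.connectedComponents().count(), runs)
```

Calling `time_connected_components(spark, "/tmp/cc-checkpoints")` with an existing `SparkSession` under 0.9.4 and under the latest version, then comparing the returned lists, would show whether later runs are the ones that degrade. The checkpoint path here is a placeholder.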
Expected behavior
System [please complete the following information]:
- OS: e.g. [Ubuntu 18.04]
- Python Version (if applied): [e.g. Python 3.8]
- Spark / PySpark version: Spark 3.5.4
- GraphFrames version: [e.g. graphframes-0.9.0]
Component
- Scala Core Internal
- Scala API
- Spark Connect Plugin
- PySpark Classic
- PySpark Connect
Additional context
Are you planning on creating a PR?
- I'm willing to make a pull-request