feat: ConnectedComponents to run without prepare #758

@sonalgoyal

Description

Is your feature request related to a problem? Please describe.

In many cases, the edges passed to ConnectedComponents may already be sorted and come with long identifiers. For example, in Zingg, our edges A-B always have B greater than A. In such cases, the prepare step of ConnectedComponents is unnecessary computation. I am not sure whether the absence of attributes also needs to be looked at, since there will be many cases where the edges have no attributes.

Describe the solution you would like

If the API could be enhanced to take parameters that indicate whether the edges are already deduplicated, have no attributes, and satisfy src < dst, the computation in prepare could be skipped.
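To illustrate the request, here is a minimal plain-Python sketch, not GraphFrames code. It assumes that prepare roughly drops edge attributes, removes self-loops, orients each edge so that src < dst, and deduplicates. The names `prepare`, `connected_components`, and `edges_are_canonical` are hypothetical and do not exist in the current API. Mapping arbitrary vertex ids to longs is left out for brevity.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Edge = Tuple[int, int]


def prepare(raw_edges: Iterable[Sequence]) -> List[Edge]:
    """Hypothetical stand-in for the normalization this issue wants to skip."""
    normalized = set()
    for edge in raw_edges:
        src, dst = edge[0], edge[1]  # anything after the first two fields is an attribute
        if src != dst:  # drop self-loops
            normalized.add((min(src, dst), max(src, dst)))  # orient so src < dst
    return sorted(normalized)  # the set has already removed duplicates


def connected_components(
    edges: Iterable[Sequence], edges_are_canonical: bool = False
) -> Dict[int, int]:
    """Map each vertex to the smallest vertex id in its component.

    `edges_are_canonical` is an assumed flag: the caller promises that the
    edges are deduplicated, attribute-free, and satisfy src < dst.
    """
    if not edges_are_canonical:
        edges = prepare(edges)

    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Keep the smaller root so the component id is the minimum vertex id.
            parent[max(root_src, root_dst)] = min(root_src, root_dst)

    return {vertex: find(vertex) for vertex in list(parent)}
```

A caller like Zingg, whose edges already satisfy these conditions, would pass `edges_are_canonical=True` and get the same components without paying for the normalization pass:

```python
components = connected_components([(1, 2), (2, 3), (5, 6)], edges_are_canonical=True)
```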

Component

  • [x] Scala Core Internal
  • [ ] Scala API
  • [ ] Spark Connect Plugin
  • [ ] Infrastructure
  • [ ] PySpark Classic
  • [ ] PySpark Connect
