fix: fix pandas.cut errors with empty bins#1499
Conversation
bigframes/core/reshape/tile.py
Outdated
| window_spec=window_specs.unbound(), | ||
| ) | ||
| op = agg_ops.CutOp(bins, right=right, labels=labels) | ||
| if isinstance(bins, typing.Iterable) and len(as_index) == 0: |
There was a problem hiding this comment.
Could we directly return the [pd.NA] * len(x) result from the elif branch len(list(bins)) == 0?
In that case we don't need the additional if-else branch at the bottom of the function, and the logic looks more straightforward. Plus, we don't need to alter the behavior of CutOp.output_type() in aggregations.py too
There was a problem hiding this comment.
All of the elif branch can turn to the [pd.NA] * len(x) result, such as
- 1st elif for
pd.IntervalIndex.from_tuples([]). - 2nd elif for
[] - 4st elif for
[1]
Because of that,CutOp.output_type()might return different result for each case above.
There was a problem hiding this comment.
My main concern is that the "if" check on line 88 has quite some overlap with line 49 + line 56. It seems we are doing the same check repeatedly.
Anyway, this shouldn't affect the overall functionality.
There was a problem hiding this comment.
Just refactor these if branches for more readable. With this refactor, we also catches another edge cases bug. Thanks!
bigframes/core/reshape/tile.py
Outdated
| window_spec=window_specs.unbound(), | ||
| ) | ||
| op = agg_ops.CutOp(bins, right=right, labels=labels) | ||
| if isinstance(bins, typing.Iterable) and len(as_index) == 0: |
There was a problem hiding this comment.
My main concern is that the "if" check on line 88 has quite some overlap with line 49 + line 56. It seems we are doing the same check repeatedly.
Anyway, this shouldn't affect the overall functionality.
dc098b3 to
d67531a
Compare
Fixes internal issue 403638910 🦕