feat: add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series#662
feat: add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series#662
DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series#662Conversation
bigframes/core/blocks.py
Outdated
| """Executes deferred operations and downloads the results.""" | ||
| if len(self.column_ids) == 0: | ||
| raise bigframes.exceptions.NullIndexError( | ||
| "Cannot perform this operation without an index. Set an index using set_index." |
There was a problem hiding this comment.
Let's be more specfic about the operation. As a user I would appreciate knowing why can't do this. Seems a bit obvious that we can't get a pandas index because there isn't an index, but let's spell it out.
There was a problem hiding this comment.
rewrote message
bigframes/session/__init__.py
Outdated
| # Create Default Sequential Index if still have no index | ||
| # ---------------------------------------------------- | ||
|
|
||
| if not index_col and len(index_cols) == 0: |
There was a problem hiding this comment.
This reads odd to me: len(index_cols) == 0 implies not index_col. Could you rephrase what you're trying to check here? I assume you're trying to exclude the DefaultIndexKind enum from this check. Let's be explicit about that.
There was a problem hiding this comment.
yeah, basically, trying to fall back to sequential index if don't have null index, user provided index columns, or metadata-derived index columns. Rewrote the condition though with the code structure, its still a bit weird.
There was a problem hiding this comment.
What are your overall thoughts on testing? How can we be confident that empty/null index works? As we add operations that should support null/empty index, we add tests here and in the usual location?
There was a problem hiding this comment.
Hmm, I think it will be tricky, as null index tests will require manually configured expectations for many tests, as we cannot always compare against pandas (which doesn't have null index). So yeah, I think we will be stuck with a parallel test suite, which will be a bit burdensome to maintain. Being pandas-equivalent has been a huge boon for testing thus far.
bigframes/core/blocks.py
Outdated
| and (sort is False) | ||
| and (block_identity_join is False) |
There was a problem hiding this comment.
Since these are only bool per type checking, we don't have to worry about any false-y values to mess up "not"
| and (sort is False) | |
| and (block_identity_join is False) | |
| and not sort | |
| and not block_identity_join |
There was a problem hiding this comment.
Question: Doesn't block_identity_join imply indexless join is allowed? What's this checking for?
Edit: found it
Apparently block_identity_join is the opposite of what I thought it is. Please add a docstring to this function explaining what these args mean.
There was a problem hiding this comment.
added docstring
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
Co-authored-by: Tim Sweña (Swast) <swast@google.com>
|
Test failure |
DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕