feat: Use session temp tables for all ephemeral storage #1569
TrevorBergeron merged 10 commits into main from
Conversation
assert result_table.clustering_fields == cluster_cols

session_resource_manager.close()
with pytest.raises(google.api_core.exceptions.NotFound):
FWIW: if you sync to main, I made a similar fix in https://github.com/googleapis/python-bigquery-dataframes/pull/1572/files. I think we can keep the pytest.raises and just do the loop.
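For reference, a minimal sketch of that loop (the helper name and its arguments are hypothetical, not from the PR):

```python
import google.api_core.exceptions
import pytest


def assert_session_tables_deleted(bqclient, table_ids):
    # Hypothetical helper: once the resource manager is closed, fetching
    # any session-owned table should raise NotFound.
    for table_id in table_ids:
        with pytest.raises(google.api_core.exceptions.NotFound):
            bqclient.get_table(table_id)
```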
ok, just merged in and used your version
tswast left a comment
Thanks! Just a few nits.
bigframes/session/__init__.py
Outdated
anon_dataset_manager = getattr(self, "_anon_dataset_manager", None)
if anon_dataset_manager:
    self._anon_dataset_manager.close()

if getattr(self, "_session_resource_manager", None):
    if self._session_resource_manager is not None:
        self._session_resource_manager.close()
Nit: We can make this a little more internally consistent. If you'd like to use getattr in the if statement, let's use the := (walrus operator) so that we aren't fetching from self more than once.
Suggested change:
-    anon_dataset_manager = getattr(self, "_anon_dataset_manager", None)
-    if anon_dataset_manager:
-        self._anon_dataset_manager.close()
-    if getattr(self, "_session_resource_manager", None):
-        if self._session_resource_manager is not None:
-            self._session_resource_manager.close()
+    if anon_dataset_manager := getattr(self, "_anon_dataset_manager", None):
+        anon_dataset_manager.close()
+    if session_resource_manager := getattr(self, "_session_resource_manager", None):
+        session_resource_manager.close()
Committed my own suggestion. 🤞 hopefully I didn't break anything.
    engine=engine,
    write_engine=write_engine,
)
table = self._temp_storage_manager.allocate_temp_table()
Does this mean the table isn't created right away? If so, I think we might need to supply a session ID in the load job, if available.
Edit: I see this was moved to read_bigquery_load_job, which makes sense to me. Aside: with more hybrid engine stuff in the future, I can imagine some cases where to_gbq() would be doing a load job to a user-managed table, but I suppose that would probably use a very different job config, anyway.
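If a load ever does need to run inside the session, attaching the session ID to the job config could look roughly like this (a sketch; whether the loader actually has the session ID at this point is an assumption):

```python
from typing import Optional

from google.cloud import bigquery


def load_job_config_for_session(session_id: Optional[str]) -> bigquery.LoadJobConfig:
    # Sketch only: route a load job into an existing BigQuery session so it
    # can read and write that session's temp tables.
    job_config = bigquery.LoadJobConfig()
    if session_id is not None:  # assumption: session ID is known to the caller
        job_config.connection_properties = [
            bigquery.ConnectionProperty("session_id", session_id)
        ]
    return job_config
```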
job_config.labels = {"bigframes-api": api_name}
job_config.schema_update_options = [
    google.cloud.bigquery.job.SchemaUpdateOption.ALLOW_FIELD_ADDITION
]
I'm curious: why ALLOW_FIELD_ADDITION here, but using WRITE_TRUNCATE for read_gbq_load_job? Might be worthwhile to add some comments.
I just want to make sure the ordering_col does not get overridden, as that is what is clustered.
Might still work with TRUNCATE as well?
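For contrast, here is roughly what the two configurations under discussion look like side by side (a simplified sketch, not the PR's exact code):

```python
from google.cloud import bigquery

# Append-style load: new columns may be added, but existing columns (such
# as the clustered ordering column) are left alone.
append_config = bigquery.LoadJobConfig()
append_config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION
]

# Truncate-style load: the table's contents and schema are replaced outright.
truncate_config = bigquery.LoadJobConfig()
truncate_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
```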
self._start_generic_job(load_job)
table_id = f"{table.project}.{table.dataset_id}.{table.table_id}"

# Update the table expiration so we aren't limited to the default 24
# hours in the anonymous dataset.
I assume _storage_manager.create_temp_table handles this in the anonymous dataset case now?
Edit: Yes, I see we do set an expiration above:
expiration = (
    datetime.datetime.now(datetime.timezone.utc) + constants.DEFAULT_EXPIRATION
)
table = bf_io_bigquery.create_temp_table(
    self.bqclient,
    self.allocate_temp_table(),
    expiration,
    schema=schema,
    cluster_columns=list(cluster_cols),
    kms_key=self._kms_key,
)
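For reference, bumping an existing table's expiration through the public client API looks roughly like this (a sketch with a hypothetical table name, not the PR's code path):

```python
import datetime

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_temp_table")  # hypothetical
# Push the expiration out past the anonymous dataset's 24-hour default.
table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(
    days=7
)
client.update_table(table, ["expires"])
```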