fix: retry 404 errors in Client.query(...)#2135
Conversation
| if job_retry is None: | ||
| future = do_query() | ||
| else: | ||
| future = job_retry(do_query)() |
There was a problem hiding this comment.
Note: even though I add the retry here, we'll still need the one in QueryJob, because BigQuery queries can be started successfully but still fail for a retriablle reason. By adding the retry here, we can retry the queries that fail for a retriable reason.
google/cloud/bigquery/retry.py
Outdated
| # Per https://github.com/googleapis/python-bigquery/issues/2134, sometimes | ||
| # we get a 404 error. In this case, if we get this far, assume that the job | ||
| # doesn't actually exist and try again. | ||
| or isinstance(exc, exceptions.NotFound) |
There was a problem hiding this comment.
Thought: Does this make it so that the queries retry if the table is not found? Or is this only going to happen for query job not found.
There was a problem hiding this comment.
😢 It does hang / retry too much. Tested with
import google.cloud.bigquery
client = google.cloud.bigquery.Client()
job = client.query("SELECT * FROM `swast-scratch.my_dataset.doesntexistnoreally`")
job.result()Need to play a similar trick as with get_job_retry, instead.
There was a problem hiding this comment.
Fixed in latest commit.
|
I confirmed with my colleague that after at |
| get_job_retry = retry.with_predicate( | ||
| lambda exc: isinstance(exc, core_exceptions.NotFound) | ||
| # Reference the original retry to avoid recursion. | ||
| or retry._predicate(exc) |
There was a problem hiding this comment.
@parthea Do you have any thoughts on how stable this will be? I couldn't find any public way to pull out the previous predicate from a Retry object when looking at the https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_base.py code.
There was a problem hiding this comment.
An alternative to consider: do something like I did with retry versus job_retry and introduce more parameters for job_insert_retry and job_get_retry or something like that.
There was a problem hiding this comment.
Filed https://github.com/googleapis/python-api-core/issues/796, in the meantime I'll pursue an alternative that doesn't require access to private attributes.
| return True | ||
|
|
||
| # Reference the original job_retry to avoid recursion. | ||
| return job_retry._predicate(exc) |
There was a problem hiding this comment.
@parthea likewise here. I ended up using a slightly different pattern here due to the double nesting of retry objects.
|
system-3.12 failure: Edit: Actually, it does appear to be related. It's a different kind of 404 -- trying to query a table in a different location. I'll see if I can disambiguate that from the job 404. Edit2: fixed in 8e20cb0. I use the message to disambiguate job not found from dataset/table not found based on the stack trace in #2134. System test passing locally (and much more quickly). |
|
Kokoro (https://btx.cloud.google.com/invocations/91aed90f-987e-4200-be14-09717b48923f/targets/cloud-devrel%2Fclient-libraries%2Fpython%2Fgoogleapis%2Fpython-bigquery%2Fpresubmit%2Fpresubmit/log) and system-3.12 and snippets tests are passing. Ready for review! |
|
Cover is failing: I'll see if I can update a unit test for this soon. |
| with pytest.raises(DataLoss, match="we lost your job, sorry"): | ||
| client.query("SELECT 1;", job_id=None) | ||
|
|
||
| def test_query_job_rpc_fail_w_conflict_random_id_job_fetch_fails_no_retries(self): |
There was a problem hiding this comment.
Could you help me understand what this test is for? It feels like we are just verifying that QueryJob._begin() and client.get_job() are called.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #2134 🦕