feat: add dry_run parameter to read_gbq(), read_gbq_table() and read_gbq_query()#1674
That's a bit misleading. Some code paths do fall back to a query (e.g., if max_results is set). Those should support a dry run because they run a query immediately. For a deferred operation, though, I don't think a dry run makes sense. Instead, let's populate what we can from the table metadata and include some indicator that no query is actually run.
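A minimal sketch of the metadata-only approach proposed above, in plain Python. TableMetadata, metadata_dry_run_stats, and every report key other than tableColumnCount and tableColumnTypes are hypothetical names chosen for illustration; this is not the bigframes implementation, and it keeps raw BigQuery type kinds instead of converting them to bigframes dtypes. The isQuery flag plays the role of the suggested indicator that no query is run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableMetadata:
    """Hypothetical stand-in for BigQuery table metadata (not a real client class)."""

    table_id: str
    num_rows: int
    num_bytes: int
    # (column name, BigQuery type kind) pairs, e.g. ("age", "INTEGER").
    schema: List[Tuple[str, str]] = field(default_factory=list)


def metadata_dry_run_stats(table: TableMetadata) -> Dict[str, object]:
    """Report what is known from metadata alone; no query is issued."""
    return {
        # Hypothetical indicator that this read is deferred and runs no query.
        "isQuery": False,
        "tableId": table.table_id,
        "tableRowCount": table.num_rows,
        "tableSizeBytes": table.num_bytes,
        "tableColumnCount": len(table.schema),
        "tableColumnTypes": dict(table.schema),
    }


stats = metadata_dry_run_stats(
    TableMetadata(
        "my-project.my_dataset.my_table", 100, 2048, [("a", "INTEGER"), ("b", "STRING")]
    )
)
print(stats["tableColumnCount"])  # 2
```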
Sounds good, code updated. A read_gbq_table dry run now looks like this: https://screenshot.googleplex.com/AHaxiSsafniVFRN
bigframes/session/dry_runs.py
    col_dtypes = dtypes.bf_type_from_type_kind(table.schema)
    index.append("tableColumnCount")
    values.append(len(col_dtypes))
    index.append("tableColumnTypes")
It's not easy for end users to predict whether a call will run a query or just read the table directly. Could we align these names so that callers don't need as much logic to handle one case versus the other?
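One way to address this, sketched under the assumption that both paths return a flat mapping: define a single shared key set and have the query path and the metadata path fill the same keys, leaving unknown values as None. DRY_RUN_KEYS, aligned_report, the key names, and the sample values are hypothetical, not what bigframes uses.

```python
from typing import Dict, Mapping, Optional

# Hypothetical shared key set; the real names in bigframes may differ.
DRY_RUN_KEYS = (
    "isQuery",
    "columnCount",
    "columnDtypes",
    "totalBytesProcessed",
)


def aligned_report(partial: Mapping[str, object]) -> Dict[str, Optional[object]]:
    """Return a report with every shared key present, using None where unknown."""
    unknown = set(partial) - set(DRY_RUN_KEYS)
    if unknown:
        # Catch a code path that invents its own key instead of reusing a shared one.
        raise KeyError(f"unexpected dry-run keys: {sorted(unknown)}")
    return {key: partial.get(key) for key in DRY_RUN_KEYS}


dtypes = {"a": "Int64", "b": "string"}
# Query path: a BigQuery dry run reports the bytes the query would process.
query_report = aligned_report(
    {"isQuery": True, "columnCount": 2, "columnDtypes": dtypes, "totalBytesProcessed": 1024}
)
# Metadata path: no query is run, so bytes processed is left as None.
table_report = aligned_report({"isQuery": False, "columnCount": 2, "columnDtypes": dtypes})
assert list(query_report) == list(table_report)
```

With this design, callers read the same keys regardless of which path produced the report and only check isQuery when the difference matters to them.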
If a table reference is passed to read_gbq() with dry_run set to True, we use SELECT * FROM {table_ref} for the dry run. For read_gbq() and read_gbq_table() calls that do not ultimately lead to SQL conversion, we use the table metadata for the dry-run stats report.
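The dispatch described above can be sketched as follows. plan_dry_run and DryRunPlan are hypothetical helpers, and the only fallback trigger modeled is max_results (the example given in the review); the real set of conditions that lead to SQL may be larger.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DryRunPlan:
    """Hypothetical description of how a dry run would be produced."""

    strategy: str  # "query" or "metadata"
    sql: Optional[str] = None  # SQL to dry-run when strategy == "query"


def plan_dry_run(table_ref: str, max_results: Optional[int] = None) -> DryRunPlan:
    """Pick a dry-run strategy for a table read (illustrative only)."""
    if max_results is not None:
        # This path falls back to running a query right away, so dry-run
        # SELECT * over the table. Backticks quote the table identifier in GoogleSQL.
        return DryRunPlan("query", f"SELECT * FROM `{table_ref}`")
    # Deferred read with no SQL: report stats from table metadata instead.
    return DryRunPlan("metadata")


print(plan_dry_run("my-project.my_dataset.my_table").strategy)  # metadata
print(plan_dry_run("my-project.my_dataset.my_table", max_results=10).sql)
```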