feat: support array output in remote_function #1057
Conversation
This is a feature request to support use cases such as creating custom feature vectors, embeddings, etc.
…tr array outputs
# if the output is an array, reconstruct it from the json serialized
# string form
if bigframes.dtypes.is_array_like(func.output_dtype):
Do we actually handle any array-like dtype?
Um, in this PR we are looking to support types like list[int] on the output side. Or did I misunderstand the question?
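For illustration, the reconstruction step under discussion could be sketched in plain Python as below. This is a minimal sketch, assuming the remote function hands back each array as a JSON-serialized string; `reconstruct_array` is a hypothetical helper and not part of bigframes.

```python
import json
from typing import Optional


def reconstruct_array(serialized: Optional[str], element_type: type = int) -> Optional[list]:
    """Rebuild a list from its JSON-serialized string form (hypothetical helper)."""
    if serialized is None:
        # Propagate missing values unchanged.
        return None
    values = json.loads(serialized)
    if not isinstance(values, list):
        raise ValueError(f"expected a JSON array, got {type(values).__name__}")
    # Cast every element to the declared element type, keeping nulls as None.
    return [None if v is None else element_type(v) for v in values]


print(reconstruct_array("[1, 2, 3]"))
print(reconstruct_array('["a", "b"]', str))
```

The cast to `element_type` stands in for whatever conversion the real code applies after extracting the array elements.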
# if the output is an array, reconstruct it from the json serialized
# string form
if bigframes.dtypes.is_array_like(func.output_dtype):
It seems the code within this block assumes not just array_like, but specifically that it is a pyarrow list_ type.
That's exactly what the array_like implementation checks?
python-bigquery-dataframes/bigframes/dtypes.py
Lines 301 to 304 in 5a2731b
eh, probably fine then, I don't really see array_like definition expanding anytime soon
bigframes/functions/_utils.py
Outdated
    return None

try:
    python_output_type = eval(output_type)
eval always makes me a bit uncomfortable - can we do this in a more constrained way?
removed eval in the latest patch, PTAL
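One constrained alternative to `eval`, sketched here for illustration, is an explicit allow-list lookup. This is not necessarily what the patch does; the `_SUPPORTED_OUTPUT_TYPES` table and the `parse_output_type` name are hypothetical, and the set of accepted type strings is an assumption.

```python
from typing import Dict

# Assumption: the accepted type strings; the real allow-list may differ.
_SUPPORTED_OUTPUT_TYPES: Dict[str, object] = {
    "int": int,
    "float": float,
    "bool": bool,
    "str": str,
    "list[int]": list[int],
    "list[str]": list[str],
}


def parse_output_type(output_type: str) -> object:
    """Map a serialized type string to a Python type without evaluating it."""
    try:
        return _SUPPORTED_OUTPUT_TYPES[output_type.strip()]
    except KeyError:
        raise ValueError(f"Unsupported output type: {output_type!r}") from None


print(parse_output_type("list[int]"))
```

Because the input is only ever used as a dictionary key, an arbitrary string such as `"__import__('os')"` is rejected rather than executed.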
bigframes/functions/_utils.py
Outdated
if typing.get_origin(python_output_type) is list:
    python_output_type_ser = repr(python_output_type)
else:
    python_output_type_ser = python_output_type.__name__
Should we bother with non-list types right now?
Throwing an error for non-array and unsupported array types in the latest patch, PTAL
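A rough sketch of that validation, using only the standard `typing` helpers, might look like this. The function name `serialize_array_output_type` and the set of supported element types are assumptions for illustration, not the code from the patch.

```python
import typing

# Assumption: element types allowed inside an array output.
_SUPPORTED_ELEMENT_TYPES = (int, str)


def serialize_array_output_type(python_output_type: object) -> str:
    """Serialize a list annotation such as list[int], rejecting everything else."""
    if typing.get_origin(python_output_type) is not list:
        raise TypeError(f"Expected an array type, got {python_output_type!r}")
    args = typing.get_args(python_output_type)
    if len(args) != 1 or args[0] not in _SUPPORTED_ELEMENT_TYPES:
        raise TypeError(f"Unsupported array type: {python_output_type!r}")
    # Build the string explicitly instead of relying on repr().
    return f"list[{args[0].__name__}]"


print(serialize_array_output_type(list[int]))
```

Both `list[int]` and `typing.List[int]` pass the origin check, while a bare `list`, a scalar such as `int`, or an array of an unlisted element type raises `TypeError`.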
* feat: support array output in `remote_function`. This is a feature request to support use cases like creating custom feature vectors, embeddings, etc.
* add multiindex test
* move array type conversion to bigquery module, test multiindex
* add `bigframes.bigquery.json_extract_string_array`, support int and str array outputs
* increase cleanup rate
* update input and output types doc
* support array output in DataFrame.apply
* support read_gbq_function on a remote function created for array output
* fix the json_set after variable renaming
* add tests for output_type in read_gbq_function
* temporarily exclude system 3.9 tests and include 3.10 and 3.11
* Revert "temporarily exclude system 3.9 tests and include 3.10 and 3.11". This reverts commit 2485aa3.
* add more info in the unexpected exception
* more debug info
* use unique routine name across tests
* Revert "more debug info". This reverts commit 86fe316.
* Revert "add more info in the unexpected exception". This reverts commit fe010cb.
* support array output in binary remote function operations
* support array output in nary remote function operations
* preserve array output type in function description to avoid explicit output_type in read_gbq_function
* fix one failing read_gbq_function test
* make test parameterization order deterministic
* fix sorting of types for mypy
* remove test parameterization with sorting inside
* include partial ordering mode testing for read_gbq_function
* add remote function array out test in partial ordering mode
* avoid repr-eval for output type serialization/deserialization
* remove unsupported scenarios system tests, use common exception for unsupported
remote_function: screen/5rMtCZVaUYKdqxP
Series.apply: screen/9HkKMuWxMvbbPgf
DataFrame.apply: screen/BoXH9A7d4hGpETu
Fixes internal issue 298876217 🦕