fix: remove pre-caching of remote function results#1028
Conversation
This will save redundant remote function execution as reported in b/370088754. The trade-off is that any remote function integration issues will be caughts only at a later point through a usage triggered sql execution. Why the caching is not working as indended in the reported use case in the bug? as per TrevorBergeron@ "the cached execution is useless when it is implicitly joined back to the base dataframe"
|
So after the change, when does the execution happens? or if at the later stage, if the user calls exec APIs (such as repr) multiple times, will the function be executed multiple times? |
|
Is there a way that we can add test to ensure that with multiple operations, there won't be redundant function calls? |
It would be the latter. After this change the behavior would be same as the standard bigframes behavior - if the result is cached it won't be executed, else it would be. |
It's tied to caching. Chatted with @TrevorBergeron that caching is not a guarantee at this point but more like an internal optimization. Asserting how many times a function was called is a bit of a stretch, just like asserting how many number of sql jobs were run in an operation, so I'd skip adding a test. Let me know if you have s strong opinion otherwise. |
This will save redundant remote function execution as reported in b/370088754. The trade-off is that any remote function integration issues will be caughts only at a later point through a usage triggered sql execution. Why the caching is not working as indended in the reported use case in the bug? as per TrevorBergeron@ "the cached execution is useless when it is implicitly joined back to the base dataframe"
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal issue 370088754 🦕