feat: to_datetime supports utc=False for string inputs#579
Conversation
bigframes/core/tools/datetimes.py
Outdated
| f"Customized formats are not supported for string inputs when utc=False. Please set utc=True if possible. {constants.FEEDBACK_LINK}" | ||
| ) | ||
|
|
||
| assert not utc |
There was a problem hiding this comment.
I think we can trust the if condition on this one
bigframes/core/tools/datetimes.py
Outdated
| # Cast to DATETIME shall succeed if all inputs are tz-naive. | ||
| if not result.isnull().any(): | ||
| return result |
There was a problem hiding this comment.
We are probably ok with nulls in the output in the locations where nulls were in the input?
There was a problem hiding this comment.
We are using a safe_cast here to test out if all inputs are timezone-naive. Null output indicates that the time strings are timezone-aware.
There was a problem hiding this comment.
It seems that a null input becomes NaT in pandas, rather than failing the whole operation. So I think None->None is fine, I think the problem is when you have String->None indicating the cast from string to date failed.
There was a problem hiding this comment.
Good catch! And thank you for the fix!!
bigframes/core/tools/datetimes.py
Outdated
| # Verify if all the inputs are in UTC. | ||
| all_utc = arg._apply_unary_op( | ||
| ops.EndsWithOp( | ||
| pat=("Z", "-00:00", "+00:00", "-0000", "+0000", "-00", "+00") | ||
| ) | ||
| ).all() |
There was a problem hiding this comment.
Just an optimization, but at this point we have done a few queries in serial, I wonder if it would be faster to do this check at the same time as the NULL check.
| @@ -794,14 +795,27 @@ def isin_op_impl(x: ibis_types.Value, op: ops.IsInOp): | |||
| @scalar_op_compiler.register_unary_op(ops.ToDatetimeOp, pass_op=True) | |||
| def to_datetime_op_impl(x: ibis_types.Value, op: ops.ToDatetimeOp): | |||
There was a problem hiding this comment.
I do worry this operation is getting a bit too complicated. Do the type rules reflect the fact that a DateTime will be returned for utc==False?.
There was a problem hiding this comment.
Will work on the refactor in a separate PR :-)
| # The default unit is set to "ns" (nanoseconds) for consistency | ||
| # with pandas, where "ns" is the default unit for datetime operations. | ||
| unit = op.unit or "ns" | ||
| x = numeric_to_datatime(x, unit) |
There was a problem hiding this comment.
I assume this should be numeric_to_datetime?
Fixes internal b/330575291 🦕