feat: Add UUID and TIME_UUID as feature types (#5885)#5951
feat: Add UUID and TIME_UUID as feature types (#5885)#5951soooojinlee wants to merge 12 commits intofeast-dev:masterfrom
Conversation
1d4cd01 to
4a5c932
Compare
|
@soooojinlee , thanks so much for putting this together! Can you rebase to bring this PR up to date? |
2c56521 to
1fd106a
Compare
26198a6 to
525ac72
Compare
cb6dd44 to
54c2eae
Compare
2dd648a to
b280364
Compare
|
@soooojinlee , can you also add support for UUID_SET and TIME_UUID_SET? |
|
@soooojinlee , can you also update the docs with the newly supported types? |
180434b to
41cfe7e
Compare
I added UUID_SET / TIME_UUID_SET support and updated the type system documentation as requested in here 41cfe7e |
3bf640a to
dceaa46
Compare
franciscojavierarceo
left a comment
There was a problem hiding this comment.
one small nit pick for readability please
aca5741 to
17aebe6
Compare
|
I addressed all feedback in 17aebe6 |
There was a problem hiding this comment.
🟡 ODFV sample input uses list value when singleton=True, contradicting singleton contract
OnDemandFeatureView._construct_random_input(singleton=True) is intended to produce singleton (scalar) sample values, but the fallback default_value is set to [None] when singleton is True.
Actual: for any feature type that isn’t present in sample_values, the generated input will contain a list ([None]) even in singleton mode.
Expected: in singleton mode, the fallback should be a scalar None.
Root Cause / Impact
In sdk/python/feast/on_demand_feature_view.py:1031-1035, sample_values is converted to scalars when singleton is True:
sample_values = {k: v[0] for k, v in sample_values.items()}
…but immediately after, the fallback is inverted:
default_value = None if not singleton else [None]
So any missing mapping yields a list in singleton mode, which can break UDF schema inference or type checking for singleton-mode transforms (inputs become list-typed unexpectedly).
(Refers to lines 1031-1035)
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
I think this is not part of my changes. should I open a seperate issue to track this??
Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: soojin <soojin@dable.io>
Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: soojin <soojin@dable.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: soojin <soojin@dable.io>
Add uuid_val, time_uuid_val, uuid_list_val, time_uuid_list_val as dedicated oneof fields in the Value proto message, replacing the previous reuse of string_val/string_list_val. This allows UUID types to be identified from the proto field alone without requiring a feature_types side-channel. Backward compatibility is maintained for data previously stored as string_val. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: soojin <soojin@dable.io>
Signed-off-by: soojin <soojin@dable.io>
Signed-off-by: soojin <soojin@dable.io>
Signed-off-by: soojin <soojin@dable.io>
Signed-off-by: soojin <soojin@dable.io>
Add Set(Uuid) and Set(TimeUuid) as feature types with full roundtrip support, backward compatibility, and documentation for all UUID types. Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ype mappings Keep PDF_BYTES=30 and IMAGE_BYTES=31 at their upstream values instead of renumbering them. Shift UUID types to 32-37 in both proto and Python enum. Also add missing SET type entries in _convert_value_type_str_to_value_type(), convert_array_column(), and _get_sample_values_by_type() for completeness. Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The comment claimed Sets do not support UUID/TimeUuid but the code intentionally allows them. Updated to reflect actual behavior. Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o top Signed-off-by: soojin <soojin@dable.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
17aebe6 to
cc94cba
Compare
What this PR does / why we need it:
Adds
UUIDandTIME_UUIDas native Feast feature types, resolving #5885. Currently UUID values must be stored as STRING, which loses type semantics, prevents backend-specific features (e.g. Cassandra timeuuid range queries), and makes PostgreSQLuuidcolumns infer as STRING. This PR enables users to declare UUID features withField(name="user_id", dtype=Uuid)and receiveuuid.UUIDobjects fromget_online_features().to_dict().Design Decisions
Why two types (UUID vs TIME_UUID)?
The issue author explicitly requested distinguishing time-based UUID (uuid1) and random UUID (uuid4). Both serialize
identically to
stringin proto, but separate types allow expressing intent in feature definitions and enable future backend-specific optimizations.Why dedicated proto fields (
uuid_val,time_uuid_val)?Following the pattern established by SET types (PR #5888) and UNIX_TIMESTAMP (which reuses
int64/Int64List), we add dedicated oneof fields that reuse existing proto scalar types (stringandStringList). This allowsWhichOneof("val")to identify UUID types directly from the proto message, without requiring a side-channel.Backward compatibility for data stored before this change:
OnlineResponseaccepts an optionalfeature_typesdict. When data was previously stored asstring_val, this metadata enablesfeast_value_type_to_python_type()to convert it touuid.UUID. New materializations useuuid_val/time_uuid_valand are identified automatically.Changes
Value.proto, generated*_pb2.py/*_pb2.pyiUUID=30,TIME_UUID=31,UUID_LIST=32,TIME_UUID_LIST=33toValueType.Enum; adduuid_val,time_uuid_val,uuid_list_val,time_uuid_list_valtoValue.oneofvalue_type.py,types.pyUUID,TIME_UUID,UUID_LIST,TIME_UUID_LISTenums andUuid/TimeUuidaliasestype_map.pystring_valtouuid_val; addPROTO_VALUE_TO_VALUE_TYPE_MAPentries for UUID fieldsonline_response.py,online_store.py,feature_store.py,utils.pyfeature_typesmetadata for backward-compatible deserializationon_demand_feature_view.pyBackward Compatibility
string_valstill deserializes correctly via thefeature_typesside-channeluuid_val/time_uuid_valproto fieldsfeast_value_type_to_python_type(v)withoutfeature_typenow returnsuuid.UUIDforuuid_valfields (previously returned plain string forstring_val)ValueType.UUID(previouslyValueType.STRING)Tests
test_types.py: Uuid/TimeUuid ↔ ValueType bidirectional conversion, Array typestest_type_map.py: Proto roundtrip withuuid_val,uuid.UUIDobject return, backward compatibility forstring_val, UUID list roundtrip, PostgreSQL mapping