feat: allow multiple columns input for llm models by GarrettWu · Pull Request #998 · googleapis/python-bigquery-dataframes

GarrettWu · 2024-09-18T00:31:11Z

sycai · 2024-09-20T18:29:42Z

bigframes/ml/llm.py

        Args:
            X (bigframes.dataframe.DataFrame or bigframes.series.Series):
-                Input DataFrame or Series, which contains only one column of prompts.
+                Input DataFrame or Series, can contain one or more columns. If multiple columns in the DataFrame, it must contain a "prompt" column for prediction.


nit: "If multiple columns are in the DataFrame, they must ...", and the same for the other docs too.

"It" refers to the DataFrame. I can add "are", as in "If multiple columns are in the DataFrame".

sycai · 2024-09-20T18:31:32Z

tests/system/small/ml/test_llm.py

+    assert "text_embedding" in df.columns
+    series = df["text_embedding"]
+    value = series[0]
+    assert len(value) == 768


nit: maybe we could coalesce lines 323-325 into a single line?
assert len(df[..][0]) == 768

Sure. Actually, I'll rewrite the tests; some of them were already removed in a recent PR.
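
For illustration, the coalesced form suggested in this thread might look like the sketch below. It uses a plain dict as a stand-in for the DataFrame returned by the model (an assumption, so that it runs without BigQuery); the column name `text_embedding` and the expected length 768 come from the diff.

```python
# Stand-in for the prediction result (assumption): one row whose
# embedding holds 768 values.
df = {"text_embedding": [[0.0] * 768]}

# Three-step form from the diff.
series = df["text_embedding"]
value = series[0]
assert len(value) == 768

# Coalesced into a single line, as the nit suggests.
assert len(df["text_embedding"][0]) == 768
```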

shobsi · 2024-09-23T19:15:44Z

bigframes/ml/llm.py

-        # BQML identified the column by name
-        col_label = cast(blocks.Label, X.columns[0])
-        X = X.rename(columns={col_label: "prompt"})
+        if len(X.columns) == 1:


I think we should add another check in the else clause: that the multi-column input does have a "prompt" column. Also add a negative test for that scenario.

@tswast suggested that we shouldn't do many client-side checks. I'm trying to follow that: if the server's error message is meaningful to the user, rely on the server-side checks. Otherwise we have to wrap server error messages or return client-side error messages.
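
The trade-off in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in `bigframes/ml/llm.py`: it models the input as a plain dict of column name to values rather than a BigQuery DataFrame, and `prepare_llm_input` and its `check_prompt` flag are hypothetical names. With the flag off, a multi-column input without a "prompt" column passes through unchanged and the server is relied on to report the problem; with it on, the client raises first.

```python
from typing import Dict, List


def prepare_llm_input(
    table: Dict[str, List[str]], check_prompt: bool = False
) -> Dict[str, List[str]]:
    """Hypothetical helper mirroring the single- vs multi-column branch."""
    if len(table) == 1:
        # Single column: the model identifies its input by name,
        # so relabel the only column as "prompt".
        (values,) = table.values()
        return {"prompt": values}
    # Multiple columns: keep the labels as given. The optional check is
    # the client-side validation proposed in review (an assumption here).
    if check_prompt and "prompt" not in table:
        raise ValueError('Multi-column input must contain a "prompt" column.')
    return dict(table)


# Single column is renamed; multi-column input is passed through.
single = prepare_llm_input({"question": ["What is BigQuery?"]})
multi = prepare_llm_input({"prompt": ["Hi"], "id": ["1"]})

# Negative case: no "prompt" column and the client-side check enabled.
try:
    prepare_llm_input({"a": ["x"], "b": ["y"]}, check_prompt=True)
except ValueError as exc:
    print(f"client-side error: {exc}")
```

Leaving `check_prompt` off corresponds to the approach described in the reply: the request is sent as-is and the user sees the server's error message.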

feat: allow multiple columns input for llm models

183f995

GarrettWu self-assigned this Sep 18, 2024

product-auto-label bot added the "size: m" (pull request size is medium) and "api: bigquery" (issues related to the googleapis/python-bigquery-dataframes API) labels Sep 18, 2024

GarrettWu added 3 commits September 18, 2024 17:57

fix

44e2a0f

Merge remote-tracking branch 'github/main' into garrettwu-cols

03599a8

add tests

24cc6a6

GarrettWu requested review from shobsi and sycai September 20, 2024 18:25

GarrettWu marked this pull request as ready for review September 20, 2024 18:25

GarrettWu requested review from a team as code owners September 20, 2024 18:25

sycai requested changes Sep 20, 2024

shobsi reviewed Sep 23, 2024

GarrettWu added 2 commits September 23, 2024 22:13

Merge remote-tracking branch 'github/main' into garrettwu-cols

ba1060a

resolve comments

b39cde8

GarrettWu requested review from shobsi and sycai September 23, 2024 22:39

sycai approved these changes Sep 23, 2024

GarrettWu and others added 2 commits September 24, 2024 11:20

Merge branch 'main' into garrettwu-cols

87ff74a

Merge remote-tracking branch 'github/main' into garrettwu-cols

4b81d99

GarrettWu enabled auto-merge (squash) September 24, 2024 18:58

GarrettWu added 4 commits September 24, 2024 20:33

Merge remote-tracking branch 'github/main' into garrettwu-cols

ea75981

fix tests

3a8d56b

fix mypy

49f648a

fix test

e2ec0b1

GarrettWu merged commit 2fe5e48 into main Sep 25, 2024

GarrettWu deleted the garrettwu-cols branch September 25, 2024 18:52

release-please bot mentioned this pull request Sep 25, 2024

chore(main): release 1.20.0 #1017

Merged

GarrettWu merged 12 commits into main from garrettwu-cols
