Allow setting `write.parquet.row-group-limit` #1016
And update the docs
5b91696 to 46afeaf
LGTM @Fokko - merging in the change from main to resolve the conflict in the docs.
…o/iceberg-python into fd-allow-setting-max-row-group-size
Also threw in a test here 👍
| Key                               | Options                           | Default | Description                                                                |
| --------------------------------- | --------------------------------- | ------- | -------------------------------------------------------------------------- |
| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression codec.                                        |
| `write.parquet.compression-level` | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg |
| `write.parquet.row-group-limit`   | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group         |
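To illustrate what an upper bound on entries per row group means in practice, here is a minimal pure-Python sketch. It does not call PyIceberg or PyArrow; `split_into_row_groups` is a hypothetical helper named for this example, and it assumes a writer that fills each row group up to the limit and puts any remainder in the last one.

```python
from typing import List

# Default taken from the table above.
DEFAULT_ROW_GROUP_LIMIT = 1048576


def split_into_row_groups(num_rows: int, limit: int = DEFAULT_ROW_GROUP_LIMIT) -> List[int]:
    """Return the row count of each row group for a file holding num_rows rows.

    Hypothetical helper: models a writer that never puts more than
    `limit` rows into a single row group.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if limit <= 0:
        raise ValueError("limit must be positive")
    full_groups, remainder = divmod(num_rows, limit)
    sizes = [limit] * full_groups
    if remainder:
        # The last row group holds whatever is left over.
        sizes.append(remainder)
    return sizes
```

Under this model, a smaller limit yields more, smaller row groups for the same data, while no row group ever exceeds the configured bound.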
@Fokko @sungwy Thanks, I believe this has resolved my issue #1012 as well.
However, I would like to point out that this option already exists in the docs, right after `write.parquet.dict-size-bytes`. The UI doesn't allow me to leave a comment there, so please expand the collapsed area to see it.
Additionally, I'm curious why the default value used this time is significantly larger than the previous one.
Thank you for flagging this @zhongyujiang - I'll get the second entry below, the one with the older default value, removed.
To my understanding, the new value is the correct default, since it matches the default in the PyArrow ParquetWriter: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html
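As a rough sketch of how a writer could resolve this property against that default: Iceberg table properties are string key/value pairs, so the value needs parsing, with a fallback when the key is unset. The helper name `resolve_row_group_limit` is hypothetical and is not PyIceberg's actual implementation.

```python
from typing import Mapping

ROW_GROUP_LIMIT_KEY = "write.parquet.row-group-limit"
# 1024 * 1024 == 1048576, the default discussed above.
ROW_GROUP_LIMIT_DEFAULT = 1024 * 1024


def resolve_row_group_limit(properties: Mapping[str, str]) -> int:
    """Return the configured row-group limit, or the default if unset.

    Hypothetical helper for illustration only.
    """
    raw = properties.get(ROW_GROUP_LIMIT_KEY)
    if raw is None:
        return ROW_GROUP_LIMIT_DEFAULT
    limit = int(raw)  # raises ValueError on non-numeric input
    if limit <= 0:
        raise ValueError(f"{ROW_GROUP_LIMIT_KEY} must be positive, got {limit}")
    return limit
```

For example, `resolve_row_group_limit({})` falls back to the default, while `resolve_row_group_limit({"write.parquet.row-group-limit": "1000"})` returns the configured value.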
* Allow setting `write.parquet.row-group-limit`

  And update the docs
* Add test
* Make ruff happy

---------

Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>
Fixes #1013