Make s3.request_timeout configurable #1568
Fokko left a comment
Thanks @metadaddy for adding this, I left one comment regarding S3FS, apart from that it looks good to me 👍

        client_kwargs["connect_timeout"] = float(connect_timeout)

    if request_timeout := self.properties.get(S3_REQUEST_TIMEOUT):
        client_kwargs["request_timeout"] = float(request_timeout)
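
The pattern in this hunk can be sketched in isolation. This is a minimal standalone sketch, not PyIceberg's actual code: `build_client_kwargs` is a hypothetical helper, and the property keys are assumed to be the hyphenated names used in the configuration table.

```python
from typing import Any, Dict

# Property keys, assumed to match the hyphenated names in the docs table.
S3_CONNECT_TIMEOUT = "s3.connect-timeout"
S3_REQUEST_TIMEOUT = "s3.request-timeout"


def build_client_kwargs(properties: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: copy optional timeout properties into client kwargs."""
    client_kwargs: Dict[str, Any] = {}

    # The walrus operator binds the looked-up value; missing or empty
    # properties are skipped, so the underlying defaults stay in effect.
    if connect_timeout := properties.get(S3_CONNECT_TIMEOUT):
        client_kwargs["connect_timeout"] = float(connect_timeout)

    if request_timeout := properties.get(S3_REQUEST_TIMEOUT):
        client_kwargs["request_timeout"] = float(request_timeout)

    return client_kwargs


print(build_client_kwargs({"s3.request-timeout": "30"}))
# prints {'request_timeout': 30.0}
```

Converting with `float()` lets the property arrive as a string (as it would from a config file) while the client receives a number of seconds.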
| s3.region | us-west-2 | Configure the default region used to initialize an `S3FileSystem`. `PyArrowFileIO` attempts to automatically resolve the region for each S3 bucket, falling back to this value if resolution fails. |
| s3.proxy-uri | <http://my.proxy.com:8080> | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
| s3.request-timeout | 60.0 | Configure socket read timeouts on Windows and macOS, in seconds. |
I couldn't find a Java equivalent, so I'm fine with introducing this one 👍
I found connect-timeout, which I think is different from request-timeout:
https://github.com/apache/iceberg-go/blob/4b645d698fffaa99c235f54bf33f4340a4414bc5/io/s3.go#L47-L53
1675f74 to 87fcad5

Hi @Fokko - I implemented and pushed your suggested correction. Thanks!

Looks like there's a lint issue, can you make
87fcad5 to 3d53f42

@kevinjqliu Ah - it wanted imports in alphabetical order - I'd just inserted

Thanks for working on this @metadaddy, and thanks @kevinjqliu for the review 🙌

@Fokko / @kevinjqliu Any plans for a release in the near future? It's been a while since 0.8.1, and I'd like to be able to use a mainline version of PyIceberg in my app, rather than my patch. Thanks!

@metadaddy we're getting ready for the 0.9.0 as we speak :) we also recently added nightly build on testpypi if you want to give that a try https://test.pypi.org/project/pyiceberg/

Similarly to #218, we see occasional timeout errors when writing data to S3-compatible object storage:
[I don't believe the issue is specific to the fact that I'm using Backblaze B2 rather than Amazon S3 - I saw references to similar error messages with the latter as I was researching this issue.]
The issue happens when the underlying `PUT` operation takes longer than the request timeout, which is set to a default of 3 seconds in the AWS C++ SDK used by Arrow via PyArrow.

The changes in this PR allow configuration of `s3.request_timeout` when working directly or indirectly with `pyiceberg.io.pyarrow.PyArrowFileIO`, just as #218 allowed configuration of `s3.connect_timeout`.

For example, when creating a catalog:
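
A minimal sketch of such a configuration follows. It assumes a hypothetical REST catalog and S3-compatible endpoint (both placeholder URLs) and uses the hyphenated property keys from the configuration table (`s3.connect-timeout`, `s3.request-timeout`). The `load_catalog` call is left as a comment so the snippet runs without PyIceberg installed or a live catalog.

```python
# Hypothetical catalog properties. The URI, endpoint, and timeout values are
# placeholders chosen for illustration, not values taken from the PR.
catalog_properties = {
    "type": "rest",
    "uri": "http://localhost:8181",            # placeholder catalog URI
    "s3.endpoint": "https://s3.example.com",   # placeholder S3-compatible endpoint
    "s3.connect-timeout": "60.0",              # seconds
    "s3.request-timeout": "60.0",              # seconds
}

# With PyIceberg installed and a reachable catalog, the properties would be
# passed as keyword arguments:
#
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("default", **catalog_properties)

for key in sorted(catalog_properties):
    print(f"{key} = {catalog_properties[key]}")
```

Raising `s3.request-timeout` above the SDK default gives slow `PUT` operations more time to complete before the client gives up; the right value depends on object sizes and the storage backend.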