Conversation
|
Review requested:
|
Concurrency ControlFor concurrency control, SQLite provides a few options. Multi-threaded seems to be a good fit.
We just need a way to guarantee that no two threads will be using |
I wonder if, for the first version of this API if the
|
ac01d39 to
ed659be
Compare
dbf9d40 to
4ffdccf
Compare
6561e49 to
9af39ec
Compare
9af39ec to
ed24a9b
Compare
ed24a9b to
710f3b5
Compare
4bf5609 to
8ce4e74
Compare
8ce4e74 to
081d7c5
Compare
e46021e to
5e6d3a2
Compare
5e6d3a2 to
7acd4eb
Compare
|
having implemented worker thread offloading i can state that there's a significant amount of gnarly lifecycle plumbing involved, especially if one wants to support transactions |
Yeah. I will build a solution based on the authorizer, and let's see what you think about it. |
I think the API should reflect this. If it is named Maybe calling it something like Or maybe |
| 'SQLITE_ENABLE_RBU', | ||
| 'SQLITE_ENABLE_RTREE', | ||
| 'SQLITE_ENABLE_SESSION', | ||
| 'SQLITE_THREADSAFE=2', |
There was a problem hiding this comment.
Is there a performance penalty for purely single-thread use cases?
If so, would it make sense to allow setting the threading mode at runtime?
There was a problem hiding this comment.
I don't know for sure about any performance penalti. But as discusses in the issue, due to the node.js nature, this will work fine for sync. Better-sqlite uses it.
There was a problem hiding this comment.
Understood. Thanks! This can be closed.
If anything I think this will speed up single-threaded use cases, because apparently by default SQLite uses internal serialization. Did not test it though.
There was a problem hiding this comment.
(I was going to post this on the main thread, but this is the context that matters, so I'll add it here) (edit: I rewrote for brevity):
Synchronous node:sqlite operations can run while your new async operations are executing. This will cause concurrent access to the same SQLite connection from multiple threads. This violates SQLITE_THREADSAFE=2 requirements and can cause db corruption undefined behavior (I could have sworn I read about this on the how-to-corrupt page but I don't see it now.)
SQLITE_THREADSAFE=2 requires no connection be used by multiple threads simultaneously (docs). Currently, nothing prevents sync methods (main thread) from running while async operations are on the thread pool.
Suggested fix: throw from sync methods when has_running_task_ is true:
diff --git a/src/node_sqlite.cc b/src/node_sqlite.cc
--- a/src/node_sqlite.cc
+++ b/src/node_sqlite.cc
@@ -1257,6 +1257,8 @@ void Database::Prepare(const FunctionCallbackInfo<Value>& args) {
ASSIGN_OR_RETURN_UNWRAP(&db, args.This());
Environment* env = Environment::GetCurrent(args);
THROW_AND_RETURN_ON_BAD_STATE(env, !db->IsOpen(), "database is not open");
+ THROW_AND_RETURN_ON_BAD_STATE(env, db->has_running_task_,
+ "Cannot call sync methods while async operation is running");Same check needed in Exec() (sync path), Function(), Aggregate(), SetAuthorizer(), and LoadExtension(). Tested locally — works as expected.
(SQLITE_THREADSAFE=1 would also prevent undefined behavior, but the main thread would silently block on SQLite's internal mutex until the async op completes, defeating the purpose of async.)
There was a problem hiding this comment.
Thank you for the in-depth examples. I'd like to mention another option (which I thought has been discussed in the issue, but maybe we were talking past each other):
Have one sqlite connection for synchronous access and maintain a pool of (seperate) sqlite3 db connections for asynchronous access. For each asynchronous access one connection from the pool would be exclusively used. The main difficulty with this approach is obviously db.prepare(). One would have to track which db connection owns the prepared statement instance and only dispatch operations from that prepared statement object to its owning db connection. Another drawback of that design is the inherent potential of SQLITE_BUSY failures for concurrent write access.
There was a problem hiding this comment.
Yeah, if you don’t want to cross the streams, you need multiple database instances opened. Rather than transparently managing this magically begins the scenes in a pool, though, I think it may be less surprising to actually push the complexity of managing per -database handles to the user—they can figure out what makes sense in their app, and when one query wedges another, it’s their fault.
The only viable solution is not support custom functions with the async mode. Basically you would need to move the computation back to the main thread, causing a possible deadlock situation. |
That sounds good to me |
1f44af2 to
45baba7
Compare
|
Here's the state of this PR by now.
|
All of them are good options but the argument of consistency is very strong. Looking at other modules with the perspective of what is exported. Zlib exports functions with Maybe this can be a turning point? I saw that @mcollina added the |
|
I don't find the consistency argument very strong, because as far as I am aware there is no precedent of postfixing a class with I also don't think creating a separate In addition, we are staying quite close to the SQLite API, which I think it a good thing to do. The current I wonder what you think about this API and if it would be feasible. |
That's why I mentioned the perspective. If you look at what is exported, they follow this rule.
I like your proposal. It brings flexibility, with the If we all agree we can drop the The machinery will still have complexity because that's how it is. Every work must be splitted to run part on thread pool (SQLite stuff) part on main thread (object creation like arrays and objects, and promise resolving). |
Agreed. I thought this would make it possible to use single threaded mode for individual database connections, but this is not possible as pointed out by @BurningEnlightenment. Glad you like it though!
Yes, but this API makes it clear that it not 'just' a SQLite database handle. |
| } else { | ||
| ThreadPoolWork* next_work = task_queue_.front(); | ||
| task_queue_.pop(); | ||
| next_work->ScheduleWork(); |
There was a problem hiding this comment.
As mentioned before, setting the thread safety build flag to 1/serializable is pretty much guaranteed to be at least as efficient as building our own scheduling mechanism, so I'd remove this code and instead do that
| ASSIGN_OR_RETURN_UNWRAP(&db, args.This()); | ||
| Environment* env = Environment::GetCurrent(args); | ||
| THROW_AND_RETURN_ON_BAD_STATE(env, !db->IsOpen(), "database is not open"); | ||
| db->is_closing_.store(true, std::memory_order_release); | ||
| db->FinalizeStatements(); |
There was a problem hiding this comment.
If I'm reading this correctly, FinalizeStatements() and DeleteSessions() run before the has_running_task_ check at line 1238. Both call into SQLite (sqlite3_finalize, sqlite3session_delete) accessing connection_ on the main thread, while the thread pool may be executing sqlite3_exec(connection_).
The sqlite3_close_v2 is correctly deferred, but these cleanup calls should be deferred too - either move them after the has_running_task_ check, or into MaybeCloseConnection().
mcollina
left a comment
There was a problem hiding this comment.
I would actually prefer if sync and async code paths would not cross. In other words, if I opened a db asynchronously, I would not be able to block the main thread on it. This distinction would prevent all sort of user problems.
|
I'm not even convinced we need a pool. That is not what node-sqlite3 was doing, wasn't it? I would keep |
|
Just a single thread? Not sure about node-sqlite3 but that would simplify things a lot. |
|
Edit: if Node.js/libuv had lightweight threads, the resource management simplicity of one thread per FWIW, node-sqlite3 uses libuv mutexes and |
|
Disclosure: this comment was drafted with AI assistance and reviewed/edited by me. I dug into TryGhost/node-sqlite3 (current
1) Addon-level scheduler: when work is queued vs dispatchedA Scheduling is done in Roughly:
Meaning:
Clarification of terms (important):
So queueing here is not just about thread-safety; it enforces operation semantics and lifecycle invariants. 2) SQLite mutexes: protecting the shared connection internalsEven with the addon queue, work still executes on threadpool workers and may touch the same connection handle.
Additionally, statement worker code explicitly acquires the per-DB SQLite mutex ( That protects internal SQLite connection/statement state and avoids races while extracting status/error data. Why both layers are needed
So:
That combination is what allows one |
The |
|
Having helped maintain the better-sqlite3 package for several years, I can't remember the last issue we had that was from a "code correctness"/thread/lock/mutex related defect (it's all build/compilation issues because it's not using n-api!). The abundant issue list for Also: the performance hit from bad cache locality (like @BurningEnlightenment mentioned) and all the mutex juggling results in 4X to 10X performance hits for some functionality: https://github.com/photostructure/node-sqlite/tree/main/benchmark#performance-results |
|
Appreciate all your reviews here. I changed configuration of threads back to 1 (and a few other things in my local branch). The thing is as @mcollina has pointed out. It this thing even needed? The more I worked on it the less it seemed useful. I know there are some use cases. But for those, people can use worker threads? Having this working on a threadpool would be having a database connection with less capabilities (it won't have an |
|
@geeksilva97 Agreed: I think brief example code on how to run long running queries on a separate node:worker_threads instance would be the most conservative and most expeditious way forward (but I'm just shouting from the peanut gallery here). My main hope was to highlight the substantial complexity that the node-sqlite3 package tackled (and how Joshua's better-sqlite3 and the current node:sqlite sync approach avoids it!) |
Closes #54307
This PR implements an async API for
node:sqlitemodule. So far, it contains a very minimal implementation ofexecmethod, misses some tests, docs and refactoring but it is good enough to share the whole theory I have for it; with that, anybody can share thoughts about it.Design
On C++ land, I plan to have the
Databaseclass determine whether the operations will be asynchronous.Public API
For the public API, I plan to have classes such as
Database,Statement, etc., as counterparts to' DatabaseSync', ' StatementSync', and so on.