'ArangoDB - Slow query performance on cluster


I have a query that compares two collections and finds the "missing" documents from one side. Both collections (existing and temp) contain about 250K documents.

FOR existing IN ExistingCollection
    LET matches = (
        FOR temp IN TempCollection
            FILTER temp._key == existing._key
            RETURN true
    FILTER LENGTH(matches) == 0
    RETURN existing

When this runs in a single-server environment (DB and Foxx are on the same server/container), this runs like lightning in under 0.5 seconds.

However, when I run this in a cluster (single DB, single Coordinator), even when the DB and Coord are on the same physical host (different containers), I have to add a LIMIT 1000 after the initial FOR existing ... to keep it from timing out! Still, this limited result returns in almost 7 seconds!

Looking at the Execution Plan, I see that there are several REMOTE and GATHER statements after the LET matches ... SubqueryNode. From what I can gather, the problem stems from the separation of the data storage and memory structure used to filter this data.

My question: can this type of operation be done efficiently on a cluster?

I need to detect obsolete (to be deleted) documents, but this is obviously not a feasible solution

Tags ArangoDB
Đã hỏi - admin
16 ngày 4 giờ