-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Conversation
When importing a directory of files, after every block is added we pin it and flush pins to the repo. This is really slow and if we are adding lots of files, the pinset is out of date almost immediately so this PR debounces flushing pins until after we've stopped adding things. A call to `pinManager.flushPins` will return a promise, and that promise will resolve once the pins have been flushed, but it will first wait 100ms and see if any subsequent flushes have been requested in that time - if so that promise will end early and return the promise created by the later flush request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can trigger this on the leading edge as well as the trailing edge? Something like if there's not flush running or debouncing then do flush otherwise be debounced.
I'm just noticing that the minimum time this call will take will now be 100ms + time(flush()), so if we call it just once in a 100ms period then we take the debounce hit even though we didn't debounce any calls.
this._flushingPins = { | ||
controller, | ||
promise: delay(100) | ||
.then(async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this call has a .then().catch()
? We should stick to await
as much as possible.
promise: (async () => {
await delay(100)
if (controller.signal.aborted) return
// ...
})()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal preference I guess? I've always found IIFEs a bit, well iffy to look at.
|
||
this.log(`Flushed pins with root: ${rootCid}`) | ||
}) | ||
.catch(err => this.log(`Could not flush pins: ${err}`)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let us not smother the error!
Secondly, if the flush does fail, any debounced calls succeed...and they succeed before the flush has been performed so there's no way to know if an error happens.
Could we use something like debounce-promise
? It would mean the debounced calls would have to wait for the debounce run so perhaps this wouldn't exhibit as much perf improvement but would still be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. If I am importing files A, B and C, and we block on flushing pins after importing A, then B will not start to import until A has finished and the pins have flushed, debounced or otherwise so there's no benefit to be gained.
This PR works by doing the flushing in another async context but as you point out, there's no way to know if an error occurred, because the only way to know is to block until the operation has completed, which means the next file in the list will not get imported until the pins have flushed.
Thinking about this more, I'm not sure it's a great idea - the errors aren't returned to the calling context and I don't know what will happen if the repo is closed while the debounce timeout is running. I'm going to have a go at storing pins in the datastore (see #2197), that way we won't have to flush them or do any of these complicated DAG manipulations. |
Going to close this as refactoring how pins are stored is a better way forwards. |
When importing a directory of files, after every block is added we pin it and flush pins to the repo. This is really slow and if we are adding lots of files, the pinset is out of date almost immediately so this PR debounces flushing pins until after we've stopped adding things.
A call to
pinManager.flushPins
will create a promise that first waits 100ms to see if any subsequent flushes have been requested in that time - if so that promise will end early and the later flush request will flush the pins.