Preload cache in the background #779
Hoping @roshanshariff can weigh in on this, but will also note this is the kind of example we've previously discussed as likely better handled by #681. @aikrahguzar had a promising prototype, but there were still issues to sort out IIRC.
Sweet -- thanks for taking a look. #681 seems like a nice alternative approach. There are some rough edges in the emacs-async approach that might make it hard to justify this feature in the main citar package. I'll keep an eye on both these issues (and let me know if you need anything).
We considered implementing this, but there is a big bottleneck with the emacs-async approach. It spawns an emacs sub-process to parse the bibliography file into a hash table containing the strings needed by citar, but then the sub-process needs to serialize that hash table as text and send it over a pipe to the main emacs process, which then parses the text back into a hash table. I'm guessing this latter serialization/deserialization is taking ~30 seconds for you. So, in the end, you would still end up with a ~2 minute parse time (during which emacs is responsive and an inferior emacs process is working in the background) and then 30 seconds where it's unresponsive, receiving the parsed data back. This is an improvement, but it's still very undesirable and I'm not sure whether it's worth the extra complexity.

I think a better solution in the long run would be to use emacs' multithreading support; we could do the parsing in a separate thread (with no user interaction or buffer modification), and the resulting hash table would be immediately accessible because it's in the same memory space. But, at the time, I was put off by the apparent immaturity of emacs multithreading. I'm not sure if the situation has improved in emacs 29. I don't have the time to experiment with it again right now, but it would be good to get a status update.

Yet another option would be to make parsebib's parsing interruptible: it could do its parsing inside a …
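For concreteness, a minimal sketch of what the multithreading option could look like. This is not citar code: `my/parse-bibliography` is a hypothetical stand-in for the real parsing routine, and since Emacs threads are cooperative, the parse only advances while the main thread yields or waits.

```elisp
;; Hypothetical sketch of the threading option (Emacs 26+).  The parse runs
;; on a background thread and the resulting hash table is shared directly,
;; with no serialisation step.  `my/parse-bibliography' is a placeholder for
;; whatever citar/parsebib call actually builds the hash table.
(defvar my/bib-cache nil
  "Hash table of parsed entries, or nil while parsing is still running.")

(defvar my/bib-parse-thread nil
  "Thread object for the in-progress parse, if any.")

(defun my/preload-bibliography (file)
  "Parse FILE into `my/bib-cache' on a background thread."
  (setq my/bib-parse-thread
        (make-thread
         (lambda ()
           (setq my/bib-cache (my/parse-bibliography file)))
         "bib-parse")))

;; A caller that needs the data immediately can block until the thread exits:
;; (when my/bib-parse-thread (thread-join my/bib-parse-thread))
```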
Where I left off had sort of a version of this: it would start with a temporary buffer into which the entries were inserted. At each call of the completion table, the buffer was searched with the first word of the query, and a limited number of matches were parsed and removed from the buffer. The parsed entries were added to the cache, and the cache was then passed as the collection. So in the beginning the search results were not the optimal ones, but as more searches were done the number of cached entries grew and the results got better; on the other hand, some matches were presented immediately, without a wait.
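For readers who haven't used programmed completion, a rough sketch of the shape of such a table follows; `my/parse-some-matches` and `my/entry-cache` are placeholders, not the actual prototype.

```elisp
;; Rough sketch of an incrementally-filled completion table, in the spirit of
;; the prototype described above.  `my/parse-some-matches' is a placeholder
;; that would search the staging buffer for the query's first word, parse a
;; limited number of matching entries, delete them from the buffer, and add
;; them to `my/entry-cache'.
(defvar my/entry-cache (make-hash-table :test #'equal)
  "Entries parsed so far, keyed by citation key.")

(defun my/lazy-bib-table (string pred action)
  "Programmed completion table that grows `my/entry-cache' as queries arrive."
  (my/parse-some-matches (car (split-string string)) my/entry-cache)
  (complete-with-action action my/entry-cache string pred))

;; Usage: the function itself is passed as the collection.
;; (completing-read "Reference: " #'my/lazy-bib-table)
```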
Ah, interesting approach @aikrahguzar. Do you know whether it's possible to stream more entries in (i.e. update the collection as entries are parsed) with this approach, even without the user updating the query string? I know that consult does this for things like …
I don't think this can be done just with completing-read; some minibuffer hacking will be required. I think for consult this is done by …
Thanks for looking into this! Re. performance issues with the async prototype:
I figured out what the bottleneck was: the format string (and preformatted strings) weren't surviving async.el's serialisation. This meant that when citar checked the bibliography, it needed to regenerate the preformatted strings. I've tweaked my personal config to fix that problem (and updated the gist). With these fixes, it takes around 10 seconds before the first render after calling …
Agreed. I think the issues I've had with serialisation and deserialisation point towards emacs multithreading being a more stable and performant option than using emacs-async. And it looks like the implementation should be simpler than the current async prototype.

Re. programmed completion: it seems like programmed completion is the way to go! I think my performance tests above demonstrate this quite well. Even with the cache loaded and the preformatted strings populated, …

What do you think of this approach?
(With the caveat that I don't really know much about bibtex or citar.) I think this combination might be optimal, because if we were to selectively parse the bibtex file using programmed completion, that might cause issues for bibliographies that use cross-refs?
I'll defer to @roshanshariff and @aikrahguzar on the technical details, but just note: we don't only do bibtex/biblatex; we also support CSL JSON.
Is your feature request related to a problem? Please describe.
I use a precompiled bibliography file that is pretty hefty and tends to cause performance issues in the emacs packages I've tried. Once citar has filled its cache, it tends to be relatively snappy. But the first time I use it in an editing session it blocks emacs for around 2 minutes.
Describe the solution you'd like
It would be sweet if citar loaded the bibliography cache in the background when a new buffer is opened. That way the first time I try to insert/modify a citation it should be more responsive.
Describe alternatives you've considered
N/A
Additional context
N/A
I've got a working prototype of this in my config.
It uses emacs-async to run a batch of `(citar--update-bibliography bib)` calls in the background (triggered by `latex-mode-hook`). When you next try to use citar, say by calling `citar-insert-edit`, it will check for an in-progress background task, wait for it to complete (if it hasn't already), initialise citar's cache with the preprocessed bib objects, and then continue. Let me know if you want this feature for the package and I can put together a pull request.
(Note that this implementation probably needs quite a bit of work. E.g. the first call to `citar-insert-edit` is still a little slow for me: it takes around 30 seconds, compared to around 5 seconds for follow-up calls. I'm not sure of the cause.)