Drop Solr collections before re-creating them in manage.py reindex
Closes #1380
I ran `reindex --drop --all` in preprod, and 2 of the 3 corpora were reindexed successfully. The third one only worked after running `reindex` a second time. I inspected the Solr collection in preprod but still cannot reproduce the problem locally, so I read through the code and made some educated guesses.
For some unknown reason, we were creating or updating the collection with `.setup()` before dropping it. That makes no sense, because the drop then throws away the collection that `.setup()` just prepared. I also noticed that `reindex --drop --setup --all`, which should nuke all the collections and recreate empty ones, only worked on the first corpus because of a `return` inside the loop over corpora, so I made a few changes. I also updated `Indexer.drop` so that it no longer fails when the collection is already gone.
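Here is a minimal sketch of the intended flow. It assumes an `Indexer` with `setup()`, `drop()` and `index()` methods. The `CollectionNotFound` exception and the in-memory `FakeSolr` backend are hypothetical stand-ins, not the project's actual code or a real Solr client. The sketch shows three things: the drop happens before `setup()`, the setup-only path uses `continue` rather than `return` so every corpus is handled, and dropping a missing collection is treated as success.

```python
from contextlib import suppress


class CollectionNotFound(Exception):
    """Hypothetical error raised when a collection does not exist."""


class FakeSolr:
    """Hypothetical in-memory stand-in for a Solr collections client."""

    def __init__(self):
        self.collections = {}

    def create_or_update(self, name):
        # Create an empty collection if it does not exist yet.
        self.collections.setdefault(name, [])

    def delete(self, name):
        if name not in self.collections:
            raise CollectionNotFound(name)
        del self.collections[name]

    def add(self, name, docs):
        self.collections[name].extend(docs)


class Indexer:
    """Hypothetical indexer for a single corpus."""

    def __init__(self, solr, corpus, docs):
        self.solr = solr
        self.corpus = corpus
        self.docs = docs

    def setup(self):
        self.solr.create_or_update(self.corpus)

    def drop(self):
        # A collection that is already gone counts as successfully dropped.
        with suppress(CollectionNotFound):
            self.solr.delete(self.corpus)

    def index(self):
        self.solr.add(self.corpus, self.docs)


def reindex(indexers, drop=False, setup_only=False):
    for indexer in indexers:
        if drop:
            indexer.drop()  # drop first...
        indexer.setup()     # ...then (re)create the collection
        if setup_only:
            continue        # `return` here would skip the remaining corpora
        indexer.index()


solr = FakeSolr()
corpora = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
indexers = [Indexer(solr, name, docs) for name, docs in corpora.items()]
reindex(indexers)                                # initial indexing
reindex(indexers, drop=True, setup_only=True)    # like --drop --setup --all
```

After the last call, every collection exists and is empty, not just the first one.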
I thought one reason I might not be able to reproduce this is that my local Solr instance is too small. I tried to slow it down artificially with resource restrictions in Docker Compose and indexed a larger corpus, but still had no success. I could not find anything in the Solr docs saying that dropping the index might not actually be finished once we get an HTTP 200 (some sort of stale read), and I have no way to verify it anyway, so this MR is my best guess at what is happening.