
Drop Solr collections before re-creating them in manage.py reindex

Erwan Rouchet requested to merge fix-reindex-drop into master

Closes #1380 (closed)

I ran reindex --drop --all in preprod and only 2 of the 3 corpora were reindexed successfully; the third one only worked after I ran reindex again. I tried to inspect the Solr collection in preprod but still cannot reproduce the problem locally, so I looked at the code and made some guesses.

For some unknown reason, we were creating or updating the collection with .setup() before dropping it, which makes no sense, so the drop now happens first. I also noticed that reindex --drop --setup --all, which should nuke all the collections and recreate empty ones, only worked on the first corpus because of a return, so I fixed that too. Finally, I updated Indexer.drop so that it does not fail if the collection is already gone.
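
To make the intended control flow concrete, here is a minimal sketch of the three changes. It is not the actual code from this MR: `FakeSolr`, the `corpus_<id>` collection naming, the `reindex` function signature and the `setup_only` parameter (my stand-in for `--setup`) are all hypothetical, and the Solr server is replaced by an in-memory dictionary so the example runs on its own.

```python
class FakeSolr:
    """In-memory stand-in for a Solr server (hypothetical, not a real client)."""

    def __init__(self):
        self.collections = {}

    def create(self, name):
        # Create the collection if missing, leave it untouched otherwise.
        self.collections.setdefault(name, [])

    def delete(self, name):
        # Mimic a server that rejects deleting an unknown collection.
        if name not in self.collections:
            raise KeyError(name)
        del self.collections[name]


class Indexer:
    """Simplified indexer; only the names drop and setup come from the MR text."""

    def __init__(self, solr, corpus_id):
        self.solr = solr
        self.collection = f"corpus_{corpus_id}"  # hypothetical naming scheme

    def drop(self):
        # Tolerate a collection that is already gone.
        try:
            self.solr.delete(self.collection)
        except KeyError:
            pass

    def setup(self):
        self.solr.create(self.collection)

    def index(self, documents):
        self.solr.collections[self.collection].extend(documents)


def reindex(solr, corpora, drop=False, setup_only=False):
    """corpora maps a corpus ID to its list of documents (assumed shape)."""
    for corpus_id, documents in corpora.items():
        indexer = Indexer(solr, corpus_id)
        if drop:
            indexer.drop()   # drop first...
        indexer.setup()      # ...then create or update the collection
        if setup_only:
            continue         # move on to the next corpus instead of returning
        indexer.index(documents)
```

With this ordering, calling `reindex(solr, corpora, drop=True, setup_only=True)` leaves one empty collection per corpus, rather than stopping after the first one.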

I thought I might be unable to reproduce the issue because my Solr instance is too small, so I tried slowing it down artificially with resource restrictions in Docker Compose and indexing a larger corpus, still without any success. I could not find anything in the Solr docs suggesting that dropping the index might not actually be finished once we get an HTTP 200 (like some sort of stale read), and I have no way to verify it anyway, so this MR is my best guess at what is happening.
