Remove unsupported characters from PDF text
https://redmine.teklia.com/issues/10933
build_transcription should filter out null characters (\0, U+0000) as well as any character in the U+D800 to U+DFFF range (UTF-16 surrogate code points), as they are not allowed anywhere in the Arkindex API. If the text that remains after this filtering and after trimming whitespace is an empty string, the function should return None so that the transcription is skipped.
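
A minimal sketch of the filtering rule described above, in Python. The helper name clean_transcription_text and the constant UNSUPPORTED_CHARS are hypothetical and used only for illustration; the actual signature of build_transcription is not shown in this issue, so this only covers the text-cleaning step that it would presumably delegate to.

```python
import re
from typing import Optional

# Matches the null character (U+0000) and any surrogate code point
# (U+D800 to U+DFFF). Hypothetical name, not taken from the codebase.
UNSUPPORTED_CHARS = re.compile(r"[\x00\ud800-\udfff]")


def clean_transcription_text(text: str) -> Optional[str]:
    """Hypothetical helper: drop unsupported characters, trim whitespace,
    and return None when nothing is left so the caller can skip the
    transcription."""
    cleaned = UNSUPPORTED_CHARS.sub("", text).strip()
    # An empty string is falsy, so this returns None for empty results.
    return cleaned or None
```

With this sketch, a string made up only of whitespace and null characters yields None, while ordinary text is returned with the unsupported characters removed and the surrounding whitespace trimmed. The characters are removed before trimming, so whitespace that was only separated from the edge of the string by a null character or a surrogate is trimmed as well.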