This is a quick fix for anyone hitting the following error while consuming new documents in Paperless. Due to a vulnerability, the NLTK tokenizer had to be bumped to a recent version, and bare-metal installations seem to be missing the tokenizer data that version expects.

Doc.pdf: The following error occurred while storing document Doc.pdf after parsing: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/german/

  Searched in:
    - PosixPath('/usr/share/nltk_data')
**********************************************************************

To get the system up and running again, you can download the missing punkt_tab resource manually into the directory NLTK searches:

python3 -m nltk.downloader -d "/usr/share/nltk_data" punkt_tab
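
If you would rather check from Python whether the resource is already in place before downloading, here is a minimal sketch. It assumes the data directory from the error message above (/usr/share/nltk_data) and the layout tokenizers/punkt_tab/<language>/ that the error says NLTK tried to load. The helper names punkt_tab_available and ensure_punkt_tab are made up for this example and are not part of Paperless or NLTK.

```python
from pathlib import Path

# Directory taken from the "Searched in" line of the error message.
NLTK_DATA_DIR = Path("/usr/share/nltk_data")


def punkt_tab_available(data_dir=NLTK_DATA_DIR, language="german"):
    """Return True if tokenizers/punkt_tab/<language>/ exists under data_dir."""
    return (Path(data_dir) / "tokenizers" / "punkt_tab" / language).is_dir()


def ensure_punkt_tab(data_dir=NLTK_DATA_DIR, language="german"):
    """Download punkt_tab into data_dir only if it is missing (hypothetical helper)."""
    if punkt_tab_available(data_dir, language):
        return True
    # Imported lazily so the check above works even without NLTK installed.
    import nltk

    # Same call the error message suggests, with an explicit target directory.
    nltk.download("punkt_tab", download_dir=str(data_dir))
    return punkt_tab_available(data_dir, language)
```

Calling ensure_punkt_tab() with the defaults should be roughly equivalent to the command above. Writing to /usr/share/nltk_data will most likely need root permissions, and the script has to run with the same Python environment that Paperless uses.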