This is just a small fix for those of you running into the following error when consuming new documents in paperless.
Due to a vulnerability, NLTK had to be bumped to a recent version, which loads its tokenizer data from the punkt_tab resource.
On bare-metal installations, however, that resource seems to be missing:
```text
Doc.pdf: The following error occurred while storing document Doc.pdf after parsing:
**********************************************************************
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/german/
Searched in:
- PosixPath('/usr/share/nltk_data')
**********************************************************************
```
To get the system up and running again, you can download the missing resource manually into the directory listed under "Searched in" in the error message:
```shell
python3 -m nltk.downloader -d "/usr/share/nltk_data" punkt_tab
```
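If you want to double-check that the download landed where NLTK is looking before you retry the document, a small check like the following can help. This is a minimal sketch under stated assumptions: `find_punkt_tab` is a hypothetical helper (not part of NLTK or paperless), the German tokenizer and the `/usr/share/nltk_data` path are taken from the error above, and unlike NLTK's real lookup it only checks for an unpacked directory and ignores zipped packages.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_punkt_tab(language: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing tokenizers/punkt_tab/<language> directory, else None.

    Hypothetical, simplified re-creation of the lookup described in the
    error message; NLTK's own nltk.data.find() also searches zip archives.
    """
    for base in search_dirs:
        candidate = Path(base) / "tokenizers" / "punkt_tab" / language
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Assumption: same language and search path as in the error message above.
    found = find_punkt_tab("german", ["/usr/share/nltk_data"])
    if found is not None:
        print(f"punkt_tab data for 'german' found at {found}")
    else:
        print("punkt_tab data for 'german' is still missing")
```

On a machine where NLTK itself is installed, `nltk.data.find('tokenizers/punkt_tab/german/')` performs the real lookup and raises a `LookupError` if the resource still cannot be found.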