I have a bunch of textbooks, plus a lot of lecture notes and notes from colleagues, all in PDF format. What is a good way to classify, manage, store, and read these PDF files? I am trying calibre-web, but it seems difficult to find applications that connect to it.
I believe this new project should fit your needs quite well!
Papra is quite new in the self-hosted sphere but a welcome addition. I have yet to test it myself, but it sounds and looks very promising: https://github.com/papra-hq/papra
I’m a big fan of Docspell. There are lots of ways to import docs (watching a folder, watching an e-mail account, etc.), and it plays really well with my IdP instance over OIDC.
Maybe have a look at PdfDing. It isn’t too bloated and is pretty simple, but it has all the features that are essential for me. I like it.
Not sure what your preferred platform is but I’ve had great success connecting to my Calibre-web site with Yomu on iOS.
Contrary to the others here, while I love Paperless, using it for textbooks and notes only worked “somewhat” for me; it becomes quite clunky after a while.
Personally, if you have more textbooks than notes, I would rather go with Calibre. Notes can be attached there as well, and they end up better organised than in Paperless.
(And don’t get me wrong, Paperless is awesome and I use it heavily.)
What’s wrong with just folders and file names?
Benefits of data managers: tags for easier searching and grouping, grouping that is folder-agnostic, easily choosing a thumbnail, and, in text-focused ones, you can usually search the content of the files from one location and easily look through the results for the correct one, etc.
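To make the folder-agnostic grouping point concrete, here is a rough Python sketch of a tag index, roughly the kind of bookkeeping these managers do for you. The file paths, tags, and function names are all made up for illustration; it is not how any particular tool works internally.

```python
from collections import defaultdict


def build_tag_index(tagged_files):
    """Map each (lowercased) tag to the set of file paths carrying it."""
    index = defaultdict(set)
    for path, tags in tagged_files.items():
        for tag in tags:
            index[tag.lower()].add(path)
    return index


def find(index, *tags):
    """Return the sorted paths that carry every one of the given tags."""
    if not tags:
        return []
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return sorted(set.intersection(*sets))


# Hypothetical library: the files live in unrelated folders.
library = {
    "textbooks/linear_algebra.pdf": ["math", "textbook"],
    "lectures/2023/week03_eigenvalues.pdf": ["math", "lecture-notes"],
    "colleagues/anna/eigen_summary.pdf": ["math", "notes", "shared"],
    "textbooks/organic_chemistry.pdf": ["chemistry", "textbook"],
}

index = build_tag_index(library)
math_textbooks = find(index, "math", "textbook")
print(math_textbooks)
```

A "math" query here pulls files from three different folders at once, which is the part plain folders and file names can't really give you.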
Paperless-ngx! https://github.com/paperless-ngx/paperless-ngx
Paperless-ngx is great, but it is particularly bad at handling PDF documents. Roughly half my documents just won’t import.
https://github.com/paperless-ngx/paperless-ngx/issues/3933
https://www.reddit.com/r/selfhosted/comments/yfjxww/paperlessngx_not_all_pdf_files_can_be_imported/
I third this! I saw the title and came to say the same.
It’s still actively being developed; I get emails about once every 1–3 weeks, sometimes more, sometimes less.
I use Docker Desktop for this. I also lowkey learned how to set up a multi-database for this at one point, but kinda stopped after I got it working. It was more to see if I could.
I also tried building this on bare metal, but had shit luck. It’s been a couple of years though. Docker just makes it easy as hell.
I still keep all the originals separate just in case, and the tool can help you make multiple copies too (like PDF/A). I’ve never needed to go back and use those though, as Paperless just works so well once you get the hang of it and of how you want your data stored.
I picked a structure that kind of lets me find stuff easily even if the tool is not running (like just by folder structures).
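One possible layout of that kind is year/category/title. Below is a minimal Python sketch of building such a path from a document's metadata. The layout, the field names, and both functions are hypothetical; this is neither the structure described above nor anything Paperless prescribes.

```python
import re
from pathlib import PurePosixPath


def slugify(text):
    """Lowercase the text and collapse runs of non-alphanumerics to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def archive_path(year, category, title):
    """Build a predictable relative path: <year>/<category>/<title>.pdf."""
    return PurePosixPath(str(year), slugify(category), slugify(title) + ".pdf")


# Hypothetical document metadata.
path = archive_path(2023, "Lecture Notes", "Week 3: Eigenvalues")
print(path)
```

Because the path depends only on the metadata, the same document always lands in the same place, so it can be found with a plain file browser even when the tool is down.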
I’ve yet to make this available online, for obvious reasons. But it would be nice to be able to pull up pretty much any document you need, any time.
Any suggestions for quick, safe web access from a phone would be helpful (WireGuard maybe?) if you have them.
Tailscale is how I access my server. I’ve got a domain name that points to the internal Tailscale IP address, but that’s not really necessary.
For remote access, WireGuard is great. You can access stuff via their internal addresses.
I second this. I’ve been using it for about half a year as my full document store: letters, anything.
Search is great, lovin it
All great recommendations here. But I’ve heard good things about PdfDing. I haven’t used it myself but have followed development since the developer is quite active.
Not quite the correct application, but Linkwarden would work. It stores all your links but also backs everything up as HTML, plain text, and PDF. You can categorize and tag content, and there are filter and search tools.
You can just give it PDFs and it will import them as well. It only saves them as PDF, but that would still work.
I’m guessing this is not the best approach but wanted to give you options.
As a card-carrying librarian, I recommend using Zotero as a client with a WebDAV backend (I use Nextcloud).
If you’re studying or writing anything in which you need to cite your sources, Zotero is excellent and has integrations with many word processors. I’m pretty sure it can output your references as BibTeX if you’re in one of the disciplines that uses LaTeX.
I’m not even a librarian but pshh, I still got a card. They give them out to anyone, you know.
Not self hosted necessarily, but TagStudio is an interesting project worth keeping an eye on https://docs.tagstud.io/
Maybe paperless-ngx can be a solution for this.
https://github.com/paperless-ngx/paperless-ngx
Seconded, for the second time!
Paperless is very easy to install and maintain. I use it for both scientific PDFs and random receipts. It’s easy to keep them separate.
StirlingPDF? Website: https://www.stirlingpdf.com/ GitHub: https://github.com/Stirling-Tools/Stirling-PDF
StirlingPDF is great, but more of a PDF editor.
OP wants something to store and manage his PDFs.
You’re right. I was certain it did both editing and management, but my memory played tricks on me.
Not self-hosted, so it doesn’t really answer your question… However, if you’re still a student, consider switching to Zotero.
What you can self-host, though, to make your books available everywhere, is a WebDAV server that links your books directly to Zotero so you can access them on every device.
If you’re serious about book reading and study, nothing beats Zotero!
You also don’t actually have to self-host the database; you can just have your documents hosted on a WebDAV server you control.
I use Ubooquity.