Aussie living in the San Francisco Bay Area.
Coding since 1998.
.NET Foundation member. C# fan
https://d.sb/
Mastodon: @dan@d.sb

  • 0 Posts
  • 444 Comments
Joined 3 years ago
Cake day: June 14th, 2023


  • If your scanner supports scanning to a network share, install Samba on your Pi and share the paperless-ngx incoming directory. My ScanSnap iX1600 supports this, but I’m not familiar with other models. I had to configure the scanner using the Windows app to add the SMB details, but once it’s configured, it works without a computer attached.

    Paperless-ngx also supports email. You can set up a separate email account for it, then forward any documents you want to keep to that account.

    For documents you need to keep a physical copy of, use ASNs (archive serial numbers) to correlate the physical and virtual copies. You can use QR code stickers to automatically set the ASN in paperless-ngx. I posted a nested comment with more details about this.

    Consider using paperless-ai, which uses an LLM to tag and title your scanned documents automatically. It needs a webhook to be configured. Use a local model if possible. If you want to use a hosted model, review the provider’s privacy policy to ensure they do NOT train the AI on user content.


  • And file away your scanned papers separately,

    I’d recommend using ASNs (archive serial numbers) for documents you store a physical copy of, following the recommended flow.

    I printed ASN QR code stickers, using the smallest Avery labels I could find (Avery 5267 in the USA, L4731REV-25 in Europe) along with their free online design app.

    For documents I want to keep, I stick a QR code sticker on them before scanning. Paperless-ngx automatically detects the QR code and sets the ASN. I then file it away in a folder that’s sorted by ASN. When I need to find the physical copy again, I first look in Paperless to find the ASN, then find the document in the folder (pretty quick since all documents are sorted).

    You’ll need to set the following settings:

    PAPERLESS_CONSUMER_ENABLE_BARCODES=true
    PAPERLESS_CONSUMER_ENABLE_ASN_BARCODE=true
    PAPERLESS_CONSUMER_BARCODE_SCANNER=zxing
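
As a rough illustration of what goes on the stickers, here is a minimal Python sketch that produces the text for a run of consecutive ASN labels. It assumes the default `ASN` prefix (configurable in paperless-ngx via `PAPERLESS_CONSUMER_ASN_BARCODE_PREFIX`). The zero-padding width and the helper names `asn_labels` and `parse_asn` are my own choices for the example, not anything paperless-ngx requires. Turning the strings into QR codes is left to the label design tool.

```python
def asn_labels(start: int, count: int, prefix: str = "ASN", width: int = 5) -> list[str]:
    """Return `count` consecutive label strings, e.g. ASN00001, ASN00002, ...

    `width` (zero padding) is an assumption made for readability on small labels.
    """
    if start < 1 or count < 0:
        raise ValueError("start must be >= 1 and count must be >= 0")
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + count)]


def parse_asn(label: str, prefix: str = "ASN") -> int:
    """Recover the integer ASN from a label string (hypothetical helper)."""
    if not label.startswith(prefix):
        raise ValueError(f"{label!r} does not start with {prefix!r}")
    return int(label[len(prefix):])


if __name__ == "__main__":
    # Print the text for the first three stickers.
    for label in asn_labels(start=1, count=3):
        print(label)
```

Keeping a note of the last number you printed lets you pass it (plus one) as `start` for the next sheet, so numbers are never reused.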
    



  • This is how all language package managers work, unfortunately

    npm does actually support signing and provenance (tracking how the package was built), so in some ways it can be more secure than other package managers. https://docs.npmjs.com/generating-provenance-statements

    If you use one of the CI/CD systems they support (currently GitHub Actions and GitLab CI/CD), it can attach a signed attestation to the package stating the commit hash that was used to build the package, along with the steps taken to build it. This is combined with trusted publishing using OpenID Connect with short-lived tokens that are only obtainable in the correct CI environment, rather than using access tokens or username and password.

    It only supports some CI systems because they have to guarantee that the connection between the CI system and npm is secure.

    Some of the recent issues have been attacks on the CI system, rather than npm itself. For example, a GitHub Action that’s only supposed to run for commits to the main branch, but unintentionally runs for some subset of pull requests too.

    Of course, all this stuff is optional, and pushing to npm directly from a developer’s computer still works and is still not verifiable at all.

    I think the best approach is what Flathub/Flatpak, F-Droid (Android) and Composer/Packagist (PHP) do. You provide your repository URL, and they build the code on their end. Packages are always guaranteed to be built from code in the repo.

    Debian Linux is also moving towards requiring reproducible builds, meaning that a package built from source should be byte-for-byte identical to the package in the repo.
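
To make "byte-for-byte identical" concrete, here is a minimal Python sketch that compares a published artifact with one you rebuilt yourself by hashing both files. The function names and the idea of comparing two local files are assumptions for illustration. Real reproducible-build tooling does considerably more (pinned build environments, normalized timestamps, and so on).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(published: Path, rebuilt: Path) -> bool:
    """True when both artifacts have the same SHA-256 digest."""
    return sha256_of(published) == sha256_of(rebuilt)
```

If the two digests differ, either the build is not deterministic or the published package was not built from the source you rebuilt from. The check alone cannot tell you which.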






  • Companies sometimes sell their own first-party data, but not nearly as often as people think. If a company has data that other companies don’t have, a lot of the time they’ll want to keep it for themselves, since it can give them a competitive advantage over other platforms.

    If Amazon knows what movies and TV shows you like, they’re going to use that data to improve ad performance on their own platforms - suggested content on Prime Video, product ads on Amazon, etc. They’re not going to give it to some other company to use.

    The one major exception to that is data brokers. These are companies that only exist to sell data, and they’re usually lesser-known. They often use public data and combine it with things like supermarket loyalty data and purchase history.


  • For a beginner, I’d probably stick to GitHub initially, just because there are so many guides and tutorials on how to use it, and their free plan is still pretty generous.

    A lot of the knowledge is transferable though. If you do want to try something else, Codeberg is pretty good for open-source.

    To just learn about Git, you don’t even need a host like GitHub or Codeberg. You can have a Git repo just on your computer, and still get a bunch of the benefits of source control - a full history of everything, separate branches and worktrees so you can have multiple incomplete changes and switch between them, etc.
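
Here is a small sketch of the local-only workflow, driven from Python so it can run anywhere Git is installed. The file name, branch name, and commit identity are placeholders I made up for the example. The Git subcommands themselves (`init`, `add`, `commit`, `branch`) are the standard ones, and no remote host is involved at any point.

```python
import subprocess
import tempfile
from pathlib import Path

# Identity and signing are set per command so the sketch does not depend on
# (or modify) your global Git configuration. The values are placeholders.
GIT = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.invalid",
       "-c", "commit.gpgsign=false"]


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run([*GIT, *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def local_repo_demo(repo: Path) -> list[str]:
    """Create a repo with one commit and an extra branch; return the branch names."""
    git(repo, "init")
    (repo / "notes.txt").write_text("first draft\n")
    git(repo, "add", "notes.txt")
    git(repo, "commit", "-m", "Add notes")
    git(repo, "branch", "experiment")  # a second line of work, still purely local
    return git(repo, "branch", "--format=%(refname:short)").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(local_repo_demo(Path(tmp)))
```

The default branch may be called `master` or `main` depending on your Git version and settings, which is why the sketch does not hard-code it.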




  • dan@upvote.au to Selfhosted@lemmy.world · Paperless · 14 days ago

    I felt like a grown up once I got my paperless-ngx setup up and running.

    I have a ScanSnap iX1600 scanner. Everything is automated once I insert a document and click the button to scan it.

    1. Scanned documents are saved to an SMB share on my home server - it’s a built-in feature on the scanner.
    2. Paperless-ngx is watching that folder and grabs the files.
    3. Paperless-ai uses AI to add metadata to the document (title, tags, correspondent).
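
Paperless-ngx handles the folder watching in step 2 by itself, so you don’t need to write anything. Purely to illustrate the idea of a "consume" folder, here is a simplified Python sketch of one polling pass. The function name, the `seen` set, and the PDF-only filter are assumptions for the example and are not how paperless-ngx is implemented.

```python
from pathlib import Path


def new_documents(folder: Path, seen: set[str], suffixes=(".pdf",)) -> list[Path]:
    """Return files in `folder` that were not returned before, recording them in `seen`."""
    found = []
    for path in sorted(folder.iterdir()):
        # Only regular files with a wanted extension that have not been handled yet.
        if path.is_file() and path.suffix.lower() in suffixes and path.name not in seen:
            seen.add(path.name)
            found.append(path)
    return found
```

Calling this in a loop with a short sleep between passes gives a basic watcher. A real one also has to wait until the scanner has finished writing each file before picking it up.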

    For documents I need to keep a physical copy of, I give each document a consecutive ASN (archive serial number) using QR code stickers. When importing the document, paperless-ngx sees the barcode and attaches the correct archive number to the document.

    If I need to find the physical copy, I first find it in Paperless-ngx, look at the archive number, then look in a folder where the documents are arranged by archive number. Easy.





  • To get started, I’d say to get a cheap block account from the Reddit Usenet deals wiki: https://www.reddit.com/r/usenet/wiki/providerdeals/. A block account gives you a fixed amount of download (1TB, 2TB, whatever) that lasts indefinitely. If you use it just for music or books (for example), one block could last you a very long time. If you find yourself needing more data, you can get a monthly subscription with unlimited data.

    You also need an indexer, which is how you search for content. DrunkenSlug, NZBGeek, and NZBPlanet are popular. These cost money, but sometimes they have a lifetime plan where you just pay once. Sometimes they have open registration, but other times you need to get an invite from an existing user. There are free indexers like NZBKing, but they’re often full of junk, and lack encrypted content.

    SABnzbd is the most popular downloader software. It’s free and open-source.

    • Add account to SAB.
    • Search for what you want on the indexer.
    • Download the nzb file (points to where the files are located on Usenet) and add it to SAB to download the contents.
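
To show what "points to where the files are located" means, here is a minimal Python sketch that reads an NZB file, which is a small XML document listing the newsgroups and message IDs that make up each file. The sample content (poster, group, subject, message IDs) is made up for illustration, and `summarize_nzb` is a hypothetical helper. A downloader like SAB does this parsing for you.

```python
import xml.etree.ElementTree as ET

# XML namespace used by NZB files.
NZB_NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

# Made-up sample: one file split into two segments (Usenet articles).
SAMPLE_NZB = """<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="poster@example.invalid" date="1700000000" subject="example.bin (1/2)">
    <groups><group>alt.binaries.example</group></groups>
    <segments>
      <segment bytes="500000" number="1">part1@example.invalid</segment>
      <segment bytes="250000" number="2">part2@example.invalid</segment>
    </segments>
  </file>
</nzb>
"""


def summarize_nzb(xml_text: str) -> list[dict]:
    """List each file in the NZB with its groups, message IDs, and total size."""
    root = ET.fromstring(xml_text)
    files = []
    for file_el in root.findall("nzb:file", NZB_NS):
        segments = file_el.findall("nzb:segments/nzb:segment", NZB_NS)
        files.append({
            "subject": file_el.get("subject"),
            "groups": [g.text for g in file_el.findall("nzb:groups/nzb:group", NZB_NS)],
            "message_ids": [s.text for s in segments],
            "total_bytes": sum(int(s.get("bytes", "0")) for s in segments),
        })
    return files


if __name__ == "__main__":
    for entry in summarize_nzb(SAMPLE_NZB):
        print(entry["subject"], entry["total_bytes"], "bytes in", len(entry["message_ids"]), "segments")
```

The NZB itself contains none of the actual data. The downloader fetches each message ID from your provider and stitches the segments back together.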

    I think that’s it for the basics. There’s more to it - different backbones have different data so one provider might have data that a different provider is missing, you can fully automate downloads with Lidarr/Radarr/Sonarr/Readarr, you can aggregate results from multiple indexers using NZBHydra/Prowlarr - but you can figure that out as you go :)