• flamingos-cant (hopepunk arc)@feddit.uk
    3 days ago

    Eh, I don’t think it’s that surprising. Getting a list of comments on a post vs getting them from a search term are very similar operations, so it doesn’t make too much sense for these to have different queries in the backend. One thing you could do, but no client to my knowledge does, is add a search bar to a post that searches through the comments only within that thread.

    Everything in the backend uses the same sorting as the posts do on that page except comments, which is frustrating. Comments do need a different sort enum as there are some options that don’t apply to comments (scaled, new comments, etc.), but yeah the fact the top options don’t work for comment search when they should is opaque and not user friendly.

    I can’t wait for 1.0 to actually come out because I feel like a broken record, but this is fixed there.

    • Zagorath@aussie.zone
      5 hours ago

      Eh, I don’t think it’s that surprising. Getting a list of comments on a post vs getting them from a search term are very similar operations, so it doesn’t make too much sense for these to have different queries in the backend

      Sure, but one would have thought that the ordering in a search is fundamentally different from the ordering in other places. Because you want something that contains the words you’ve searched for near each other to appear ahead of a post that has those words scattered at random because it’s a 500 word essay. You want exact word matches prioritised ahead of entirely unrelated words that include the same characters. Like “enum” should turn up your comment, but rank a comment that contains the text “renumbers” much more lowly. A particularly smart search page might keep “enumerate” high while rejecting “renumbers”, though.

      Of course, it’s true that at least in the current latest release, Lemmy fails at all of this. I hope 1.0 is at least fixing some of it?

      • flamingos-cant (hopepunk arc)@feddit.uk
        3 hours ago

        This doesn’t have anything to do with sort ordering though, which is based on time and votes. Text search is just a filter on top of sorting.

        You want exact word matches prioritised ahead of entirely unrelated words that include the same characters. Like “enum” should turn up your comment, but rank a comment that contains the text “renumbers” much more lowly. A particularly smart search page might keep “enumerate” high while rejecting “renumbers”, though.

        Of course, it’s true that at least in the current latest release, Lemmy fails at all of this. I hope 1.0 is at least fixing some of it?

        Lemmy does text search via pg_trgm, which works by breaking both the content text and the search text into trigrams*; if the content contains enough of the search trigrams, it’s considered to match the search term.

        * A trigram is just a 3-character ‘word’; for example, the trigrams of ‘enum’ are {"  e"," en",enu,num,"um "}.
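
        To make that concrete, here is a rough Python sketch of how trigrams get extracted and compared. It only approximates pg_trgm (the real extension has its own rules for what counts as a word character), and the function names `trigrams` and `similarity` are made up for illustration. It isn't Lemmy's code, and I'm not claiming this is the exact operator or threshold Lemmy uses.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction: lowercase the text, split it
    into alphanumeric words, and pad each word with two leading spaces and
    one trailing space before slicing it into 3-character windows."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("enum")))
    for candidate in ("enumerate", "renumbers"):
        print(candidate, round(similarity("enum", candidate), 3))
```

        With this sketch ‘enumerate’ shares 4 trigrams with ‘enum’ while ‘renumbers’ shares only 2, so ‘enumerate’ scores higher, but both still count as partial matches, which is why trigram search can surface words that merely contain the same characters.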

        What you’re describing is closer to a tsvector, so you could open an issue on Lemmy’s GitHub to move from trigrams to tsvectors. One advantage trigrams have, though, is that they’re language agnostic, while tsvectors need both a dictionary and to know the language (thankfully, Lemmy already has this info via the language setting, though the way it’s stored will need to be changed to accommodate this). But tsvectors do provide much more intuitive language matching, like what you outlined.
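
        For contrast, here is a toy Python sketch of the word-level idea behind a tsvector: normalise each word to a lexeme, then match on whole lexemes instead of character fragments. The suffix list and the names `toy_stem`, `to_lexemes` and `matches` are all hypothetical; Postgres uses proper per-language dictionaries and stemmers, which this doesn’t try to reproduce.

```python
import re

# Toy English suffix list, purely an assumption for illustration.
# Real full-text search relies on a language-specific dictionary instead.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_lexemes(text: str) -> set[str]:
    """Lowercase, split into words, and normalise each word to a lexeme."""
    return {toy_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def matches(query: str, document: str) -> bool:
    """True when every lexeme in the query appears in the document."""
    return to_lexemes(query) <= to_lexemes(document)


if __name__ == "__main__":
    print(matches("enum", "Comments do need a different sort enum"))
    print(matches("enums", "a different sort enum"))
    print(matches("enum", "this renumbers the list"))
```

        Because matching happens on whole lexemes, ‘renumbers’ never matches ‘enum’ here, at the cost of needing to know which language’s rules to apply.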