One thing Reddit dominates is search results. I’m looking things up and seeing so many links to Reddit, which I guess is going to help keep that place relevant (unless those subreddits stay dark).

I wonder how Lemmy and the whole federation thing works for that. With more posts, can we expect to see people arriving through search results?

  • wpuckering@lm.williampuckering.com
    English
    arrow-up
    4
    ·
    edit-2
    1 year ago

    There’s a lot of things that factor into the answer, but I think overall it’s gonna be pretty random. Some instances are on domains without “Lemmy” in the name, some don’t include “Lemmy” in the site name configuration, and in the case of some like my own instance, I set the X-Robots-Tag response header such that search engines that properly honor the header won’t crawl or index content on my instance. I’ve actually taken things a step further with mine and put all public paths except for the API endpoints behind authentication (so that Lemmy clients and federation still work with it), so you can’t browse my instance content without going through a proper client for extra privacy. But that goes off-topic.
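
A minimal sketch of how a well-behaved crawler might interpret that header. The function name `blocks_indexing` and the default bot name are hypothetical, and real crawlers apply more rules than this; it only illustrates the `noindex` / `none` directives and the optional per-bot prefix.

```python
# Directives that tell a crawler not to index the page.
BLOCKING = {"noindex", "none"}
# Directives that carry their own "name: value" colon, so the colon
# must not be mistaken for a per-bot prefix.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def blocks_indexing(header_values, bot="googlebot"):
    """Return True if any X-Robots-Tag value forbids indexing for `bot`.

    `header_values` is a list of raw header values, one per header line.
    """
    for value in header_values:
        scope, sep, rest = value.partition(":")
        scope = scope.strip().lower()
        if sep and scope not in VALUED_DIRECTIVES and "," not in scope:
            # Scoped form such as "otherbot: noindex": skip other bots.
            if scope != bot.lower():
                continue
            directives = rest
        else:
            # Unscoped form: applies to every crawler.
            directives = value
        names = {d.strip().lower() for d in directives.split(",")}
        if names & BLOCKING:
            return True
    return False
```

An instance sending `X-Robots-Tag: noindex, nofollow` would be skipped by any crawler that honors the header, which is the effect described above.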

    Reddit was centralized, so it could be optimized for SEO. Lemmy instances are individually run, with different configuration at both the infrastructure level and the application level. If most people leave things fairly vanilla, that should result in pretty good discovery of Lemmy content across those instances. But I would think most people technical enough to host their own instances will have deviated from the defaults and (hopefully) implemented some hardening, which would likely mess with SEO.

    So yeah, expect it to be pretty random, but not necessarily unworkable.

    • melonpunk@lemmy.worldOP
      English
      arrow-up
      1
      ·
      1 year ago

      Great answer, thanks.

      I’m not hugely familiar with SEO, but I seem to remember there could be a penalty applied to content that is duplicated as it’s seen as spammy. I might be wrong on how this works though, and it could be based around only content pasted within a single domain.

      I just wonder how search engines will deal with seeing the same content across a lot of instances in terms of ranking and noise.

    • OrangeSlice@lemmy.mlM
      English
      arrow-up
      1
      ·
      1 year ago

      Easily the best answer here. I think the people who think it will work “just like Reddit” are still unfamiliar with federation and aren’t used to thinking things through in those terms.

      Not to mention that Google results in general have been pretty trash for a couple years now. I don’t expect fediverse content to be prominent for some time unless there is a dedicated service that indexes everything.

      • itty53@vlemmy.net
        English
        arrow-up
        1
        ·
        1 year ago

        I mean, why couldn’t there be a dedicated service that indexes everything? Whoever makes it and gets it working in a user-friendly manner is going to have a significant level of control over the content that is shown in the results. If you don’t want it, it isn’t indexed. I don’t have to stretch the imagination to think of parties that have good reason to want to be first to do that across ActivityPub as a whole. Mastodon is already a big frontrunner in that regard.

      • wpuckering@lm.williampuckering.com
        English
        arrow-up
        1
        ·
        edit-2
        1 year ago

        As a general rule, I prevent all of my self-hosted services that are directly exposed to the Internet from being crawled or indexed by search engines. Any service I do expose publicly to the Internet is of course behind proper authentication and is secured using modern best practices and standards, but someone stumbling onto services they have no use for, and potentially trying to exploit them, is less likely if those services aren’t presented front and center in a search result. I wouldn’t say it’s a proper security measure by any means (obscurity has nothing to do with real security), but blending into the crowd or taking a seat at the back of the room draws less attention to yourself if you don’t care to be the first target in someone’s sights.

        So why do I expose any of my self-hosted services to the Internet in the first place, rather than access them exclusively via VPN? For me there’s a few reasons:

        • Ease of Access - I want the ability to instantly share usage of specific services that I host with friends and family over the Internet, and I can’t expect them to do so over VPN, even if I were to offer to help them get set up
        • Performance - I use Cloudflare Tunnels to expose my services (no open router ports, ever), which allows me to use Cloudflare’s CDN for caching static assets such as immutable images, CSS, and JavaScript, and I’ve extensively tweaked my Cache Rules to squeeze the most out of it
        • Security - Cloudflare secures my services with their built-in tooling, and I can use Cloudflare Access if I want to limit access further to specific users by means of accounts they already have, such as Google or various social media account providers

        …And there are more reasons I could get into, and I could easily expand on the ones above, but I’ll leave it there.

        Of course having all of my external traffic flow through Cloudflare means there’s no expectation of data privacy for any payload traversing in and out of my services, but I’ve decided that I’m okay with that for the other benefits I get out of Cloudflare. Nothing’s truly free, right?

        But to answer your original question more specifically, and with the context above in mind, why actively work against indexing in the case of my Lemmy instance? Well, I’m the only user on my instance. I only use it as a home server for my account. That means I’m not creating any communities on it, and there’s no content actually originating from my instance proper. Anybody who came across my instance and browsed it would see content that originates from other instances, and only content from the time I set up my server and began federating with those other servers onward. They wouldn’t see every comment from posts that pre-dated my federation, so it would be an incomplete view. They would be better off going directly to the server that originated the content. They could of course do that by following the permalink from my own server, but it’s an extra hop. It might arguably be better in this case if I just remove my server entirely from any possible search results, so that if the originating instance is indexable, its content shows up in the results and mine doesn’t. That would probably be a better experience for users trying to find Lemmy content via search engines: they’d hopefully land in the originating instance sooner rather than later.

        Long answer, but I wanted to give as much insight and clarity into why I do what I do. Happy to answer any more questions!

  • kadu@lemmy.world
    English
    arrow-up
    4
    ·
    1 year ago

    One thing to keep in mind is that Google currently penalizes links that don’t end in the common top domains like “.com”, “.org” and similar. So something like lemmy.world, if indexed, will rank lower than a site ending in .com with the same keyword density.

    • SkyNTP@lemmy.ml
      English
      arrow-up
      4
      ·
      edit-2
      1 year ago

      Let Google be irrelevant. It kind of already is there in the absence of Reddit.

      The nerds always blaze a trail when boring old entrenched media ruins good things. In this case the thing being ruined is a search engine that makes the critical mistake of assuming a traditionally “prestigious” .com equates to value. Fuck the old establishment, it’s time to ditch decrepit big tech and remake the internet the way it was meant to be. It’s time to reinvent how we share and discover content.

      • CalcProgrammer1@lemmy.ml
        English
        arrow-up
        2
        ·
        1 year ago

        Embracing the Fediverse now does pretty much feel like “taking back the Internet”. It reminds me of the early days, and that’s an amazing thing. I’m tired of the over-commercialized hellscape the Internet has become over the past decade and a half.

    • Briongloid@aussie.zone
      English
      arrow-up
      2
      ·
      1 year ago

      Google went from being the most important website on the internet to being more and more useless. It’s amazing seeing such a massive company go downhill. But they have so much money that they’ll be able to stay big forever from capital alone.

      • gun@lemmy.ml
        English
        arrow-up
        0
        ·
        1 year ago

        What do you use as a search engine instead of Google? I feel like I’ve tried everything, but always end up back at Google search.

        • Ministar@lemmy.world
          English
          arrow-up
          2
          ·
          1 year ago

          Been using Ecosia and so far it’s been very good. I haven’t needed to use Google once.

        • crt0o@lemm.ee
          English
          arrow-up
          2
          ·
          1 year ago

          I’ve been using DuckDuckGo for about a year now. The results still aren’t as good as Google’s, but not having to look at ads and the better privacy outweigh that for me. It really has improved a lot over the last few years.

  • dan@lemm.ee
    English
    arrow-up
    3
    ·
    edit-2
    1 year ago

    My guess is just that Reddit happily lets search engines crawl it, so that content is well-indexed, and because Reddit threads are often linked to from elsewhere the site is considered good quality.

    I’d imagine Lemmy would eventually get to the same point naturally if enough information is shared here. At least, assuming it doesn’t block search engines.

    Hmm, although I don’t really understand how federation will fit with that, given it basically means the same content is duplicated on a bunch of domains.

  • Monkey With A Shell@lemmy.socdojo.com
    English
    arrow-up
    1
    ·
    1 year ago

    A lot of search engines rely on backlinks to rank the reliability/validity of a site, so even if a given instance was picked up, getting enough places to reference it for it to be seen as a valid source would be a pretty heavy lift.

    • ccx@sopuli.xyz
      English
      arrow-up
      2
      ·
      1 year ago

      Making it a Searx plugin would probably do better in terms of making it accessible to a lot of people.

      I wonder how good various ActivityPub instances are at searching. Having pregenerated fulltext indexes of public content available for download could go a long way toward making it easy and fast to build a search engine.
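
To make the idea concrete, here is a toy sketch of a pregenerated fulltext (inverted) index. The post IDs, texts, and function names are all made up for illustration; a real index would also need ranking, stemming, and a compact on-disk format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each token to the set of post IDs whose text contains it.

    `posts` is a dict of {post_id: text}.
    """
    index = defaultdict(set)
    for post_id, text in posts.items():
        for token in tokenize(text):
            index[token].add(post_id)
    return index


def search(index, query):
    """Return the IDs of posts that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

An instance could publish the result of `build_index` for its public posts, and a search service would then only need to download and merge those indexes rather than crawl every page.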

  • thx1138@lemmy.one
    English
    arrow-up
    1
    ·
    1 year ago

    Yeah I think it will happen but maybe more slowly. From a search engine’s perspective it is combing results for a particular site, so search results will be shown that way. However, it seems that this would also mean that the same post would be found across multiple sites, which I can only see as a good thing in terms of searchability. Looking for some old obscure post on reddit? The results are only going to come from one site. Same kind of search on Lemmy? Search results could show pages of results of the same post but on several different sites. I’d think that would actually boost the efficacy of searching Lemmy.

    Also site search features like googling “site:lemmy.one [search terms]” would still allow you to search just on one site at a time, returning deeper results.

  • binwiederhier@discuss.ntfy.sh
    English
    arrow-up
    1
    ·
    1 year ago

    I had quite the negative Google experience just now with Lemmy. I googled “ntfy Mastodon” and to my surprise the ntfy Lemmy instance (discuss.ntfy.sh) fully indexes communities that I imported for my personal user, i.e. non-local communities.

    IMHO that should really not happen. In this case, I really don’t want ntfy to be associated with some random posts from random communities that I just happen to subscribe to personally.

    It should IMHO only index local communities, or make it possible to configure this.

  • JohannesOliver@beehaw.org
    English
    arrow-up
    1
    ·
    1 year ago

    One would hope! I can find results from Lemmy instances on Google - they are definitely crawling them, but their page rank is going to start out very low.

  • Ben@lemmy.ml
    English
    arrow-up
    1
    ·
    1 year ago

    I actually added a custom search engine to Firefox… so I can search something on Lemmy. I have the keyword ‘LW’ for Lemmy.World search right now (because Lemmy.ml was offline a while).

    Basically, do a Lemmy search (with the search term ssss), then replace ssss with %s in the resulting URL and copy the entire link: https://lemmy.world/search/q/%s/type/All/sort/TopAll/listing_type/All/community_id/0/creator_id/0/page/1

    Then, using the ‘add custom search engine’ extension for Firefox, you add it.
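
As a rough illustration of what the browser does with that template, the substitution can be sketched as below. The function name `build_search_url` is hypothetical, and the exact percent-encoding a given browser applies may differ from `urllib.parse.quote`.

```python
from urllib.parse import quote

# Search URL template from the comment above; %s marks the search term.
TEMPLATE = (
    "https://lemmy.world/search/q/%s/type/All/sort/TopAll"
    "/listing_type/All/community_id/0/creator_id/0/page/1"
)


def build_search_url(term, template=TEMPLATE):
    """Percent-encode the term and substitute it for %s in the template."""
    # safe="" also encodes "/" so the term cannot add extra path segments.
    return template.replace("%s", quote(term, safe=""))
```

Swapping the host in the template for another instance gives one keyword per instance.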

  • Djokkum@rammy.site
    English
    arrow-up
    1
    arrow-down
    1
    ·
    1 year ago

    I would expect Lemmy to show up equally in the search results if there is enough relevant content. My tiny tiny instance is already showing up in search results, crawlers can definitely find stuff on here. It would be great if at some point we can append “lemmy” to search queries to get the good stuff like we could with Reddit.