This site is currently struggling to handle the amount of new users. I have already upgraded the server, but it will go down regardless if half of Reddit tries to join.
However, Lemmy is federated software, meaning you can interact seamlessly with communities on other instances like beehaw.org or lemmy.one. The documentation explains in more detail how this works. Use the instance list to find one where you can register. Then use the Community Browser to find interesting communities. Paste the community URL into the search field to follow it.
You can help other Reddit refugees by inviting them to the same Lemmy instance where you joined. This way we can spread the load across many different servers. And users with similar interests will end up together on the same instances. Others on the same instance can also automatically see posts from all the communities that you follow.
Edit: If you moderate a large subreddit, do not link your users directly to lemmy.ml in your announcements. That way the server will only go down sooner.
I’m going to set up a general purpose instance tomorrow with the intention of handling a relatively large number of users. The main problem is choosing a domain!
Lemmy.world is a new server, accepting signups. You’re welcome there.
@nutomic@lemmy.ml It might be a good idea to default the Communities page to All instead of Local, to help push users into discovering other instances and promote them.
Sadly, I feel like the Fediverse, based on ActivityPub, was fundamentally designed wrong for scaling potential. I do like Fedi and I like ActivityPub, but I think instances should not have to be responsible for all of this:
- Owning user accounts
- Exclusively hosting communities
- Serving local and remote users webpages and media
- Never going down, as this results in users and content becoming unavailable
Because servers “own” the user accounts and communities it’s not trivial for users to switch to a different instance, and as instances scale their costs go up slightly exponentially.
I wish the Fediverse from the beginning was a truly distributed content replication platform, usenet-style or Matrix-style, and every instance would add additional capacity to the network instead of hosting specific communities or users.
I guess it’s a bit too late for a redesign now… Perhaps decentralized identifiers will take us there in some form in the future.
Not sure why you reference Matrix, which has even worse scaling issues as it indeed tries to replicate nearly the entire network on every server.
The Fediverse is really just working how the general web does, just with some standardized API for websites to interact. It’s not perfect, but it works and has proven to be relatively scalable.
It sounds a bit like you had a bit too much of the Bluesky Kool-Aid, which indeed replicates nearly all of the mistakes of Matrix and makes it impossible to scale via small community owned servers instead of big company owned data-centers (which might be by design?).
Well yeah, point taken that replicating everything everywhere and forever might be impossible. But I do believe at a minimum my identity should be portable and accessing Fedi (ie. in microblogging: posting and viewing a feed of the latest posts of my follows) should be decoupled from which instance I pick to access the Fediverse.
I don’t particularly like how owners of instances which grew are now essentially locked in to having to spend 100s or 1000s of dollars a month keeping their now expensive instances running and providing service. This is a bad place to be for a platform run by volunteers. Letting instance owners scale their service down as well as up would be ideal. But this requires at least decentralized identity, and at best some form of content hosting redundancy…
It’s easy to say the current architecture of Fedi works when it’s still small. Your instance has 139 users… That’s not intended as a slight. Hosting instances is good and I applaud you for it! But I wish it were easier to more equally share the load once the platform becomes more popular.
Is there any group of devs that work on this issue that you know of? I’d be interested in looking into it.
No. And I think it’s a really hard problem. poVoq was right to call me out on full replication being a bad move, because duplicating all content on every server is obviously inefficient. But a solution in-between, with decentralization and redundancy, is probably a very complex challenge. Doesn’t seem impossible, but very complex network protocols rarely seem to succeed.
Edit: Sorry I was still thinking about some fabled perfect protocol. But if you’re looking into decentralized identifiers, W3 is working on one approach. It’s not something I have seen used anywhere or integrated with ActivityPub yet, but that could be the future I’m hoping for. Probably.
It almost sounds like you’re describing RAID 5 of content across fediverse servers.
Something like that. But also with fully decentralized identity. So all content is signed by a keypair which is local to the user, and can be used to access Fedi through arbitrary instances. Probably I am too wishful.
I like the idea of decentralizing identity. One of the oddest things about the current fediverse is how closely tied accounts are to servers that host specific content. From the server’s perspective it would be like everything’s posted anonymously except all the messages are pgp signed.
But how would the system handle user customization settings? Things like blocked users or subscribed topics. Would that all need to be stored locally in your browser and parsed by the arbitrary instance you’re using?
And what if some instances want to refuse hosting certain content on the network. Maybe there’s some way defederating instances could account for that.
I could envision a 2nd class of server, running something like OAuth/OIDC, which handles the authentication into any Lemmy instance (or better yet, any ActivityPub based instance).
This server would also be self-hostable, and provide only authentication services, so it would be rather lightweight. But would help reduce the load on the content servers.
I feel like for decentralized identity maybe a page could be taken out of blockchain. The Ethereum ledger is duplicated in its entirety on every host, and there are L2s that help spread the load but roll up to the L1. If identity could be attached to something like that, each person has a key to identify themselves. Identity I think would be best separated from all content related to the identity; the user could choose a server to host that data, as well as back it up with something like a shared user data backup agreement between a few servers in case one dies. It’d be very similar to RAID, but the data only needs to be on 2-3 servers. I suppose community data could be the same. I can envision that when a new server joins the federation, it could be auto-assigned to share with 2-3 similarly sized communities, with algorithms making sure there aren’t any closed groups of sharing (a->b&c, b->a&c, c->a&b); we wouldn’t ever want that. That all seems to be the most reasonable solution in my head.
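To make the "no closed groups" idea a bit more concrete, here is a minimal Python sketch. It assumes a simple ring layout where each server backs up to the next two servers in the list; the function names and the ring scheme are made up for illustration and are not part of Lemmy or ActivityPub.

```python
from collections import deque


def assign_backups(servers, copies=2):
    """Give each server `copies` backup partners: the next ones around a ring."""
    n = len(servers)
    if n <= copies:
        raise ValueError("need more servers than copies")
    return {
        server: [servers[(i + step) % n] for step in range(1, copies + 1)]
        for i, server in enumerate(servers)
    }


def reachable(start, assignment):
    """Every server reachable from `start` by following backup links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for partner in assignment[queue.popleft()]:
            if partner not in seen:
                seen.add(partner)
                queue.append(partner)
    return seen


def has_closed_group(assignment):
    """True if some server's backup chain stays inside a strict subset of the network."""
    everyone = set(assignment)
    return any(reachable(s, assignment) != everyone for s in assignment)


# Hypothetical five-server federation laid out as a ring.
ring = assign_backups(["a", "b", "c", "d", "e"])

# The unwanted case: a, b and c only ever back each other up.
closed = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["e"], "e": ["d"]}

print(ring["a"], has_closed_group(ring), has_closed_group(closed))
```

The ring is only one way to do it. Anything that keeps the backup graph connected would pass the same check, and a real system would presumably also weigh server size and reliability.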
Yeah, the fact that the user auth and permission models were intentionally left out of the W3C spec initially really ended up locking ActivityPub into particular dialects and patterns that are now proving problematic for scaling.
I’m not sure it’s 100% too late for a redesign, though. The committee is still active and the Fediverse could still theoretically grow by an order of magnitude or two. Does that seem likely right this minute? No, but sometimes that kind of vision is what an ecosystem needs.
Thank you for posting this, as you seem to have more knowledge of the underlying protocols than i have. Had some visions of server takedowns and meltdowns. I have little idea how the protocol works but i’d imagine that account migration/replication would not be such a big deal to implement, and communities are already being replicated, no? As a work-around.
While it might not be too late for that update, it would require some reconciliation to happen. There’s the potential for multiple users and communities of the same name across servers that would need to be considered.
Following a decentralized identifier would essentially be like following a unique public key. Their screen name is just some text which can be anything. The bigger problem is the overall infrastructure of Fedi which is very much based on lists of user@domain … this no longer trivially works if the “user” you follow could be posting from anywhere. It doesn’t seem unsolvable, just kind of difficult to imagine with the momentum behind Fedi as it is right now.
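As a rough illustration of "following a public key" instead of user@domain, here is a small Python sketch. Everything in it is hypothetical: the fingerprint scheme, the `Post` fields and the placeholder keys are invented, and it does not verify any signatures, which a real design would have to do.

```python
import hashlib
from dataclasses import dataclass


def identity_of(public_key: bytes) -> str:
    """Hypothetical identifier: a short fingerprint of the author's public key."""
    return hashlib.sha256(public_key).hexdigest()[:16]


@dataclass(frozen=True)
class Post:
    author_key: bytes  # public key the post claims to come from (not verified here)
    screen_name: str   # free text, not unique
    posted_via: str    # instance the post was sent through
    body: str


def feed(posts, following):
    """Keep posts whose author fingerprint is followed, wherever they were posted."""
    return [p for p in posts if identity_of(p.author_key) in following]


alice_key = b"alice-public-key-placeholder"
posts = [
    Post(alice_key, "alice", "lemmy.ml", "first post"),
    Post(alice_key, "alice (travelling)", "beehaw.org", "same key, other instance"),
    Post(b"someone-else", "alice", "lemmy.one", "same name, different key"),
]
following = {identity_of(alice_key)}

print([p.posted_via for p in feed(posts, following)])
```

The follow list here never mentions a domain, which is exactly what makes it awkward to fit into today's user@domain infrastructure: something still has to tell your instance where posts for that key might show up.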
I think lemmy will be bitten in the ass by not having considered clustering/horizontal scaling from the start. Federation alone as a scaling mechanism is only feasible for “nerds”. But if the network wants to grow, we will need a few scale-able large hosted instances. And if their only choice is to scale vertically, there will be a hard limit (unless we put a good old Mainframe somewhere ^^).
Another downside of this design is: you can’t run it with high availability. If there’s only one process per instance, updating it will mean the whole instance is down. Sure, if all goes well this downtime is under a second. But if it doesn’t go well or if a migration is needed, this might quickly become hours.
I think you probably underestimate how far one can get with “vertical” scaling. Here’s the docker-compose file: https://raw.githubusercontent.com/LemmyNet/lemmy/release/v0.17/docker/prod/docker-compose.yml
- It includes 4 different containers… so there’s a way to scale out to 4 machines right away. Maybe not every container is doing an equal amount of work… but there’s some amount of immediately available machine-splitting.
- I’m no expert, but I believe that at least the lemmy and lemmy-ui containers are stateless. If so, they’re horizontally scalable already.
- Postgres then would likely be the main bottleneck. But postgres offers read-replicas, so again the write-load and the read-load can be hosted on separate machines. And if there’s enough read-load, you can have many replicas.
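The read/write split from the last bullet can be sketched in a few lines of Python. This is a toy router, not how Lemmy talks to its database: the host names are invented, and in practice the split usually lives in a connection pooler or the database driver.

```python
import itertools


class QueryRouter:
    """Send writes to the single leader and spread reads over the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, leader, replicas):
        self.leader = leader
        # Fall back to the leader if no replicas are configured.
        self._readers = itertools.cycle(replicas or [leader])

    def route(self, sql):
        # Assumes a non-empty statement; the first word decides where it goes.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.leader
        return next(self._readers)


# Hypothetical host names.
router = QueryRouter("pg-leader", ["pg-replica-1", "pg-replica-2"])
targets = [
    router.route("SELECT * FROM post"),
    router.route("INSERT INTO comment (body) VALUES ('hi')"),
    router.route("SELECT * FROM comment"),
    router.route("select * from community"),
]
print(targets)
```

One caveat: replicas lag slightly behind the leader, so a request that must read its own write right away would still need to go to the leader.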
Other comments from the admins have shown that lemmy.ml today is running on a single eight-core box and it’s currently hosting 30k registered users and over 1k active. So how much more compute capacity can we throw at “vertical” scaling on the current software architecture?
- Just by going to a bigger single box, we can get 128 cores with no problem, a 16x bump in capacity. Does that get us to at least 300k registered + 10k active?
- Splitting the containers onto 4 separate machines. Does that get us 2x more?
- Adding PG read-replicas and additional lemmy/lemmy-ui containers would allow us to expand our instance footprint to maybe 6 physical machines, which should get us another 2x or more in performance.
Conservatively, that’s 100x the computing capacity of the current hardware and could potentially support 1m registered users and 50k active. Now, I don’t REALLY expect this to be possible today, there will be many software bottlenecks found along the way to scaling a single instance this large. But my point is that there’s already a medium amount of horizontal scalability built into lemmy, and if the software doesn’t fall over for algorithmic reasons (which it will at first), the current infrastructure architecture allows quite a lot of growth. There’s plenty of time between now and a federation of million user instances to adopt a truly distributed storage backend if needed.
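For anyone who wants to check the back-of-the-envelope numbers, here they are in Python. The inputs are the figures quoted in this comment; the assumption that capacity scales linearly with hardware is a simplification the comment itself is doubtful about.

```python
# Figures quoted in the comment above; the multipliers are the commenter's rough guesses.
current_cores = 8
registered_now, active_now = 30_000, 1_000

bigger_box = 128 / current_cores  # going from 8 to 128 cores
split_containers = 2              # "does that get us 2x more?"
replicas_and_workers = 2          # "another 2x or more"

combined = bigger_box * split_containers * replicas_and_workers
needed_registered = 1_000_000 / registered_now
needed_active = 50_000 / active_now

print(f"combined capacity factor: {combined:.0f}x")
print(f"needed for 1m registered: {needed_registered:.0f}x")
print(f"needed for 50k active: {needed_active:.0f}x")
```

Taken at their lower bounds the three factors multiply to 64x rather than 100x, but that is still above the roughly 33x and 50x that the 1m registered / 50k active target would need.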
Doesn’t solve the availability issues, though. I know of no seriously hosted system that doesn’t have at least two replicas in different availability zones. I don’t expect any hobby instance to offer any kind of availability guarantee. But if we want to have one or two central instances that the typical reddit user can flock to, this would IMO be essential to have.
Also, in my experience it is FAR cheaper to have a few low to mid range systems for horizontal scaling than to throw a high end machine at it for vertical scaling. If you look at the pricing, the monthly costs for vertical scaling go up exponentially once you want much more RAM and CPU cores (and storage, and so on).
Being able to scale horizontally solves both issues: hardware is cheaper and reliability is higher.
That lemmy is so damn efficient would then simply mean that we can achieve excessively good results with low resources, where Reddit would already struggle and need to put many more machines in place. That would be a nice “business” advantage.
Doesn’t solve the availability issues, though. I know of no seriously hosted system that doesn’t have at least two replicas in different availability zones.
I’m not sure why you think the setup I’ve described can’t have coverage in multiple availability zones. If the lemmy and lemmy-ui containers are stateless as I suspect, you can autoscale them. Pictrs is new to me, not sure there… but it appears to support object-storage which would likely make it stateless and the object-storage can replicate to multiple-az’s. Postgres read-replicas can be placed in multiple az’s as well. The only component that presents an issue is the Postgres write-leader, and failovers there can be done in minutes. Many many popular sites run with an infrastructure like this and achieve excellent uptimes.
I do get the power of horizontal scalability, I specialize in distributed databases. But they come at a cost in flexibility relative to something like Postgres… and we’re very far from “needing” horizontally scaling database writes here. Everything else looks like it can be scaled horizontally if someone wants to take on the headache of doing so.
Well, one could try to swap postgres for cockroachdb. But a ticket on GitHub that asked for clustering support was closed as being out of scope. So it might be that lemmy is not stateless. Haven’t checked the code yet, though.
If cockroach is truly PG compatible, lemmy admins can swap it in without developer support. I suspect Cockroach constrains some SQL features and has poor performance on others, but that or AWS Aurora are things you can experiment with without dev support if you’re passionate about proving out the value of scale-out.
The statement that spawned my response though was this:
I think lemmy will be bitten in the ass by not having considered clustering/horizontal scaling from the start. Federation alone as a scaling mechanism is only feasible for “nerds”. But if the network wants to grow, we will need a few scale-able large hosted instances.
I still don’t think it’s true that we need horizontal scaling to support sufficiently large instances. The amount of vertical and horizontal scaling ability built into Lemmy today is both useful, and likely to outstrip the current ability of its code to scale a single instance. Any algorithms that scale super-linearly with respect to comment-count, post-count, user-count, or community-count, will fail just as hard with distributed backends as they do with an RDBMS. And as you note, PG-compatible distributed systems provide a potential lower-engineering-cost on-ramp to distributed systems once the codebase is efficient-enough to warrant such a transition to scale further. I suspect I’ve contributed everything of use I have to this thread though, and don’t expect to respond further.
As someone not versed in DBs and scaling for web architecture, this was a super fun read through, appreciate the comment chains from both users.
Thank you for your thorough explanations and input. It definitely gave me a few things to think about. And if I have some spare time I might even try to spin up lemmy in some local k8s to see how it reacts to being scaled up and down.
i’ve been saying we need a COBOL/CICS implementation of ActivityPub for YEARS and it’s always the same “where the hell am i supposed to get a 3270 in 2023” and “what do you mean i can’t shitpost during the batch window”
Indeed. If a big instance like lemmy.ml was to be shut down all the communities would be lost. This is simply not sustainable. Why would users put effort building a community if it could be gone at any time?
That however would be a different problem. A horizontally scaled instance would be able to cope with more users, but if it shuts down for monetary, personal, or whatever reason, it’s still down.
Protecting a community from this is what the decentralized part is for. That is already in place.
(Although there is a middle ground where you could design the system in a way that one instance is mirrored and load-balanced across different hosters. That would actually also be quite interesting to have. But that’s another layer of complexity on top.)
Protecting a community from this is what the decentralized part is for. That is already in place.
What? How is it solved exactly? If say lemmy.ml is down, what’s the point of other servers existing, if most of the content and users are here? Like, I created a few new communities on lemmy.ml, which don’t exist on say Beehaw because for some strange reason, the Beehaw admins don’t allow users to create communities. So how is going to Beehaw help me, if lemmy.ml is unavailable? Okay, so you tell me I should go to a different server then. Maybe even make a new server. Done and done. But there’s very few to zero users on that server, so those new communities and content created there might as well not exist. Also, even though Lemmy is federated, the homepage defaults to “local”, so all the new users coming in may miss out on all the other federated communities, and, if I’m reading this correctly, the federation isn’t even a fully automatic process, and some admins may even choose to put their server in a whitelist mode. All of it makes the whole “advantage” of federation, or at least Lemmy’s version of it, seem kind of pointless.
It’s like saying, “Hey, Gmail is down so you should just use Hotmail instead.” Okay, so I can still send and receive emails, but I can’t access any of my old emails for context, and none of my contacts can reach me using my Gmail address, and none of my filters, address book and other content is available so I may not even be able to reach out to my contacts and let them know what my new email is.
IMO the way the federation should’ve been designed is to use something like blockchain technology, so every instance basically has all the content and there’s only one source of truth for user accounts and data (distributed ledger), or maybe even just implement the whole thing as a plain old high-availability cluster with load balancing.
Unless I’m missing something fundamental, I don’t see how this decentralization is of any use if the content isn’t there.
If say lemmy.ml is down, what’s the point of other servers existing, if most of the content and users are here?
There is no replication and failover so the problem is not solved.
blockchain technology
Urgh, no way. Replication and some basic message signing would be enough.
What? How is it solved exactly? If say lemmy.ml is down, what’s the point of other servers existing, […]
Because you want to rely on someone else’s instance. The idiomatic solution would be for a community to host their own lemmy/activitypub instance and join the federation. Then the community has control over their own data. In every sense. If they want to delete something (for breaching law, protocol, or whatever), they are free to do so and don’t have to ask anyone else.
IMO the way the federation should’ve been designed is to use something like blockchain technology […]
Please no. I mean there is IPFS out there that somewhat works like that, but I don’t really like that. First, the ever-growing amount of data means that every instance has to keep up with it. If they wouldn’t replicate it, the deletion of a single instance would still eliminate the data, even if there were references in a block-chain.
Also: the ability to “forget” is important. Not everything needs to live on forever. That it currently does can already be a big problem. Look how people’s lives got almost ruined because someone dug up tweets from 10 years ago that were stupid. Solving the issue of data ownership is IMO one of the bigger things we need to keep in mind when designing a better web. Federation with the ability to “just” bring your own instance along where you are the owner is one of these options.
Fair point, but my original point/issue still stands. The admin here is saying “lemmy.ml is overloaded, use other instances instead” and that advice isn’t really helpful, at least in the present state of things. Right now, we have an influx of novice users coming in from Reddit, and other servers are either not accepting applications at the moment, or they are too niche/specific (or inflexible, like Beehaw); finally, at the moment the majority of the content is on lemmy.ml. So the end result is that lemmy.ml is one of the main viable servers.
If people join some random server which doesn’t have the content they’re after, they’ll either lose interest, OR they may continue to consume the content on lemmy.ml via federation, but then that’s not really going to solve the load issue since the content on lemmy.ml isn’t distributed/replicated.
I understand your point of ever growing data and how it may be better if that data is transient and not there forever, but for a news aggregator and forum type social network like Reddit (and now Lemmy), data is everything. If that data isn’t available, or not going to be available in the future, or will not be visible to audiences due to it being on some random server, it’s not going to give content creators much incentive to create content, and no content == no users. This sort of model/thinking will be doomed to failure, or be forever relegated to niche/enthusiast status, where only niche communities will thrive on specific servers targeting that niche. Which I guess is the ultimate goal of federation where every topic/community has its own server? But to get there, you’ll need interested users, and to get users to be interested you need a stable, singular place you can point them to, where they can post content knowing. And maybe, as that server grows, the admin could start splitting off the larger communities into their own individual instances?
This is how I am understanding it. Please correct me if I am wrong.
I’m going to use Reddit as an example, since we all understand that…
So the way I understand this is that backbone is now the whole of the internet instead of just reddit.com.
Each instance would be somewhat akin to a self-hosted subreddit. We can reach any sub from any other sub, since the backbone is now spread across the whole internet instead of just reddit.com.
These subs (instances) are also like old style BB forums in that there can be different categories (communities) hosted by that instance, but those are also still visible across other instances.
So basically people who are making communities here are making a sub in a sub (in Reddit terms).
Do I have that correct?
Mostly. I try to think of instance as not a subreddit but a loose collection of them, like a multireddit.
What is kind of nice, in my understanding, is that text content is replicated across federated instances when a user is using both. So if you’re on beehaw and comment on lemmy.ml, both of these servers will have your comments. That’s already providing slightly more redundancy than reddit.
I still don’t quite understand how the community is replicated…
Are you saying that if Lemmy.ml/tiki exists and someone creates Beehaw.org/tiki that they are the same community? They would show the same posts and comments?
Or are they completely separate communities that would just have the same name… users could subscribe to both if they wanted, but the posts and comments would be stuck on their respective instances?
Or - Is it the case that Lemmy.ml’s tiki community and posts and comments are also stored on Beehaw.org somehow?
If I deleted the tiki community on Lemmy.ml, would users from both communities lose their posts and comments from the Lemmy.ml instance of that community?
The current state is that they are separate communities, but I believe the person you’re replying to is proposing something like the other option, where some communities would be the same across instances so that the community and its post history would survive if one of the instances went down (not currently the case).
Currently, if you deleted the tiki community on lemmy.ml, only the lemmy.ml tiki community posts/comments would be gone. Any other tiki communities on other instances would remain.
Ok, say there was an established /tiki community on lemmy.ml, and some new server started up and started its own /tiki community. Would the posts from the lemmy.ml tiki community show up in NewLemmyServer.com/tiki… but only if NewLemmyServer was connected to lemmy.ml? Right?
From how I understand it, they would be different communities. Example: you have lemmy servers A, B, and C. Your account is on C, and all 3 servers have tiki communities. To access tiki on C you would go to tiki (since it’s local), to access tiki on A you would go to tiki@A, and to access tiki on B you would go to tiki@B.
If there’s a community serverA/tiki, you can search on serverB for serverA/tiki and join the community serverA/tiki from serverB. Content is replicated to serverB and back.
serverB/tiki@serverA is the replica you can fully use on serverB. This can exist beside serverB/tiki, which is a different community.
If someone writes a posting or comment on serverA/tiki, you can see it in serverB/tiki@serverA.
If someone writes a posting or comment on serverB/tiki@serverA, you can see it in serverA/tiki. (And even on serverC/tiki@serverA)
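The naming scheme described above can be modelled with a toy Python example. It is only a sketch of the behaviour as described in these comments: the `Network` class is invented, and real federation delivers activities over ActivityPub rather than through one shared object.

```python
def resolve(reference, home):
    """Split 'name' or 'name@instance' into (name, hosting instance)."""
    name, _, instance = reference.partition("@")
    return name, instance or home


class Network:
    """Toy model: every subscribed instance keeps its own copy of a community."""

    def __init__(self):
        self.copies = {}       # instance -> {(name, host): [posts]}
        self.subscribers = {}  # (name, host) -> instances holding a copy

    def subscribe(self, instance, reference):
        key = resolve(reference, instance)
        self.subscribers.setdefault(key, {key[1]}).add(instance)
        return key

    def post(self, instance, reference, text):
        key = resolve(reference, instance)
        # The hosting instance always holds a copy; subscribers get replicas.
        for holder in self.subscribers.setdefault(key, {key[1]}):
            self.copies.setdefault(holder, {}).setdefault(key, []).append(text)

    def view(self, instance, reference):
        return self.copies.get(instance, {}).get(resolve(reference, instance), [])


net = Network()
net.subscribe("serverB", "tiki@serverA")
net.subscribe("serverC", "tiki@serverA")
net.post("serverA", "tiki", "posted on A")
net.post("serverB", "tiki@serverA", "comment from B")
net.post("serverB", "tiki", "B's own tiki")

print(net.view("serverC", "tiki@serverA"))
print(net.view("serverB", "tiki"))
```

The two `tiki` communities never mix: `tiki` on serverB and `tiki@serverA` resolve to different keys, which matches the "separate communities with the same name" answer given earlier in the thread.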
The Tiki community should simply run a Tiki server, no? Problem solved.
Great idea, but then I’d have to get into the whole hosting thing and all of that which I don’t want to do.
There may be someone in the community that’s interested and/or willing.
But i agree, it’s not as simple as it sounds.
you could design the system in a way that one instance is mirrored and load-balanced across different hosters
That’s exactly what I meant. Horizontal replication shares a lot of building blocks with federation. NNTP had peering/replication and worked quite well for a protocol designed in 1986.
unless we put a good old Mainframe somewhere ^^
🎼The stranger there among them had big iron on his hip🎶
gotta love a good gamer-flavour mainframe
Point us to where the coin slot is. E.g. Patreon. We insert coin 🪙, you upgrade.
Many thanks, dropped 10mBTC.
Thank you. Did the thing.
Thank you, dropped a few buckeroos :)
I have been wondering how cumbersome the Lemmy design will become for some. I love the idea that it is federated and decentralized, however these are also major drawbacks for most average users (i.e. not multi-account users).
Multiple accounts are needed for maximum uptime on different instances. What if I really like my username and it’s taken on another instance? If one instance is down and I comment with my other account, will I then need to manage replies etc. through different profiles? What happens if someone spins up another instance on a similar domain so that they can get someone’s username and imitate them? I am sure these can be blocked after the fact, or will other federated instances be automatically blocked?
What happens when someone gets bored of their instance and stops it, or it gets blocked, or they start getting unwanted attention? Does this mean all that content then goes into the ether?
Will this go down the route of whoever provides the instance with the most resources and best load balancing becoming the one, blocking other instances and controlling it as if it were private and independent?
There are a lot of wait and see things, but I am excited to help and see what this great project becomes.
Over at https://join-lemmy.org/, when someone clicks on “Join a Server”, they are presented with a list of instances. It’s not that obvious that these are cross-accessible (yes, the homepage mentions it, but not here), and people are bound to look for the one with the most users.
Perhaps add a simple ELI5 explanation/diagram explaining how Lemmy works on https://join-lemmy.org/instances .
(The documents are also too wordy for most people to care.)
Pull requests welcome, the code is here: https://github.com/LemmyNet/joinlemmy-site
Another thing:
We do need more site admins to help us handle the applications and moderation.
For obvious reasons, we prefer ppl who have been here for a long time, and post / comment consistently. If you’d like to help us out, so that nutomic and I can focus on coding, that would be splendid.
I tried like 4 or 5 instances before coming to lemmy.ml, but none of them were taking applications anymore. Finding even those was a hassle, since all I got was a list of domains without any details as to what the instance is about or if they allowed newcomers.
Now that I’ve set up everything, Lemmy does seem like a nice alternative to Reddit, but as someone from the outside, all of this is daunting.
Exactly, can’t join another instance if they don’t accept your application.
This list may help newcomers:
Yeah, I can agree. I applied at two instances and found one that did not have the questions and allowed me to make and verify. Once I got in, it’s not too bad. I took a peek here first and saw the posts mentioning using All to see the instances, and then it opened a bit more.
I sent my registration yesterday, because I signed in to another instance, one from my country, but I couldn’t see all the posts and no comments from lemmy.ml even though it is supposedly linked, so thank you for approving my account.
Even if I’m a tech savvy person I found the whole experience of joining lemmy pretty bad, I like the concept of federation, but I think it’s too confusing to normal people, it really needs to be more seamless if you want to grow, how? idk, I was thinking some sort of replication, when you sign up, you are registered to the main instance (this) and given the choice to select other instances, automatically selecting let’s say another 3 based on your location, then your account is synced in all the registered and linked instances, when you login if an instance is experiencing overload then it switches to another one. I don’t know if this is realistic or out of the scope of Lemmy, or maybe against the philosophy of it. I’m just rambling.
I’m just glad that there is an open alternative for anonymous social interaction in this day of walled internet services such as discord, twitter, facebook etc. and I wish you all the success.
Agreed, someone needs to create an easy “sign up here” with a default option (maybe just randomize across various instances, not sure)
Randomizing would cause lots of issues since each instance has different rules and philosophies. It’s a difficult problem to solve.
We could just get all new users to sign up to lemmygrad
Kinda cancel out the far-right invasion of Voat? It’s a line of reasoning, certainly.
haha, everyone is free to request an account on lemmygrad
LOL! That would go well.
I found it rather easy to get signed up, just had to wait for the admin to actually approve the application. Otherwise it was pretty easy.
However, I do see a HUGE benefit to “load balancing” as you are mentioning. Where you sign up for a master server and then replicated to the others that are more applicable. I’m surprised this isn’t already a process as this is very common in gaming and proxied sites.
Yeah the registration itself was easy like any other site, I was talking more about grasping and understanding the concept of instances and how they interact.
And as someone said in another comment, the see all posts options should be the default in your home and community search or you feel like in a dessert island when you are new to all of this.
Both Mastodon and Lemmy have this problem. Make the default where the most new content is, which is going to be the federated tab and all tabs respectively.
Mmm, dessert island… drools
If it is struggling now, then June 12 will be a nightmare.
Reddit will go dark in protest, there will be many messages telling people to join Lemmy, and most instances will be overloaded or even DDoSed by so many users, like what happened with Mastodon.
I wonder if a longer term solution would be to auto rotate the server list to bump less popular ones.
Users are likely going to see this as it’s the “official” Lemmy instance when trying to join for the first time.
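One simple way to "bump less popular ones" would be to sort the list so the least crowded open instance comes first. A minimal Python sketch, with made-up instance data and no claim that join-lemmy.org works this way:

```python
from dataclasses import dataclass


@dataclass
class InstanceInfo:
    domain: str
    users: int
    open_signups: bool


def signup_order(instances):
    """List instances that accept sign-ups, least crowded first."""
    candidates = [i for i in instances if i.open_signups]
    return sorted(candidates, key=lambda i: (i.users, i.domain))


# Made-up numbers, for illustration only.
listing = [
    InstanceInfo("big.example", 30_000, True),
    InstanceInfo("closed.example", 500, False),
    InstanceInfo("small.example", 139, True),
    InstanceInfo("medium.example", 2_000, True),
]

print([i.domain for i in signup_order(listing)])
```

A strict sort would send everyone to the smallest server at once, so a real list might prefer a weighted shuffle. It also ignores each instance's rules and topic, which other comments here point out matter a lot.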
Any admins of instances that are accepting people, give your best elevator pitch!
Any admins of instances that are accepting people, give your best elevator pitch!
I just want to contribute where I can 😅
I just created https://lemmy.studio/ for music-related discussions.