The most significant complaint I received about IRC from newcomers, and one that can't be solved on the client side, is lack of history.
Funny enough, that's also the most frequent complaint I hear about Slack.
But those are two different types of history. I talked about it a while ago, but can't be bothered to dig it up, so I'll repeat it in a moment.
This, plus the inherent SPOFness of all chat protocols except for IRC, inspired me to come up with a new chat protocol.
There are 3 kinds of history:
A) what happened in the channel before you became a member of it
B) what happened in the channel you were already a member of while you were offline
C) what you've seen in the channel 2 months ago but forgot the details
Now, I believe (A) isn't all that necessary, and may even be harmful from a privacy POV.
But (B) and (C) are important.
IRC gets (C) right but fails at (B) unless you use a bouncer or run irssi/weechat 24/7.
Slack gets (B) right but fails at (C). Unless you pay, that is. You have to pay Slack for the ability to grep the logs of the conversations you've already seen.
I wanted to make sth that gets both (B) and (C) right, while also being simple and high-availability.
By high-availability I mean sth like IRC's multiple servers, where even if there's netsplit, you can talk to some of the people. OTOH, if Slack goes down, it's down for good. (kinda scratching my sysadmin itch here)
So to have history (B) but not (A), you need to maintain a list of channel members, and keep track of what each of them has received.
Besides, maintaining (A) would be a pain, as it implies infinite history, so a lot of resources. (though I may consider adding an optional limited (100 lines?) version of it.)
So from a server's POV, a channel is 3 things:
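If the optional limited version of (A) ever happens, it could be as small as a fixed-size ring buffer per channel. A rough sketch, where the `Backlog` class and its method names are made up for illustration and the limit of 100 is just the guess from above:

```python
from collections import deque

BACKLOG_LINES = 100  # assumed limit, taken from the "100 lines?" guess above


class Backlog:
    """Hypothetical bounded (A)-history: keeps only the newest `limit` lines."""

    def __init__(self, limit=BACKLOG_LINES):
        # deque with maxlen silently discards the oldest entry when full
        self.lines = deque(maxlen=limit)

    def record(self, payload):
        self.lines.append(payload)

    def replay(self):
        # What a newly joined member would be shown, oldest first
        return list(self.lines)
```

The memory cost per channel is then bounded by the limit instead of growing forever.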
- a list of members
- a list of clients connected to my version of that channel
- a list of other servers hosting that channel
And on top of that, you need a pubsub, where each member has a queue of messages they're supposed to receive.
When the member's client connects, the server should send those to the client, and the client should ack that it received them, so the server can drop them from the queue.
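To make the above a bit more concrete, here's a minimal single-server sketch of a channel with per-member queues and acks. All names (`Channel`, `publish`, `ack`, ...) are hypothetical, message ids are a plain per-server counter, and talking to the other servers is left out entirely:

```python
from collections import deque


class Channel:
    """One server's view of a channel (illustrative structure, not a spec)."""

    def __init__(self):
        self.members = set()    # everyone who belongs to the channel
        self.connected = {}     # member -> callable that delivers to their client
        self.peers = set()      # other servers hosting this channel (unused here)
        self.queues = {}        # member -> deque of (msg_id, payload) not yet acked
        self.next_id = 0

    def join(self, member):
        # A new member starts with an empty queue, so there is no (A)-history
        self.members.add(member)
        self.queues.setdefault(member, deque())

    def publish(self, payload):
        """Queue a message for every member; push it to those who are online."""
        msg_id = self.next_id
        self.next_id += 1
        for member in self.members:
            self.queues[member].append((msg_id, payload))
            send = self.connected.get(member)
            if send:
                send(msg_id, payload)
        return msg_id

    def connect(self, member, send):
        """Client came online: replay everything it has not acked yet."""
        self.connected[member] = send
        for msg_id, payload in self.queues[member]:
            send(msg_id, payload)

    def disconnect(self, member):
        self.connected.pop(member, None)

    def ack(self, member, msg_id):
        """Drop all queued messages up to and including msg_id."""
        queue = self.queues[member]
        while queue and queue[0][0] <= msg_id:
            queue.popleft()
```

A message stays queued until it is acked, even if it was already pushed to an online client, so a client that drops mid-delivery gets it again on reconnect. That gives (B) without (A).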
Now, there's CAP theorem, which means if you get a netsplit, you either pick availability or consistency.
In this case, I'd pick availability for the messages and member list. But everything else (servers in the cluster, channel configuration, permissions, etc.) needs to be consistent. Otherwise someone will add a centralized ChanServ and we're back to IRC's mistakes.
So we need a consensus protocol (Paxos, or hopefully sth simpler) for channel metadata.
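For the availability side of that split, one way it could work (purely an assumption on my part, nothing here is decided) is to give every message a globally unique key like (origin server, sequence number). Then each side of a netsplit keeps accepting messages, and healing the split is just a union of the two logs:

```python
def merge_logs(mine, theirs):
    """Union two message logs keyed by (origin_server, sequence_number).

    Keys are assumed to be globally unique, so the same key always maps
    to the same payload and the merge cannot conflict.
    """
    merged = dict(mine)
    merged.update(theirs)
    return merged


def ordered(log):
    # Deterministic order (by key), NOT necessarily chronological order
    return [log[key] for key in sorted(log)]
```

The union is commutative, so both sides end up with the same log no matter who merges first. The member list is harder, because a plain union can't represent someone leaving during the split.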
We'd probably also want federation, to avoid creating lots of accounts.
This is already getting pretty complex, and I'd say it's getting to the point where we need to cut a layer of abstraction.
File uploads, inline images, etc. can be added on top, hopefully w/o modifying the core server or protocol.
The protocol should have no XML, should be usable via telnet for debugging purposes, and have limited support for extensions.
A basic client should be easy to implement w/o big libraries (so no JSON-LD).
@Wolf480pl Why no XML? It's not any better or worse than any other structured data format.
Is it just that the libraries suck?
Unnecessary complexity, parsing it may require fetching all those schema urls, namespaces are easy to get wrong... it's just a lot of opportunities for bugs.
And I don't really see any need for XML. And there's no telnet equivalent for XML, so debuggability etc. suffers.
And the way I see this protocol, for the server a message may as well be a byte string with minimal metadata. The server doesn't need much structure, so no need for XML.
Guess the clients could send XML (or HTML) between each other, and that'd better be standardized early (we don't want another extension hell), but the part the server needs to care about should be very simple.
@Wolf480pl the only schema you need is the schema for the protocol itself, and the advantage there is that you can make it publicly accessible, so you don't need to share code for cross communication (see all the independent codebases for HTML5, which is XML).
Nested schema is an indicator of sloppy design.
Or you can express the same thing in BNF, and maybe even provide the grammar in bnfc format, so that everyone can auto-generate a parser in their preferred programming language.
@Wolf480pl I consider that a reasonable alternative, neither better nor worse.
@RandomDamage well from my POV, it's much simpler than XML. Less code, fewer edge cases, etc.
@Wolf480pl author's choice.
I've written interface code for damn near every structured data format out there, and I like XML because the working form is easy to deal with (for me).
Now, mind you, I've seen some truly awful XML, but I've also seen awful structured data in every other format.
If you can adequately describe what you want to communicate, the format doesn't really matter that much.
This also applies if you don't know what you want to communicate.
@Wolf480pl I'm certainly no XML hater, but I will agree in this case that it's unlikely to be a good fit for IM. There, graceful degradation means it needs to be readable as plain text in clients that don't support new features. Or the server needs to determine if it should reformat the message so it's supported by the client.
On the other hand I do love URI schemes. So if this gets implemented, please define one to link to chatrooms!
@alcinnz I like URI schemes too, thanks for reminding me about that.
And I don't want the server to look inside the messages or reformat anything.
Everything about message formatting should be clientside.
And I'm sure it can be made to gracefully degrade w/o server intervention.
I want to keep the server simple, so that it can have multiple implementations, including fast ones (opposite of Matrix).
@Wolf480pl Good choice!
@wowaname give me a generic protobuf client that can be used without knowing the particular protocol, and I'm sold.
As for audiovisual streams, I think they're entirely synchronous - a B-history of them wouldn't be all that useful. So a Jitsi Videobridge next to your chat server could do the trick.
@wowaname well, I definitely want the server to be content-agnostic, and do one thing and do it well.
There's a question if we want it to be a binary protocol or a textual one.
And whether we want length fields (context-sensitive language) or delimiters (regular language). From a security POV I'd prefer a regular language, and pushing the escaping onto the client.
@wowaname you can still do it with a regex, so from a mathematical point of view it's a much more limited language.
Also, IRC's trailing colon is IMO not that bad a thing.
Also, you can do escaping like this:
then space means only one thing, and it's the client's job to figure out where to replace \x20 with space when decoding.
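The example that originally went with this didn't survive, so here's a guess at the idea rather than the actual proposal: fields are separated by a single space, and any space, CR, LF or backslash inside a field is written as a `\xNN` escape. The function names and the `MSG` command are made up for illustration:

```python
import re

# Bytes that must never appear raw inside a field: backslash, space, CR, LF
_ESCAPE = re.compile(rb"[\\ \r\n]")
_UNESCAPE = re.compile(rb"\\x([0-9a-f]{2})")


def escape(field: bytes) -> bytes:
    """Replace each special byte with a \\xNN escape (lowercase hex)."""
    return _ESCAPE.sub(lambda m: b"\\x%02x" % m.group()[0], field)


def unescape(field: bytes) -> bytes:
    """Turn every \\xNN escape back into the byte it stands for."""
    return _UNESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), field)


def encode_line(fields):
    # After escaping, a raw space can only ever be a field separator
    return b" ".join(escape(f) for f in fields)


def decode_line(line):
    return [unescape(f) for f in line.split(b" ")]
```

Usage, with a hypothetical command name:

```python
line = encode_line([b"MSG", b"#chan", b"hello world"])
# line is b"MSG #chan hello\\x20world"
fields = decode_line(line)
# fields is [b"MSG", b"#chan", b"hello world"]
```

Since the backslash itself gets escaped, decoding is unambiguous, and the server can split on spaces without ever looking inside a field.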
@Wolf480pl lots of big statements. A few thoughts:
1) Federation is overrated, and in the end it's likely that people will want multiple identities. It's more important to make it easy to support multiple independent systems, than it is to support a single, interlinked system. Most IRC clients got this right in that adding multiple networks is trivial. IRC got it wrong in that identity is very weak.
2) (plain text?) over telnet is a strange limitation. Don't tie yourself to the lowest common denominator and have that dictate shape for you. Debugging via telnet is shit anyway. Better to write good debugging tools, a wireshark dissector, etc.
3) Some of the things you have mentioned feel like solutions looking for problems. Paxos, etc. Some of these problems don't need solved, or have reasonable solutions already.
4) It sounds like it'd get really heavy, really quick. Like, really quick. When does a user start generating log data for later? How long is it retained? I don't think freenode could operate it without a pretty decent ram+storage upgrade.
But it's got legs. Some bits I think could be simpler and some bits I think are constrained needlessly, but solving the account, history, and rich-media problems is absolutely the direction that an IRC inspired chat protocol should look. Failure tolerance, a la netsplitting, is something that should be retained, and you identify that as well.
One of the ideas was that your homeserver would have a higher limit for log size than servers hosting the channel, because there are more homeservers and they have fewer users. So this kinda addresses 1 and 4.
@Wolf480pl except the telnet part (which sounds like an unnecessary limitation, as people already explained), what you describe makes me think a lot about Matrix.
@erdnaxeli you mean what I describe is similar to Matrix?
I'll read the spec and come back to you, and I'm pretty sure I'll be coming with a list of flaws in their protocol.
> There are 3 kinds of history:
> A) what happened in the channel before you become a member of it … Now, I believe (A) isn't all that necessary, and may even be harmful from privacy POV.
I said this last time you brought this up, and don't want to belabor the point, but…
(A) is *really* useful for a lot of project/business/organizational chats—it lets a newcomer join a channel and get up to speed quickly.
If you're hoping to replace centralized things (#slack), (A) is a must (imo)
To expand on this a bit/make it more concrete: I recently joined the #rust CLI working group, which communicates via chat methods that have "what happened in the channel before you become a member of it" history.
I was able to skim through what the group was talking about in the weeks before I joined, giving me a good sense of what the live issues/current projects were. Without that history, it would have taken much more time (from me and others) to get me up to speed
(And the same thing happened with Slack channels I (unfortunately) joined in my last job)
And there wasn't any negative privacy effect, either—that channel was intended for professional discussions within the working group, and everyone understood that the composition of the working group would change and that new members would have access to past conversations. Anything that would have been inappropriate to share with new members would *already* have been inappropriate to say.
@codesections well yeah, there are classes of channels for which having (A) is not a privacy issue. IMO it should be optional.
>I was able to skim through what the group was talking about in the weeks before I joined, giving me a good sense of what the live issues/current projects were.
Why was this not documented in a more persistent place, like a wiki, an issue tracker, a forum, or a mailing list?
@codesections My guess is it wasn't documented in a more persistent form because you have (A)-history.
And IMO this is a problem.
If you're forced to document stuff in a more persistent form, you'll better organize it, and have a clearer separation of documentation vs ephemeral discussion.
> Why was this not documented in a more persistent place, like a wiki, an issue tracker, a forum, or a mailing list?
Some of it was. (the CLI working group keeps notes from its meetings).
Let me switch examples: when I became a Fosstodon mod, I also joined a chat with (A) history, including discussions about how to handle moderation issues.
Could the results of each discussion have been documented? Well, maybe, but not easily
Formalizing that sort of history would require sifting through the conversations and abstracting a series of decisions into general principles. (Or effectively just copying the chat history).
Maybe that work is worth doing, but it's definitely *work*—and it's something that would have to be maintained over time.
It's like docs for code. Yes, external documentation is great, but it can get stale and is a lot of work. There's a lot to be said for self-documenting code
Also you need (A) for compliance reasons, and you need it on the server.
Say someone harasses somebody via direct messages on the company IRC. You need a copy of the messages (well, Slack hides them from export, but I guess they can be obtained).
Or your company needs to provide a court with all communication between some parties.
I guess if Facebook used IRC instead of e-mail we'd get one or two scandals less ;)
Yeah, this is also something I'd be interested in helping with, depending on how it goes.
I've been working with pub/sub in Redis lately, which has a pretty good model for this sort of thing (not saying you would/wouldn't want to use Redis, just saying the model is similar)