Comments (26)

gritzko commented on June 2, 2024

You may wonder why I obsess over the protocol that much.
The reason is that software is built like this:

[Figure: upside-down pyramid]

At the foundation of everything, there are mov, cmp and jmp.

So, every small change to the protocol reverberates further up the stack, making things simple or complex.

from swarm.

gritzko commented on June 2, 2024

Hello!

I was traveling last week, catching up now.

On GitHub master, I keep the most recent version that passes all tests.
You may also check the head branch, especially this lovely picture:
https://github.com/gritzko/swarm/blob/199e6ebb7f588285e4f3e679dc68b129b05c6bfc/doc/opstream-arch.png

Currently, (A) I am figuring out GraphQL-like data subgraph queries, which is quite a challenge. I am not sure whether I should bundle "yet one more feature" this time.

(B) I am also tempted to change the op syntax from my historical one (dating back to 2011-2012) to something a bit more intuitive. This change is mostly syntactic sugar.

My current format: /type#birth+author!time+source.opname (see the protocol spec and examples)

Proposed URL/filename-friendly format: #birth-author.type@time-source:op
Reasons for that:

  • an object id looks like a filename now, e.g. 1D4IK-Rgritzko01.json (timecreated-author.type)
  • ! for version is not bash-friendly, and I want the power of those spells on my command line. For a timestamp/version id, I plan to reuse the npm-like @-syntax, although I would prefer a different character than @. Unfortunately, git has no convention for that; they just use a hash as a positional argument (right?)
  • the entire thing looks like a URL hash organically.
  • minuses are way more URL- and path-friendly than pluses
  • the dot went to the data type, so the op name gets : instead.
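To make the proposed syntax concrete, here is a minimal TypeScript sketch that splits a specifier of the form #birth-author.type@time-source:op into its parts and formats it back. It is not Swarm's parser. The field names follow the template above. The assumption that every token uses only Base64-style characters [0-9A-Za-z_~] is mine, and the version 1D4IL in the usage example is hypothetical.

```typescript
// Hypothetical parsed form of "#birth-author.type@time-source:op".
interface Spec {
  birth: string;   // object creation timestamp
  author: string;  // who created the object
  type: string;    // data type, e.g. "json"
  time?: string;   // version timestamp (after "@")
  source?: string; // replica that produced that version
  op?: string;     // op name (after ":")
}

// Assumption: tokens are made of Base64-style characters only,
// so "-", ".", "@" and ":" can act as unambiguous separators.
const TOKEN = "[0-9A-Za-z_~]+";
const SPEC_RE = new RegExp(
  `^#(${TOKEN})-(${TOKEN})\\.(${TOKEN})` + // object id: #birth-author.type
    `(?:@(${TOKEN})-(${TOKEN}))?` +        // optional version: @time-source
    `(?::(${TOKEN}))?$`                    // optional op name: :op
);

function parseSpec(s: string): Spec | null {
  const m = SPEC_RE.exec(s);
  if (m === null) return null;
  const [, birth, author, type, time, source, op] = m;
  return { birth, author, type, time, source, op };
}

// The object-id part "birth-author.type" doubles as a file name.
function formatSpec(spec: Spec): string {
  let out = `#${spec.birth}-${spec.author}.${spec.type}`;
  if (spec.time !== undefined && spec.source !== undefined) {
    out += `@${spec.time}-${spec.source}`;
  }
  if (spec.op !== undefined) out += `:${spec.op}`;
  return out;
}
```

For example, parseSpec("#1D4IK-Rgritzko01.json@1D4IL-Rgritzko01:set") yields birth 1D4IK, author Rgritzko01, type json, time 1D4IL, source Rgritzko01 and op set. The old /type#... form does not match.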

There is yet another pending change that makes the protocol stricter and more concise: making it explicit which location inside the object the change is happening at. (As most data types have to ship that in the value anyway, I move it to the key.)
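A hypothetical TypeScript sketch of what moving the in-object location from the value to the key could look like. The field names are illustrative, not Swarm's. So is the assumed old value encoding "location=payload". The point is only that after the change every op must name its location explicitly, and the value no longer repeats it.

```typescript
// Before (hypothetical): the location travels inside the value,
// so each data type encodes and decodes it on its own.
interface OpOld {
  object: string;  // object id, e.g. "1D4IK-Rgritzko01.json"
  version: string; // op timestamp
  op: string;      // op name, e.g. "set"
  value: string;   // assumed encoding: "location=payload"
}

// After (hypothetical): the location is an explicit part of the key,
// and the value carries only the payload.
interface OpNew {
  object: string;
  location: string; // where inside the object the change happens
  version: string;
  op: string;
  value: string;
}

function moveLocationToKey(o: OpOld): OpNew {
  const eq = o.value.indexOf("=");
  if (eq < 0) throw new Error(`no location in value: ${o.value}`);
  return {
    object: o.object,
    location: o.value.slice(0, eq), // key now states the location
    version: o.version,
    op: o.op,
    value: o.value.slice(eq + 1),   // value keeps just the payload
  };
}
```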

What is your perspective?
@Tamriel, what do you think of GraphQL?

gritzko commented on June 2, 2024

@aravantv Hi Vincent.
I am doing my best to make a release.
Can I count on you to proofread the code against the spec in about a week from now?

Tamriel commented on June 2, 2024

I was very interested in this project, too. I am happy I researched further and found crjdt. It's based on a paper from a research team and proved to be correct. I successfully used it in a Scala.js web application (a collaborative editable text field).

gritzko commented on June 2, 2024

Activity in the repo has paused because I switched to the Java version. It is much simpler to refactor code in a typed language (and IntelliJ IDEA helps a lot).
I am really afraid of mentioning any release dates because I fail at that 100% of the time. Our only release so far happened by accident when Dan Abramov submitted a leaked post to HN.
On the 5th of June I am presenting at HolyJS in St. Petersburg, so I have to make an interim release in two weeks...

"The most minimal initial marks" is best described as a "replicated partially ordered log platform". Everything else can be built on top of that log, incrementally. Recently I have been obsessing over write path performance, and I mostly like the results. Basically, in a system like Swarm, load patterns are different from those of SQL databases and linear log systems (e.g. Kafka). Clients submit thousands of small subscriptions, so read and write paths have to be separated. Otherwise, one client can stall the system for 1 second with just 10KB of subscriptions (~1000 round-trips to HDD).
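One way to picture the read/write separation is the hypothetical in-memory TypeScript model below; it is not Swarm's implementation. Writes append and return immediately. Subscriptions are only queued, then served in one batch sorted by object id. The in-memory map stands in for an ordered key-value store. With a real store, that sorted batch could be answered by one ordered pass instead of one random read per subscription.

```typescript
interface Op { objectId: string; stamp: number; body: string; }

class Replica {
  private log: Op[] = [];                     // append-only write path
  private byObject = new Map<string, Op[]>(); // stand-in for an ordered KV store
  private pending: { objectId: string; deliver: (ops: Op[]) => void }[] = [];

  // Write path: append and index; never waits on subscription work.
  write(op: Op): void {
    this.log.push(op);
    const list = this.byObject.get(op.objectId) ?? [];
    list.push(op);
    this.byObject.set(op.objectId, list);
  }

  // Read path, step 1: only enqueue the subscription.
  subscribe(objectId: string, deliver: (ops: Op[]) => void): void {
    this.pending.push({ objectId, deliver });
  }

  // Read path, step 2: serve all queued subscriptions as one batch,
  // sorted by object id so a key-ordered store could be scanned in order.
  drainSubscriptions(): number {
    const batch = this.pending;
    this.pending = [];
    batch.sort((x, y) => (x.objectId < y.objectId ? -1 : x.objectId > y.objectId ? 1 : 0));
    for (const sub of batch) {
      sub.deliver([...(this.byObject.get(sub.objectId) ?? [])]);
    }
    return batch.length;
  }

  logLength(): number {
    return this.log.length;
  }
}
```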

Once the log sync part works correctly, I will release it. At that point, it will be possible to split the work between many people.

gritzko commented on June 2, 2024

> sounds like you're a couple months away from a first 'production' version, maybe a few weeks from a basic 'functioning' version?

Yes, this is my estimation.

> you really should take a look at TypeScript

Yes, I lean towards TypeScript too.
http://staltz.com/all-js-libraries-should-be-authored-in-typescript.html

> Java to TypeScript/Javascript transpiler

This will not work: the Java version uses 64-bit longs for various identifiers (there are lots of those), while JavaScript numbers are doubles that represent integers exactly only up to 2^53. In JavaScript, it is easier to use Base64 strings.
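A sketch of the Base64 approach in TypeScript, assuming BigInt support (ES2020). The alphabet is an assumption. Its characters are in ascending ASCII order, so comparing two fixed-width encoded strings gives the same result as comparing the numbers. Eleven 6-bit characters (66 bits) are enough to hold a 64-bit value.

```typescript
// Assumed alphabet: 64 characters in ascending ASCII order, so string
// order of fixed-width encodings matches numeric order.
const ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~";
const WIDTH = 11; // ceil(64 / 6) characters
const MAX64 = (1n << 64n) - 1n;

function encode64(value: bigint): string {
  if (value < 0n || value > MAX64) throw new RangeError("not a 64-bit unsigned value");
  let out = "";
  // Most significant 6-bit digit first, so strings sort like numbers.
  for (let i = WIDTH - 1; i >= 0; i--) {
    const digit = Number((value >> BigInt(6 * i)) & 63n);
    out += ALPHABET[digit];
  }
  return out;
}

function decode64(s: string): bigint {
  if (s.length !== WIDTH) throw new RangeError(`expected ${WIDTH} characters`);
  let value = 0n;
  for (const ch of s) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new RangeError(`bad character ${ch}`);
    value = (value << 6n) | BigInt(digit);
  }
  if (value > MAX64) throw new RangeError("value exceeds 64 bits");
  return value;
}
```

With plain numbers, Number(2n ** 53n + 1n) silently rounds to 2^53. The string form round-trips exactly: decode64(encode64(2n ** 53n + 1n)) gives back 9007199254740993n.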

> understanding Swarm is still being developed, what might you recommend we start with

I may release the JS core for the new protocol, which is essentially math. Then, you may integrate it with WebStorage/IndexedDB/WebSocket, auth, you-name-it.
At this point, reading the sources gives some idea of how things work, but you cannot play with the current code yet. The stamp and syncable packages stabilized long ago; replica needs work.

> how you envision Swarm integrating with such systems

Swarm is essentially a shared-log data-bus system, like Kafka. The key difference is that Swarm works with partially ordered logs. It also has per-object subscriptions for client-side use.
Its own storage is either a raw file ("pure" servers) or an ordered key-value database (clients and client-facing servers).
So, the model for integration is Kafka-like: read from Swarm, write to Swarm.
One-way integration is rather straightforward: tail the log, write to db.
Two-way is a different story, but also doable.
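One-way integration ("tail the log, write to db") could look roughly like this hypothetical TypeScript sketch. A consumer remembers how far it has read, fetches everything after that point, and upserts each op into an external store. LogReader and Db are placeholder interfaces, not Swarm APIs. Modeling the log as an array with a numeric offset is a simplification.

```typescript
interface Op { objectId: string; field: string; value: string; }

// Placeholder: source of ops in log order (hypothetical interface).
interface LogReader { readFrom(offset: number): Op[]; }

// Placeholder: the external database (hypothetical interface).
interface Db { upsert(objectId: string, field: string, value: string): void; }

class LogTailer {
  constructor(private log: LogReader, private db: Db, private offset = 0) {}

  // Apply every op appended since the last call; returns how many were applied.
  poll(): number {
    const ops = this.log.readFrom(this.offset);
    for (const op of ops) this.db.upsert(op.objectId, op.field, op.value);
    this.offset += ops.length; // remember progress so ops are not re-applied
    return ops.length;
  }
}

// In-memory stand-ins, for trying the sketch out.
class ArrayLog implements LogReader {
  ops: Op[] = [];
  readFrom(offset: number): Op[] { return this.ops.slice(offset); }
}

class MapDb implements Db {
  rows = new Map<string, Record<string, string>>();
  upsert(objectId: string, field: string, value: string): void {
    const row = this.rows.get(objectId) ?? {};
    row[field] = value;
    this.rows.set(objectId, row);
  }
}
```

Two-way integration is harder because writes coming back from the database would also have to be turned into ops. This sketch does not attempt that.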

Tamriel commented on June 2, 2024

You worked on the protocol. What is the current state of the project? What still needs to be done? I would like to participate, if the way is clear.

gritzko commented on June 2, 2024

@phestermcs Thanks for the response!

I do not intend to implement GraphQL per se. In my understanding, its many idiosyncrasies are specific to the Facebook platform. Still, (1) async retrieval of every object is a bit tiresome and (2) full-database retrieval is a bit too limiting. I know some folks who made full-db retrieval the only option. That caused some very predictable issues. I plan to implement object subgraph retrieval using path expressions, like /*/*/* (load everything, three levels deep).
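A hypothetical TypeScript sketch of subgraph retrieval with path expressions over a web of objects. It is not Swarm's design. A `*` step follows every reference from the current objects, and a named step follows only that field. The object model is an assumption: each object maps field names to either a plain value or a reference by id. The number of steps bounds the traversal, so cycles cannot make it loop, and sets remove duplicates.

```typescript
// Assumed model: fields hold either a plain value or a reference (object id).
type Value = { ref: string } | { val: string };
type Obj = Record<string, Value>;
type Store = Map<string, Obj>;

// Evaluate a path like "/*/*/*" or "/users/john" from a root object.
// Returns the ids of every object loaded along the way.
function retrieve(store: Store, rootId: string, path: string): Set<string> {
  const steps = path.split("/").filter((s) => s.length > 0);
  const loaded = new Set<string>([rootId]);
  let frontier = new Set<string>([rootId]);
  for (const step of steps) {
    const next = new Set<string>();
    for (const id of frontier) {
      const obj = store.get(id);
      if (obj === undefined) continue;
      for (const [field, v] of Object.entries(obj)) {
        // "*" matches any field; otherwise the field name must match.
        if ((step === "*" || step === field) && "ref" in v) {
          next.add(v.ref);
          loaded.add(v.ref);
        }
      }
    }
    frontier = next; // one level deeper per step
  }
  return loaded;
}
```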

I want the specifier syntax to be a very basic DML in itself. Use cases are:

  1. op transmission (100% by the specification)
  2. op storage (may have slight tweaks)
  3. command-line use for object/op queries and manual data edits (e.g. @~1d means "yesterday", or /users/john:diet="Vegetarian" means "mark John as a vegetarian")
  4. URL use ("routing" in frontend parlance) - so a web app can reflect the point the user navigated to (aka deep linking).
  5. data/event firehose filtering (#~snowden.user:location)

The positional notation only works for the first case. But yes, it is an internal detail, I may change it at will. I just want it to be somewhat readable: here is the object id, here is the author's login, here is the data type.

Ideally, spec expressions should be compatible with path expressions: /apartments/3r@~1d/0-100 (the 100 most recent 3-room apartments as of yesterday) or /apartments/3r/district="Central".
Well, I definitely do not want to go into the RethinkDB territory. At the moment, I am just trying to keep all those options open.

gritzko commented on June 2, 2024

@phestermcs It depends. Swarm is a distributed real-time object syncing system that extends to the client side.
Swarm is protocol-first, not business-first.
When it comes to path expressions, Swarm is a web of objects, not a JSON tree. Strictly speaking, a Swarm database cannot be serialized as JSON, as one object may be linked from different parts of the tree and cycles are also possible. The data is a web; a tree is just a local view.
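A small TypeScript illustration of why a web of objects does not map onto a JSON tree. JSON.stringify throws a TypeError on a cyclic structure. Storing links as object ids keeps the data serializable, in a normalized id-to-fields form assumed here for illustration. Any tree is then a view rendered from some chosen root.

```typescript
interface GraphObj { id: string; name: string; links: GraphObj[]; }

// Two objects linking to each other: a cycle.
const a: GraphObj = { id: "a", name: "Alice", links: [] };
const b: GraphObj = { id: "b", name: "Bob", links: [a] };
a.links.push(b);

// Direct serialization fails on the cycle (JSON.stringify throws).
function tryStringify(x: unknown): string | null {
  try {
    return JSON.stringify(x);
  } catch {
    return null;
  }
}

// Normalized form: id -> fields, with links stored as ids.
function normalize(root: GraphObj): Record<string, { name: string; links: string[] }> {
  const out: Record<string, { name: string; links: string[] }> = {};
  const stack = [root];
  while (stack.length > 0) {
    const n = stack.pop()!;
    if (n.id in out) continue; // already visited: a cycle closes here
    out[n.id] = { name: n.name, links: n.links.map((l) => l.id) };
    stack.push(...n.links);
  }
  return out;
}
```

Here tryStringify(a) returns null. JSON.stringify(normalize(a)) produces {"a":{"name":"Alice","links":["b"]},"b":{"name":"Bob","links":["a"]}}.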
Probably, I need to revisit Firebase to give a better comparison. What aspects are you most interested in?

gritzko commented on June 2, 2024

I think all the differences can be summarized this way: Swarm is built on proper distributed-systems primitives (Lamport stamps, CRDTs, a web of objects) to make it scale way beyond Firebase use cases. I just don't feel comfortable discussing that level of ambition :)
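For readers unfamiliar with Lamport stamps, here is a minimal TypeScript sketch of a Lamport clock. It is not Swarm's stamp code, and the { time, replica } shape is an assumption. Pairing a logical counter with a replica id gives a total order, with the replica id breaking ties. In this variant, receiving a stamp does not consume a tick. The clock only catches up, so the next local stamp is greater than anything seen.

```typescript
interface Stamp { time: number; replica: string; }

// Total order: by logical time, then by replica id as a tie-breaker.
function compareStamps(x: Stamp, y: Stamp): number {
  if (x.time !== y.time) return x.time - y.time;
  return x.replica < y.replica ? -1 : x.replica > y.replica ? 1 : 0;
}

class LamportClock {
  private time = 0;
  constructor(readonly replica: string) {}

  // Local event (e.g. a new op): advance the counter and stamp it.
  tick(): Stamp {
    this.time += 1;
    return { time: this.time, replica: this.replica };
  }

  // Remote stamp seen: catch up, so the next local stamp is greater
  // than every stamp this replica has observed.
  receive(remote: Stamp): void {
    this.time = Math.max(this.time, remote.time);
  }
}
```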

ai commented on June 2, 2024

I also think that GraphQL as a protocol per se is not ideal. It is a good idea, but we could improve it.

Also, I think nobody cares about GraphQL itself. The main idea is Relay and its API.

And I think all future log-based solutions should have a Relay-based API, because it is awesome.

tonsky commented on June 2, 2024

> In our experience firebase scales like crazy (10's of thousands of concurrent users, delivery of changes to clients faster than can all be repainted by the UI library and DOM to a screen), so I'm curious how you seeing swarm scaling beyond firebase's use cases?

Firebase is last-write-wins, so it scales fine while multiple users work on different documents, or in situations with one writer and multiple observers. Once you need real-time collaboration and/or offline work, you'll have to deal with conflicts, and Firebase doesn't offer anything there.
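To make the last-write-wins point concrete, here is a hypothetical TypeScript sketch of an LWW register. Each write carries a (time, replica) stamp, and merging keeps the value with the larger stamp. The merge is commutative, so replicas converge. Of two concurrent writes, though, one is silently dropped, and that is the conflict the comment refers to.

```typescript
interface Stamp { time: number; replica: string; }
interface LwwRegister<T> { value: T; stamp: Stamp; }

// Larger logical time wins; equal times are broken by replica id.
function greater(x: Stamp, y: Stamp): boolean {
  return x.time !== y.time ? x.time > y.time : x.replica > y.replica;
}

// Merge two replicas' states: the write with the larger stamp wins.
function merge<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  return greater(b.stamp, a.stamp) ? b : a;
}
```

Suppose two users edit the same title while offline, Alice at { time: 5, replica: "alice" } and Bob at { time: 5, replica: "bob" }. Both merge orders end up with Bob's value, and Alice's edit is lost without any notice.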

fckt commented on June 2, 2024

Subjectively, the whole discussion makes me think: GraphQL is about "an efficient read as a top priority, and write as an additional feature", while Swarm's focus is "fast, conflict-free writes and smooth sync". Swarm lacks (as far as I know) a "vacuum" feature, which (I guess) sometimes makes it risky in terms of resource consumption (network/storage overuse).

Why do people tend to choose GraphQL today (besides FB marketing efforts)? Because it feels more predictable (comfortable) to them; "easy", in other words. Having experience with the GQL/Relay stack, I can say that at first people find GQL really easy, but as they move on they get stuck with complications and the need to make a lot of varied, subjective and not (really) important decisions, which leads them to a huge mess. It's just people: they need easy solutions, and not everyone thinks far ahead.

Conversely, CRDTs are simple (in contrast to the GQL/Relay stack), but not easy to reason about at a glance. That's why the technology has slow adoption, IMO.

tonsky commented on June 2, 2024

@phestermcs right, a transaction is a kind of compare-and-swap, which is good, although it involves an additional round-trip to the client. But can you span it across multiple keys? E.g., update a user's name and add a new post for that user only if the update succeeded?

Not sure how to use security rules to detect conflicts. Can you give an example?

What does LRW stand for?

gritzko commented on June 2, 2024

> I'm curious how you seeing swarm scaling beyond firebase's use cases?

Causally consistent architectures can scale virtually infinitely, a property I want to exploit to the fullest.

gritzko commented on June 2, 2024

> a fat enough pipe to go from the authority to all the replicas

Swarm has no "authority". It has "peers" that hold the full log and clients that hold a subset of data of their choosing. Hence, fat pipes are only needed for peers/servers.

gritzko commented on June 2, 2024

A Swarm id is 128 bits: 64 bits for the timestamp, 64 bits for the replica.
An IPv4 address is 32 bits.
Last time I checked, an index of the entire Web fit on a single server, because it's text.
Imagine that yourself then.
Wikipedia, but better.
Twitter, but better.
Content-addressed Web.
Whatever.
We liberate the data from its storage location.
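The 128-bit id layout described above (64 bits of timestamp, 64 bits of replica) can be sketched with TypeScript bigint (ES2020). Putting the timestamp in the high half is an assumption about field order. With that order, numeric comparison of ids sorts by time first, then by replica.

```typescript
const MASK64 = (1n << 64n) - 1n;

// Pack a 64-bit timestamp (high half) and a 64-bit replica id (low half).
function packId(timestamp: bigint, replica: bigint): bigint {
  if (timestamp < 0n || timestamp > MASK64 || replica < 0n || replica > MASK64) {
    throw new RangeError("both halves must fit in 64 unsigned bits");
  }
  return (timestamp << 64n) | replica;
}

function unpackId(id: bigint): { timestamp: bigint; replica: bigint } {
  return { timestamp: id >> 64n, replica: id & MASK64 };
}
```

Each half alone allows 2^64 (about 1.8 × 10^19) distinct values.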

gritzko commented on June 2, 2024

> > An IPv4 address is 32 bits.
>
> Which we're almost out of btw, hello IPv6? But how does that relate to swarm scaling vs firebase?

64 bits are twice as scalable as 32 bits. Isn't it obvious?

gritzko commented on June 2, 2024

It is a sort of well-known paradox that classic bookkeeping is eventually consistent.
Classic bookkeeping is ~500 years old while ACID databases are younger than 50. So, the former can certainly function without the latter, right?

If required, all the linearization semantics can always be wrapped into some request-response interface. Why not.

gritzko commented on June 2, 2024

Administrivia: I removed from the thread the @phestermcs posts that did not fit on my 32" monitor (3 or 4 of those), and I also blocked the author.

> What are Swarm's ideal use cases?

Collaborative apps on mobile devices.

I see Swarm as a very general protocol for causally-consistent data sync. A lot can be built on top of that.

> How would swarm scale firebase's use cases better than firebase?

Should it?

> Will it ever be done?

It evolved a lot over all those years.
After all, it was lots of research.
But yes, it's time to make a stable release.

aravantv commented on June 2, 2024

Hi,

it is a little bit hard to work out the answer to the original question from the conversation (it seems some parts were removed?).

Can I summarize it as follows:

  • for now there is no official release
  • there is no up-to-date documentation
  • there is actually no running code either

Am I right? If so, any estimate of when you will have running code?

aravantv commented on June 2, 2024

Happy to read!
I'd be happy to test.
The best way for me would be for you to first write a small example/tutorial; I try to reproduce it, then give you feedback on what works and what doesn't. Then we progressively go into more detail. When the testing is convincing, I could have a look at the code as well!

ahukin commented on June 2, 2024

G'day
I recently became interested in offline-first development and mobile databases with automated sync. Our team is using Xamarin, and there weren't many choices; I wasn't too keen on most of them. I have been a project leader/analyst and haven't done a lot of coding for a long time, and when I saw Swarm I thought, "Here's something I could sharpen my skills on by converting it to C#, and I might learn about CRDTs and a bit of JavaScript along the way." Stupidly, I didn't read the open issues and notice that you were already refactoring to Java. So, I've spent quite a bit of time struggling to convert JavaScript to C#, trying not to use dynamic typing, etc., without always fully understanding the intention of the code. I must say, I am surprised that anybody can write anything significant in plain JavaScript.

Anyway, I hit a tough patch, had a look at the open issues and ended up here. So, I'm wondering how the Java refactoring is going? I'm happy to work on a C# version (much easier to convert from Java). I can also review Java code if you need it, but I haven't used Java for a while. I'm treating this all as a skill-building exercise that may be useful to our team, or it may just help me.

So, how's it going?

regards
Andrew

gritzko commented on June 2, 2024

State of things: the protocol has been stable for a long time already. We have mostly been obsessing over making things production-ready. Since November, things have become way more algebraic/functional-programming style. Things also get simpler, which is the most exciting part for me personally.
Swarm is, in fact, an open-source research project. There is no implementation plan from 0% to 100%. We hope to make a release "soon" :|

@ahukin Thanks a lot, C# is very relevant. Swarm is a protocol for multi-platform synchronization, hence all implementations are relevant.

@Tamriel I greatly respect Mr Kleppmann and the team, but CRDT JSON is a bit too rigid for my goals. If it works for you, go ahead. Regarding proofs, it all grows from the same root (see papers on the Woot/RGA/Causal Trees family of algorithms).

ahukin commented on June 2, 2024

I'm happy to start working on the C# translation from your Java before release, if you wish. It will give me something to do.

gritzko commented on June 2, 2024

I'm about to release 2.0.
That is the protocol and a proof-of-concept implementation.
I welcome reviewers (who can actually read the code).
