
lightyear's Introduction

Lightyear

crates.io docs.rs codecov

A library for writing server-authoritative multiplayer games with Bevy. Compatible with wasm via WebTransport.


Demo using one server with 2 clients. The entity is predicted (slightly ahead of server) on the controlling client and interpolated (slightly behind server) on the other client. The server only sends updates to clients 10 times per second but the clients still see smooth updates.

Getting started

You can first check out the examples.

To quickly get started, you can follow this tutorial, which re-creates the simple_box example.

You can also find more information in this WIP book.

Features

Ergonomic

Lightyear provides a simple API for sending and receiving messages, and for replicating entities and components:

  • the user needs to define a shared protocol that declares all the Messages, Components, and Inputs that can be sent over the network, as well as the Channels to be used:
```rust
// messages
app.add_message::<Message1>(ChannelDirection::Bidirectional);

// inputs
app.add_plugins(InputPlugin::<Inputs>::default());

// components
app.register_component::<PlayerId>(ChannelDirection::ServerToClient)
    .add_prediction::<PlayerId>(ComponentSyncMode::Once)
    .add_interpolation::<PlayerId>(ComponentSyncMode::Once);

// channels
app.add_channel::<Channel1>(ChannelSettings {
    mode: ChannelMode::OrderedReliable(ReliableSettings::default()),
    ..default()
});
```
  • to enable replication, the user just needs to add a Replicate component to entities that need to be replicated (see the sketch below).
  • all network-related events are accessible via bevy Events: EventReader<MessageEvent<MyMessage>> or EventReader<EntitySpawnEvent>
  • I provide a number of bevy Resources to interact with the library (InputManager, ConnectionManager, TickManager, etc.)
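
For illustration, a minimal sketch of both points (it assumes the protocol types defined above; the `message()` accessor and `PlayerId: Default` are assumptions of this sketch, not confirmed API):

```rust
use bevy::prelude::*;
use lightyear::prelude::*; // Replicate, MessageEvent, ... (exact paths assumed)

// Spawning an entity with `Replicate` is enough to start replicating it.
fn spawn_player(mut commands: Commands) {
    commands.spawn((PlayerId::default(), Replicate::default()));
}

// Network events are regular bevy Events, read with the usual EventReader.
fn handle_messages(mut events: EventReader<MessageEvent<Message1>>) {
    for event in events.read() {
        info!("received: {:?}", event.message()); // accessor name assumed
    }
}
```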

Batteries-included

  • Transport-agnostic: Lightyear uses a very general Transport trait to send raw data on the network. The trait currently has several implementations:
    • UDP sockets
    • WebTransport (using QUIC): available on both native and wasm!
    • WebSocket: available on both native and wasm!
    • Steam: use the SteamWorks SDK to send messages over the Steam network
  • Serialization
    • Lightyear uses bitcode for very compact serialization. It uses bit-packing (a bool will be serialized as a single bit).
  • Message passing
    • Lightyear supports sending packets with different guarantees of ordering and reliability through the use of channels (see the sketch after this feature list).
    • Packet fragmentation (for messages larger than ~1200 bytes) is supported
  • Input handling
    • Lightyear has special handling for player inputs (mouse and keyboard presses). They are buffered every tick on the Client, and lightyear makes sure that the client input for tick N will be processed on tick N on the server. Inputs are protected against packet loss: each packet contains the client inputs for the last few frames.
    • With the leafwing feature, there is a special integration with the leafwing-input-manager crate, where your leafwing inputs are networked for you!
  • World Replication
    • Entities that have the Replicate component will be automatically replicated to clients. Only the components that change will be sent over the network. This functionality is similar to what bevy_replicon provides.
  • Advanced replication
    • Client-side prediction: with just a one-line change, you can enable client-prediction with rollback on the client, so that your inputs can feel responsive
    • Snapshot interpolation: with just a one-line change, you can enable Snapshot interpolation so that entities are smoothly interpolated even if replicated infrequently.
    • Client-authoritative replication: you can also replicate entities from the client to the server.
    • Pre-spawning predicted entities: you can spawn Predicted entities on the client first, and then transfer the authority to the server. This ensures that the entity is spawned immediately, but will still be controlled by the server.
    • Entity mapping: lightyear also supports replicating components/messages that contain references to other entities. The entities will be mapped from the local World to the remote World.
    • Interest management: lightyear supports replicating only a subset of the World to clients. Interest management is made flexible by the use of Rooms
    • Input Delay: you can add a custom amount of input-delay as a trade-off between having a more responsive game or more mis-predictions
    • Bandwidth Management: you can set a cap to the bandwidth for the connection. Then messages will be sent in decreasing order of priority (that you can set yourself), with a priority-accumulation scheme
  • Configurable
    • Lightyear is highly configurable: you can configure the size of the input buffer, the amount of interpolation-delay, the packet send rate, etc. All the configurations are accessible through the ClientConfig and ServerConfig structs.
  • Observability
    • Lightyear uses the tracing and metrics libraries to emit spans and logs around most events (sending/receiving messages, etc.). The metrics can be exported to Prometheus for analysis.
  • Examples
    • Lightyear has plenty of examples demonstrating all these features, as well as the integration with other bevy crates such as bevy_xpbd_2d
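
As an illustration of the message-passing feature, sending a message over the Channel1 defined earlier might look roughly like this (a hedged sketch: the ConnectionManager resource exists, but the exact send_message signature and Message1 being a String newtype are assumptions):

```rust
use bevy::prelude::*;
use lightyear::prelude::*;
use lightyear::prelude::client::*; // client-side ConnectionManager (path assumed)

// Sends a Message1 over the reliable Channel1 from a client system.
fn send_hello(mut connection: ResMut<ConnectionManager>) {
    let _ = connection.send_message::<Channel1, Message1>(&Message1("hello".to_string()));
}
```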

Supported bevy version

Lightyear    Bevy
0.10-0.14    0.13
0.1-0.9      0.12

lightyear's People

Contributors

ant59, blinkdog, cbournhonesque, cbournhonesque-sc, dependabot[bot], df51d, doonv, healingdrawing, msvbg, nul-led, nvzqz, rbouar, schnippl0r, simbleau, zwazel


lightyear's Issues

Interpolation logic might not be correct when server send_interval is low

I am starting to think that interpolation logic is not correct.

PROBLEM 1:
Let's say that the server sends updates fairly infrequently.
Then you could have an entity getting spawned at tick 200, and then receiving updates, for example for its position.
Then the entity gets replicated at tick 210 with the final message looking like:

  • tick: 210. Spawn = BulletMarker. Transform = pos-210.

Then the client will never see the entity getting spawned at tick 200 and moving until tick 210, because the only message we send is the entity state at tick 210.
In most cases it should be ok, especially if the server send rate is fairly fast.
But I wonder if we should make sure (especially for interpolation) that the state of the entity when it was spawned is also sent.
(so at tick 210 we send both Actions-200 and Updates-210 for that replication-group)

PROBLEM 2:

  • Currently, we sync the component instantly when we receive it.
    But if the updates are infrequent, that means the component is static until we get the next update that we can interpolate towards!
    This looks strange for interpolated bullets, for example: the bullet is frozen for one send_interval and then starts moving towards the next update.

Instead, we should wait until we have 2 updates to interpolate between before inserting the component.

Enable easy replication of Resources

Leafwing ActionState resources show us that it might be valuable to replicate a resource in the same way as a component (for replicating global action state).

Maybe:

  • do it only for resources that are also components?
  • need to provide a ReplicationGroup, so can add a global Replicate::<R> resource to track how the replication is done?

Then we might not need any special InputMessage for replicating inputs; it's just a matter of replicating the ActionState Components/Resources.

Handle client closing webtransport connection

If the client disconnects, and the netcode timeout duration is longer than the quic timeout duration, the webtransport server gets stuck in a loop returning 'error getting message from webtransport: channel closed'.

Instead, if the quic connection gets closed, netcode should also consider the connection as disconnected.

Or in any case, netcode should still update correctly and remove the connection at timeout.

Do not send entity-despawns to the client who just got disconnected

This causes error! messages in the logs: we try to send an entity despawn to a client who doesn't exist:
2023-12-25T17:44:47.566485Z ERROR lightyear::shared::replication::systems: error sending entity despawn: client id not found

We detected that the client disconnected, so we despawn all their entities (this is user-defined, for example here)
but we shouldn't send that to the client who disconnected.

(usually it's sent to that client because we Replicated the entity to them.)

Maybe let users override the Replicate value for despawned entities? (i.e. they could modify the replicate value stored in the cache: pub replicate_component_cache: EntityHashMap<Entity, Replicate>,)

Add more unit tests

Tests mainly needed on:

  • Behaviour when tick wraps around
  • Replication (clamping latest_tick and last_action), or just replication edge-cases
  • Sync
  • Inputs

Reduce bandwidth by doing eventual consistency only after a certain amount of time

Joy — 12/31/2023 3:48 PM on Discord

traditionally the server maintains some bookkeeping per-entity, per-component (sometimes per field/property), i.e. it'll remember the last tick it sent an (entity, component) to a specific client, the last tick that client acknowledged, and the last tick that state changed on the server
and that's what would be used to maintain eventual consistency

imagine your policy only chooses the entities to send but leaves which components to include up to the server
then, if sending a snapshot for tick N
if entity e's component T has tick changed > tick sent, include (e, T) in the snapshot and bump its tick sent to N
later
if this message is delivered (ACK), bump (e, T) tick acknowledged to N
if this message is lost (NACK), reset (e, T) tick sent to tick acknowledged

while not a literal binary diff of the component data, this is still a delta in a sense
but yeah, in the absence of any user policy, the server could prioritize entities in descending order of "staleness"

the ELI5 of this is "the server includes components that changed since its last send attempt"
the > is on "tick sent" and not "tick acknowledged" because we assume that the snapshot will be delivered
(the vast majority of packets do successfully arrive)
and that way, the server doesn't waste bandwidth redundantly spamming the same data
If the snapshot is lost, then it just resets its bookkeeping so that data once again falls under the send criteria
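
A minimal sketch of that bookkeeping in Rust (all type and field names are invented for illustration, not lightyear's actual types):

```rust
use std::collections::HashMap;

/// Per-(entity, component) bookkeeping, following the scheme described above.
#[derive(Default, Clone, Copy)]
struct SendState {
    tick_changed: u32, // last tick the value changed on the server
    tick_sent: u32,    // last tick we attempted to send it
    tick_acked: u32,   // last tick the client acknowledged
}

#[derive(Default)]
struct Bookkeeping {
    // key: (entity id, component id)
    states: HashMap<(u64, u16), SendState>,
}

impl Bookkeeping {
    /// Building the snapshot for tick `n`: include (entity, component)
    /// iff it changed since the last send attempt, and bump its tick_sent.
    fn should_include(&mut self, key: (u64, u16), n: u32) -> bool {
        let state = self.states.entry(key).or_default();
        if state.tick_changed > state.tick_sent {
            state.tick_sent = n;
            true
        } else {
            false
        }
    }

    /// The snapshot for tick `n` was delivered (ACK): bump tick_acked.
    fn on_ack(&mut self, key: (u64, u16), n: u32) {
        if let Some(state) = self.states.get_mut(&key) {
            state.tick_acked = n;
        }
    }

    /// The snapshot was lost (NACK): reset tick_sent to tick_acked so the
    /// data once again falls under the send criteria.
    fn on_nack(&mut self, key: (u64, u16)) {
        if let Some(state) = self.states.get_mut(&key) {
            state.tick_sent = state.tick_acked;
        }
    }
}
```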

Plans for wasm / browser support?

Hi, I was wondering what your current plans are for support for wasm. (I saw #14 but the author closed it without a reply.)

I’m browsing what my options are for building a game in bevy with multiplayer support and came across lightyear. I’d like to be able to run it in the browser at some point so having an idea of what you intend or whether there are upstream blockers will help me choose.

No pressure if you don’t have any plans currently or aren’t sure. If you reply, thanks for taking the time to respond.

Integrate with other transports (EOS, Steam sockets)

Integration with other transports (EOS, Steam sockets) might force me to revisit the Transport trait to make it more flexible:

  • maybe separate client to server and server to client (since server to client has multiple connections, it's a bit different)
  • instead of using SocketAddr directly for addressing, maybe we just want to be able to use an id
  • see if we can support using crossbeam channels as a transport

Bandwidth reduction: delta-compression and sub-component updates

  • excluding unnecessary fields from replication (would either be unavailable or defaulted at the receiver end)

-> this is already available: via the Replicate component it is possible to specify a component that won't be replicated.
I guess it's also possible to exclude fields from a component by just not serializing them.

  • updating component fields with change detection instead of the whole component

  • no need to send component creation data when data matches defaults (not worth it for fields since it would require at least one bit to indicate defaults being used)

  • delta-compression in general

Support more custom reconciliation strategy

I envision 2 main cases:

  • in some cases you might not want to perform a rollback even if some components are divergent. You should be able to specify this, maybe for a given entity, don't rollback for a component?
    Maybe via something like: replicate.disable_rollback::<C>()

  • the user might not want to snap directly to the rolled-back state, but they might want to slowly interpolate towards it. In that case we might want to provide: The Previously-Predicted-State, the newly-RolledBack-State, the appropriate ticks.

  • let users have predicted entities but no rollback/reconciliation

Add a `send_message_to_room` helper method

So that users can send messages to all clients that are present in a room.

The implementation is probably just:

  • get the list of clients in the room
  • call send_message_to_target(M, NetworkTarget::Only(clients))

EDIT: actually this is harder than expected because the client has no concept of rooms, only the server knows about the rooms.
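
A hypothetical server-side sketch (RoomManager, RoomId and NetworkTarget are existing lightyear types, but `clients_in_room` and the exact send_message_to_target signature are assumptions):

```rust
use lightyear::prelude::*; // exact import paths assumed
use lightyear::prelude::server::*;

/// Hypothetical helper: look up the clients in the room, then reuse the
/// existing targeted send. This lives on the server, which owns room state.
fn send_message_to_room<C: Channel, M: Message>(
    connection: &mut ConnectionManager,
    rooms: &RoomManager,
    room_id: RoomId,
    message: &M,
) {
    // assumed accessor returning the clients currently in the room
    let clients: Vec<ClientId> = rooms.clients_in_room(room_id);
    let _ = connection.send_message_to_target::<C, M>(message, NetworkTarget::Only(clients));
}
```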

Provide aliases for Type<Protocol>

After the user creates their own protocol, they would have to use
Replicate<MyProtocol>
Client<MyProtocol>
Server<MyProtocol>

which can be tedious to type.
Provide aliases for them in the protocolize macro
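
For example, the macro could emit something like this (illustrative paths, not the actual macro output):

```rust
// Aliases the protocolize macro could generate for the user's protocol:
pub type Replicate = lightyear::Replicate<MyProtocol>;
pub type Client = lightyear::Client<MyProtocol>;
pub type Server = lightyear::Server<MyProtocol>;
```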

Bandwidth optimization

Several possibilities:

  • enable setting the update_rate per entity. This would be a Timer on the Replicate component that would indicate if we should even consider this entity for replication.

    • enable overriding the update_rate for specific components of the entity (PerComponentReplicationMetadata)
    • how to reconcile the update_rate of an entity with the update_rate of the server? Should we just ignore the update_rate of the server? On the extreme, the user can set server_send_interval to 0 and control everything via entity send intervals
  • once we have decided which updates we want to send:

    • have a Priority for each ReplicationGroup
    • it can be overridden at any time by the user
    • sort all replication-groups by Priority * time_since_last_update (latest_tick).
    • put all the updates in the channel, up to the cap? how do we know if we reach the cap, since we haven't serialized the messages at this point?
    • how to take into account the other channels during priority?
  • idea:

    • when we buffer a message we need to specify a priority
    • default priority is 1.0 for messages, for replication it depends on the Replication component
    • there is a default priority per channel as well. Final message priority is message_priority * channel_priority
    • message_manager does priority accumulation when trying to add messages to packets. It knows the amount of bytes that it can use.
    • when considering which messages to add in the packet, the message_manager checks priority * time_since_buffered ( or can just add the priority score every send tick, same thing).
    • Ping channel has infinite priority?
    • because of priority we might have more updates for the same replication-group being sent at the same time. I.e. we might have updates for tick 10 and 15 for the same entity group in the buffer. In that case should we just skip sending the update for tick 10? How would we do it?
  • send bandwidth is measured via the diagnostics? (since it has smoothed average, etc.)

CURRENT STATUS:

  • added a PriorityManager on the packet sending step.

  • when all channels prepare a list of packets to send, it sorts them by priority, then checks via the rate-limiter if we can send them

  • how to compute priority?

    • should we accumulate priority every time we try to send?
  • if we can, great!

  • if we can't:

    • should we just drop the message?
      • for unreliable messages: it was unimportant anyway, so it's ok to be dropped.
      • For reliable messages: we will retry sending it later (but we need to retry again with higher priority! so maybe on each send-retry of reliable channels we accumulate the priority?)
      • for unreliable updates: we can afford to drop it, but the priority must be accumulated for that entity if no update was sent -> need a way for the priority-accumulator or the message manager to indicate to the replication-group that the priority should change (i.e. that the message has been sent or not)
        • HOW?
          • one idea: we always generate a message id for each message buffered. If the message is a replication message, the sender will always know which message id contains the update for the ReplicationGroup. We have a special function buffer_replication_send that buffer_sends the message (EntityAction or EntityUpdate) but also keeps track of which messages are for a replication group. When the message is actually sent (i.e. not held back by the priority manager), we send a notification via a channel to the ReplicationSender that the message has been sent, in which case we can reset the priority for that replication group to 0 (or to the base value). For each group where we don't get a notification that its message was sent, we increase the accumulated priority.
      • for reliable actions: we will retry sending it later (so the priority accumulation is done inside the reliable sender?)

QUESTIONS:

  • I don't really like how the priority accumulation is done differently for EntityActions and reliable messages. The priority doesn't accumulate at the same rate. A solution would be to store the time or tick at which we wanted to send the message. Then we could multiply the priority by the time since we managed to send a message for the entity.
    Or in practice, just have a higher priority for reliable channels?

    • SOLVED: the priority is accumulated at the same rate, every time we run the SystemSet::Send systems
  • how to balance an entity's update-rate with the global update-rate? Should the global update-rate serve as a default for each entity's update-rate, but otherwise we can send packets every tick?

    • should each channel also have their own update-rate then?
  • this could break the interpolation logic, since some entities will get updates way less frequently than how the interpolation time is progressing, no?

TODO:

  • I don't like how we keep sending Updates until we receive the ack. That means we send way more packets than required. We should send an update if the component changes OR if the change_bevy_tick > last_update_ack and a sufficient time has passed between sends (1.5 RTT?).

  • When we exceed the bandwidth, nothing gets replicated anymore. why?
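
A rough sketch of the priority-accumulation idea described above (all names invented for illustration):

```rust
/// Accumulated priority for one replication group.
struct GroupPriority {
    base: f32,        // user-set priority for this group
    accumulated: f32, // grows every send tick the group is not sent
}

impl GroupPriority {
    /// Called every send tick for groups that were not sent.
    fn accumulate(&mut self) {
        self.accumulated += self.base;
    }

    /// Called when a message for this group actually left the send buffer.
    fn reset(&mut self) {
        self.accumulated = self.base;
    }
}

/// Greedily pick groups in decreasing accumulated priority until the
/// byte budget for this send interval is exhausted.
fn select_groups(groups: &[(GroupPriority, usize)], budget_bytes: usize) -> Vec<usize> {
    let mut order: Vec<usize> = (0..groups.len()).collect();
    order.sort_by(|&a, &b| groups[b].0.accumulated.total_cmp(&groups[a].0.accumulated));
    let mut used = 0;
    let mut selected = Vec::new();
    for i in order {
        let size = groups[i].1; // serialized size of this group's update
        if used + size <= budget_bytes {
            used += size;
            selected.push(i);
        }
    }
    selected
}
```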

Interpolation is broken

Probably because of the change where:

  • I only spawn the interpolated components if there has been at least 2 updates

Client sync issue for prediction/interpolation

There is a bug in the syncing process where the client-time doesn't sync properly with the server after the initial connection.
This causes the Predicted/Interpolated entities to freeze for a certain amount of time at the beginning, as the client speeds up/slows down to sync back to the server.

I noticed that the longer the server has been alive, the longer the freeze period (probably because the sync has a bigger diff to catch up on).

If the difference is more than a certain value (for example 100ms), just snap instead of doing speed up/down?

This is a top priority bug

Enable splitting protocol in multiple places

Currently you need to specify the MessageProtocol and ComponentProtocol as two monolithic enums.
It would be nice if, similar to naia, we could specify the protocol in multiple places.

  • either do protocol.register_message, protocol.register_component
    • then how do we know what each Message/Component type gets serialized as? I guess it's in order of registration, but that could be changed in the register function? Keep a global map of what each type gets serialized as.
    • should we use bevy_reflect?

We should have eventual consistency for server updates

PROBLEM DESCRIPTION:

  • ball is blue on both server and client. Server received ack from client that ball is blue
  • the server sends an update because the component has changed (ball became red)
  • that update got lost
  • the server doesn't send any more updates, because we only send an update when there is a change on the server; so the client still thinks the ball is blue

EXPECTED BEHAVIOUR:

  • after a while, the ball should turn red on the client

SOLUTION:

  • we should send an update on the client if:
    • the component changed on the server
    • OR if the replication group's latest change_tick in bevy_tick (for updates) is > the latest acked bevy_tick, we need to keep trying to send an update.
      • note that this could lead to sending updates multiple times (we keep sending until we get the ACK). Maybe there is a better intermediate solution where we start checking this condition after a certain amount of time?
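
In code, the proposed condition is roughly this (field names illustrative):

```rust
// Sketch of the proposed send condition: keep sending until the latest
// change has been acknowledged, even if the component did not change again.
fn should_send_update(changed_this_tick: bool, latest_change_tick: u32, latest_acked_tick: u32) -> bool {
    changed_this_tick || latest_change_tick > latest_acked_tick
}
```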

Revisit pre-prediction

Consider the following scenario:

  • client pre-spawns bullet
  • server must confirm if it's valid or not, and get authority over the bullet

Currently the way to do this would be:

  • spawn the bullet on client-side with a ShouldBePredicted component, and replicates that to the server.
  • server receives the bullet, creates a new one.
  • if the bullet is valid, server adds Replicate to replicate it back to clients; or deletes it if it's not valid (to replicate the deletion).

This means that the bullet-spawning logic (if client pressed Fire) is very different on client and on server.
Ideally we would like the spawning logic to be similar.

Here is what unity does: https://docs.unity3d.com/Packages/com.unity.netcode@…/manual/ghost-spawning.html

  • client spawns the bullet on input Fire and indicates that this is pre-Predicted
  • server spawns the bullet on input Fire and replicates to client
  • when client receives an entity, it checks if it matches any client pre-predicted entities (by comparing archetypes and spawn ticks). If so, it doesn't create a new one but instead re-uses the one it had!

Basically right now we support:

  • normal spawning
  • spawning player-controlled predicted entity, similar to unity
Predicted spawning for the client predicted player object: The object is predicted so the input handling applies immediately. Therefore, it doesn't need to be delay spawned. When the snapshot data for this object arrives, the update system applies the data directly to the object and then plays back the local inputs which have happened since that time, and corrects mistakes in the prediction.

but we don't support the 3rd category:

Predicted spawning for player spawned objects: This usually applies to objects that the player spawns, like in-game bullets or rockets that the player fires.

Show how bevy+lightyear can be used to deploy actual games

My end goal is to be able to deploy a multiplayer web game (or if not web, with dedicated servers), with lobbies/arenas/matchmaking.

One potential solution for this is to integrate with Agones.

Pros:

  • scalable
  • bevy is abstracted: it just needs to be a server container, which hooks into the SDK for matchmaking, etc.
  • rust client

Input handling missing features

A) global inputs are being sent from Client to Server, but the server doesn't handle them currently because

  • I don't know how to represent the input of each client
  • I don't know how to re-replicate a client's inputs efficiently to other clients. Currently it works just by replicating the ActionState component. A workaround is to create an entity that will have the 'global' ActionState component.

B) how can the server use a similar diff-based strategy to replicate a client's inputs to other clients?

C) in general can we use some diff-based strategy for any component (delta compression)?

D) how can we make sure that we have the correct entity-id in the client input message so that the server knows which local entity of theirs the input corresponds to?

  • pre-predicted: supported directly because the server entity mapping has pre-predicted <> local
  • confirmed: supported directly because of the server entity mapping
  • predicted: not supported (the server does not know about the predicted entity)

Possible sync break after ~500 seconds?

This was reported by a user on discord.

2 possibilities:

  • the 'no change since ...' tick reported by bevy; might need to send updates every time that happens?
  • the WrappedTime wrapping going wrong

Provide easy way to replicate hierarchies (parent/children)

We might want:

  • to replicate all the children attached to a parent entity when Replicate is added to the parent (i.e. propagate the Replicate component to the children)

Example:

  • server has a 'character' + a 'weapon'; the weapon is a child entity.
  • if we add a Replicate component to character, it should get replicated to the child. (maybe make this configurable based on a field in Replicate?)
  • then when the character + weapon is deleted on server (via despawn_recursive) for example, the child would get deleted

it's also a pretty minor change: we just need to add one system that propagates the Replicate component through the hierarchy and makes sure the children use the same ReplicationGroup (the ReplicationGroup of the parent entity)
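
A sketch of such a system (assuming Replicate is Clone and that copying it onto a child is enough to put it in the parent's ReplicationGroup):

```rust
use bevy::prelude::*;
use lightyear::prelude::*; // for Replicate; exact path assumed

/// When Replicate is added to a parent, copy it onto the direct children so
/// the whole hierarchy is replicated together. Recursing through deeper
/// descendants is left out for brevity.
fn propagate_replicate(
    mut commands: Commands,
    parents: Query<(&Replicate, &Children), Added<Replicate>>,
) {
    for (replicate, children) in parents.iter() {
        for child in children.iter() {
            commands.entity(*child).insert(replicate.clone());
        }
    }
}
```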

Enable adding metadata for how to replicate each component

Currently we have a compile-time ComponentSyncMode. It's useful for it to be at compile time, because the SyncMode is only used on the client and we don't want to replicate a SyncMode component through the network.

Still, it might be useful to have some metadata be computed at runtime, because it might be different depending on the entities?

In any case, some metadata that would be useful to have:

  • in which direction the component is replicated (InputMap should only be replicated from server to client, not client to server)
  • frequency of replication: Once (don't replicate updates, just inserts/removes), every time, or maybe a user-specified frequency? (e.g. ActionMap should only be replicated once)
  • component sync mode: how do we copy the component from the client Confirmed entity to the Interpolated/Predicted entities

Lag compensation

  • if other players are interpolated in the past, and current player is in the present, we need lag compensation to register hits

  • I believe that lag compensation only applies for very fast projectiles

  • for slow projectiles, 2 options:

    • use prespawned player object (spawn them on both the client and server and have server be authoritative)
    • spawn the projectile on the server, spawn the projectile on the client a bit ahead (to move it to the predicted timeline), but do not maintain a replication link. The client only sends the input "Shoot" to the server and spawns the bullet ahead (to account for RTT); the server spawns it at the current position of the gun.
  • what is the best way to do this for fast projectiles?

Explore splitting up the Client/Server resources into more granular resources

Pros:

  • would enable more parallelism. Right now any system that uses mut Client (such as send-message) cannot run in parallel
    • how does parallelism work for Resources that contain other Resources? (for example Client contains ReplicationSender)

Cons:

  • it is simple for users to just remember to use the Client or Server resource.

    • although Client.send_message::<C, M>() is maybe equivalent to just Sender.send::<C, M>()

    What we can do:

    • being able to write to separate channels in parallel

    Example of parallelism being limited in a pretty big way:


bevy_mod_debugdump shows when multiple systems share the same resource

Should we even improve parallelism? Because at some point we'll need to resolve everything anyway.

How to improve parallelism?

  • use a separate resource per channel?
  • use Arc on all the group channels? then systems might run in parallel, but we would have contention only on the same replication-group
    • some things might not even have to run in parallel, for example update_collect_changes_since_this_tick would apply the same tick-update for each component in the replication group, so we don't need an Arc

Sending needs:

  • list of connected clients (read)
  • mutate the connection...

Remove wrapping in WrappedTime

The wrapping means that we have 2 wrapping quantities: WrappedTime and Tick.
It's difficult to convert one quantity to the other.

As we don't need to send the WrappedTime through the network, let's just use the normal Duration

Handle entity owners natively in lightyear

Currently the examples have an ugly "Global" resource that helps you track who is the owner of an entity.

I'm not entirely sure that's needed, because the PlayerId component should contain the client-id currently.

But in any case it might be better if lightyear adds a component/resource called Owner that helps track natively which client owns which entity.

Update tutorial

Update tutorial to match the code:

  • need to always send Inputs::None
  • add more links to other sections
  • replace Client<…> with Client

Bullet prespawn example seems to have issues when a client disconnects and reconnects

The logs show messages like

received input message for unrecognized entity entity=3v0

All the other examples seem to work correctly.

Initial thoughts:

  • I thought the problem was that the initial PrePredicted message from the client could be lost, but that shouldn't be the case because it will get retried reliably

  • We seem to get into a state where the server doesn't spawn any new players at all anymore. The server still receives messages (we get received input message for unrecognized entity entity=4v0), but the ReplicationReceiver didn't seem to receive any spawn message

  • Weird stuff:

    • client 4 connects, spawns 2 balls
    • client 1 connects.
      • server sends 1 EntityAction messages that contains all ball spawns (in replication group 4)
      • then the server sends a SEPARATE entity-actions message that adds the PreSpawnedPlayerObject component to the ball, and ShouldBePredicted to client 4. Why are those being sent now? The entities should already be predicted, etc.

BUG 1

  • When the bug happens, server doesn't receive the input messages at all
    • actually, the client is not sending the input messages either
    • the action-diff-buffer has a wrong start tick, which is much later than the current tick
    • for some reason sometimes the tick-sync system runs AFTER prepare-message, so we get:
      • SYNC: tick becomes 4000
      • PREPARE-INPUT-MESSAGE: the input_buffer start_tick is set to 4000 (at interpolation tick)
      • TICK-EVENT: the input_buffer start_tick is set to 8000.

FIXED? the input_leafwing system set ordering was wrong!

BUG 2

  • the player entity doesn't even get spawned on server

This seems to happen when the server is beyond tick u32::MAX / 2?
We receive the client's pre-prediction message with a non-synced tick (80), even though the server tick is ~50k.

FIXED? I think we need to only send replication messages after the server is synced

BUG 3

  • server receives the pre-predicted entity, sends actions back
  • client receives the predicted entity, but somehow the ack doesn't go through, so the server keeps sending updates for Color/Transform/PlayerId

No buffer allocation on the hot path

Joy: I only meant that the protocol shouldn’t allocate Vec on the heap to hold packet/message content as a step of every send/recv. You want it to be as lean as possible. I said nothing about serialization.
Like, the protocol needs access to buffers on the heap to hold fragment content for reassembly or retransmission (and also certain kinds of channel multiplexing), but it should get those from a pool, not allocate them inline.
You don’t want “allocate on the heap” in the hot path.

  • Current unnecessary copies:
  1. the PacketReceiver signature is fn recv(&mut self) -> Result<Option<(&mut [u8], SocketAddr)>>

So for example UdpSocket has an internal self.buffer into which I copy the received bytes, and then I create a new ReadWordBuffer allocation that copies the bytes from self.buffer into the ReadWordBuffer.
But instead of creating a new allocation every time, we should have a pool of ReadWordBuffers that can be re-used.
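
A minimal sketch of the pooling idea (names invented; lightyear's actual buffer types differ):

```rust
/// Reusable buffer pool: avoids a fresh heap allocation on every recv.
struct BufferPool {
    free: Vec<Vec<u8>>,
    capacity: usize,
}

impl BufferPool {
    fn new(capacity: usize) -> Self {
        Self { free: Vec::new(), capacity }
    }

    /// Take a cleared buffer from the pool; allocate only if the pool is empty.
    fn acquire(&mut self) -> Vec<u8> {
        self.free.pop().unwrap_or_else(|| Vec::with_capacity(self.capacity))
    }

    /// Return a buffer once the packet has been processed: keep the
    /// allocation, drop the contents.
    fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.free.push(buf);
    }
}
```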

Improving sync and time_manager

Some facts:

  • the RealTime and VirtualTime are updated once at First
  • the FixedTime (including the overstep) is updated once at RunFixedUpdateLoop
  • we reset latest_server_tick and duration_since_latest_tick in the PreUpdate schedule
  • we update most sync quantities (prediction-time, estimated-server-time) in the PostUpdate schedule
  • We need RealTime for some uses:
    • clearing buffers in channels / packet header manager
    • the pong receive/send time (for correct RTT computation)
    • send_intervals (because we want bandwidth to be fixed)
  • We need VirtualTime for some cases:
    • to speed up/slow down virtual time so that the client goes through ticks faster/slower than the server (so that client tick and server tick are aligned correctly)
    • to compute the generation, we want the server's Tick and VirtualTime I believe?

Changes needed:

  • the overstep in TimeManager must be computed right after RunFixedUpdateLoop
  • maybe we want to update duration_since_latest_tick not by using virtual.delta() but by taking into account the real time elapsed since we received the tick? So just store the Instant of receipt and compute the duration since then?
  • for the prediction-time, we need to take into account the overstep

Add example with physics and more chances of rollback

  • Current examples don't really have rollbacks unless the networking is bad, because there are no interactions with the environment, so there is no possibility of the client predicting something completely wrong (for example, that they kicked the ball when in fact the second player kicked it)

  • Try an example that is like a 2D rocket-league:

    • multiple players
    • a ball
    • they are all predicted by all clients
  • See how this performs, and if we need to interpolate the rollback result instead of completely snapping back to the new state.

  • Rewatch the rocket-league GDC video

Add option to run/not-run the input plugins

There are some situations where you don't want to run the InputPlugin:

  • for example if the player is dead, you don't want to handle the movement inputs.

It would be nice if there was an option to disable the plugin under some conditions.

One possibility is to put all the systems from the plugin under a given SystemSet, so that the user can configure the SystemSet however they want.
(it's possible to have generic system sets:

```rust
#[derive(SystemSet, Debug, Hash, PartialEq, Eq, Clone, Copy)]
pub enum Set<T> {
    A,
    Marker(T),
}
```
)
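
The user could then attach a run condition to the whole set, e.g. (a sketch: MyMarker and player_alive are hypothetical, not part of lightyear):

```rust
use bevy::prelude::*;

// Hypothetical marker type for the input plugin's set.
#[derive(Debug, Hash, PartialEq, Eq, Clone, Copy)]
struct MyMarker;

// Hypothetical run condition: only handle inputs while the player is alive.
fn player_alive() -> bool {
    true
}

fn configure(app: &mut App) {
    // Disables every system in the set whenever the condition is false.
    app.configure_sets(FixedUpdate, Set::<MyMarker>::A.run_if(player_alive));
}
```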

Replication receive might break after u16::MAX ticks

In replication-receive we have the following logic:

  • only accept updates that are later than channel.latest_tick (the reasoning is that we don't want to apply old updates, only the latest ones)
  • only accepts updates once the channel.latest_tick is >= the update's last_action_tick (set on the send side)

I can see several problems with that:

  • imagine channel.latest_tick is 0, then we get an update at tick 40k. 40k is considered SOONER than 0 (because of tick wrapping), so we don't accept the update!
    • potential solutions:
      • change the Tick to u32
      • after a certain time (something like interpolation_time), we are sure not to receive remote updates anymore, so we can update the channel.latest_tick for all channels if it's too old. But be careful that this doesn't trigger Confirmed.tick getting updated! It should be fine, because confirmed.tick only gets updated for all the entities in the replication-group when we receive a message for them.
        The problem is that we need to update the last_action_tick as well on the send side, and it's hard to keep them synchronized. For the update to be valid we need channel.latest_tick >= the update's last_action_tick.
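
For reference, the standard wrapping-aware comparison (serial-number arithmetic) that produces exactly this behaviour:

```rust
/// `a` is considered more recent than `b` if it is less than half the
/// u16 range ahead of it (standard serial-number arithmetic).
fn tick_is_newer(a: u16, b: u16) -> bool {
    a != b && a.wrapping_sub(b) < u16::MAX / 2
}

// The failure mode above: tick_is_newer(40_000, 0) is false, because
// 40_000u16.wrapping_sub(0) = 40_000 >= 32_767, so the update at tick 40k
// is treated as older than latest_tick = 0 and gets rejected.
```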

simple_box example stops working after 9 minutes

The server/client interaction stops working properly after the following steps.

Using the lightyear simple_box example:

  • run the server
  • run a client
  • do some moves periodically (otherwise the client will close with a panic)
thread 'Compute Task Pool (1)' panicked at /home/user/git/lightyear/lightyear/src/inputs/native/input_buffer.rs:98:18:
attempt to subtract with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Encountered a panic in system `lightyear::client::input::write_input_event<simple_box::protocol::myprotocol_module::MyProtocol>`!
Encountered a panic in system `bevy_time::fixed::run_fixed_update_schedule`!
Encountered a panic in system `bevy_app::main_schedule::Main::run_main`!
  • the failure always happens after about nine minutes; after that, it looks like the confirmed position is no longer updated properly on the client side

Maybe add a Direction setting to the replicate component for easier usage

In some cases you might want the same system to add Replicate to an entity on both the client and the server, but you actually only want to replicate in one direction.

Instead of having to fiddle with the replication-target or remove the Replicate component on either the client or the server, maybe I can add a Direction setting to specify whether the replication is done on both sides or whether one direction is ignored.

Arithmetic Overflow in v0.3.0

I am using lightyear v0.3.0 (haven't tested v0.4.0 yet). If passing u64::MAX to Authentication::Manual, I get an overflow in multiplication:

2023-12-22T08:07:52.691052Z  INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "MacOS 14.2 ", kernel: "23.2.0", cpu: "Apple M2 Pro", core_count: "12", memory: "32.0 GiB" }
2023-12-22T08:07:58.461873Z  INFO lightyear::server::resource: New connection from 127.0.0.1:24510 (id: 18446744073709551615)
thread 'Compute Task Pool (4)' panicked at /rustc/1a06ac5b5d7c9331e8de1aa1fd7e9d3533034b44/library/core/src/ops/arith.rs:346:1:
attempt to multiply with overflow

backtrace.txt

Send input using the same mechanism as ComponentUpdate replication

For ComponentUpdate replication, we send all updates since the last ACK-ed update.

For leafwing inputs we currently have an issue where if a diff like JustPressed is lost, the server/client states could diverge a lot.
Instead we want to compute all diffs since the latest ACK-ed component update.

In fact, we might want to not send inputs as a special message at all, but instead add delta-compression to component replication, so that we just replicate ActionState as a normal component. I think it would work, because we already have mechanisms in place to buffer the updates at the correct tick. But maybe not, since we always apply the latest update, and that might not be what we want in this case.
We always compute the diff since the latest ACK-ed update.
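
A sketch of the message shape this implies (illustrative; D stands in for leafwing's diff type):

```rust
/// Illustrative input message: carries every diff since the last tick the
/// remote acknowledged, so a single lost packet cannot desync the ActionState.
struct InputMessage<D> {
    /// tick of the most recent diff in `diffs`
    end_tick: u16,
    /// one entry per tick in (last_acked_tick, end_tick], oldest first;
    /// the receiver applies only the diffs it has not seen yet
    diffs: Vec<D>,
}
```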
