Skip to content

Reconnect in _schedule_reconnect #14201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: branch-3.8
Choose a base branch
from

Conversation

nojaf
Copy link

@nojaf nojaf commented Dec 18, 2024

Hello, I tried to implement some reconnecting logic.
In my own local experiment, the client was able to reconnect after the server shut down and restart. However, I do think that I'm lacking some after reconnecting. For example, a button click handler was no longer working after reconnecting.

Could I get some early feedback on this, to know if I'm going in the right direction?
I could also use some hint for the testing side of things.

Thanks a bunch!

Copy link

codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (branch-3.8@d8e81f2). Learn more about missing BASE report.

Additional details and impacted files
@@              Coverage Diff              @@
##             branch-3.8   #14201   +/-   ##
=============================================
  Coverage              ?   93.24%           
=============================================
  Files                 ?      284           
  Lines                 ?    20248           
  Branches              ?        0           
=============================================
  Hits                  ?    18880           
  Misses                ?     1368           
  Partials              ?        0           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bryevdv
Copy link
Member

bryevdv commented Dec 19, 2024

Hi @nojaf thanks for your interest in this. I had imagined in the past that "reconnect" would actually mean generating and refreshing an entirely new session on the page, i.e "automatic reloading". For instance, one common reason for a disconnect would be that a session was closed and deleted due to an activity timeout. In that case, with the original session destroyed, there is nothing to "actually reconnect" to. It seems to me that actually re-connecting to the same session could be very difficult for other reasons as well:

  • In the case of load balancers, how to guarantee that the connection lands on the same identical instance that is hosting the session?
  • How are problems with inconsistent state to be addressed, e.g. if a connection dropped in the middle of an update.

I'm not actually sure how or if "actual reconnection to the same session" can be made to work, or how difficult it would be.

If instead, "re-connection" is interpreted to mean "automatic reload" then I think all these questions are sidestepped. That also handles the session timeout case, what we had in mind primarily, which "actual reconnection" would not be able to handle.

cc @bokeh/dev and @jbednar since this is probably of interest to HV.

@philippjfr
Copy link
Contributor

I think both interpretations of re-connect would be valuable. Re-connecting to an existing session would likely have to simply treat the server side state as the source of truth (at least initially) because any attempt at merging state sounds like a rabbit hole. So it probably has to re-request the full session state.

The load balancer issue seems like something that can be solved by correctly configuring session stickiness on the load balancer.

So if it can be made to work it would be nice if we try to restore the server session state and if the session has already been cleaned up the server just creates and sends a new session.

@nojaf
Copy link
Author

nojaf commented Dec 20, 2024

Hi folks!

Re-connecting to an existing session would likely have to simply treat the server side state as the source of truth (at least initially)

Yes, this is what I'm aiming for in this initial implementation.

My company hopes to achieve the following behavior in the Panel:

A user is on a webpage, updates some parameters, leaves the tab open to look at something else, and later returns to find that a WebSocket has disconnected. Previously, this disconnection occurred silently, leaving the user to continue updating parameters without realizing that the page was no longer responding, which could lead to incorrect results. We have now added a disconnect notification for this.

Ideally, we want the state to remain exactly as it was when the user paused their activity. Users updating parameters will not affect the state for other users, so I expect this state to be retained on the server (unless it deletes all session state upon disconnection).

@bryevdv
Copy link
Member

bryevdv commented Dec 20, 2024

Not immediately on disconnect, but there is a periodic cleanup job to purge unused sessions. The parameters are tunable but I don't think you'd want to disable it entirely, since otherwise the server would just leak resources continually.

@mattpap mattpap added this to the 3.x milestone Dec 22, 2024
@nojaf
Copy link
Author

nojaf commented Dec 23, 2024

Hello, everyone! What would be a good next step for me to pursue here?

@nojaf
Copy link
Author

nojaf commented Jan 13, 2025

Hello @mattpap, any pointers for the next steps here?

Comment on lines 47 to 54
// TODO: refactor to connection_lost_permanently
notify_connection_lost(): void {
this.document.event_manager.send_event(new ConnectionLost())
}

// TODO: notify_connection_retry ?
// data: time (ms) to next retry, connection attempt number

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will implement appropriate mechanisms separately from this PR (issues #14252 and #6884 for tracking).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a new DocumentEvent is a perfectly fine start here and then people can set up their own notifications and we can hook up a built-in notification system later.

@mattpap mattpap changed the base branch from branch-3.7 to branch-3.8 March 7, 2025 07:57
@philippjfr
Copy link
Contributor

Can we try to push this along without immediately make it dependent on notification system? I'd suggest @hoxbro looks at this.

@philippjfr
Copy link
Contributor

Basically I suggest that we add Document events for retries for now and only integrate the notification system at a later point.

@mattpap
Copy link
Contributor

mattpap commented Apr 7, 2025

Can we try to push this along without immediately make it dependent on notification system?

Sure. Adding appropriate events should be sufficient.

@hoxbro
Copy link
Contributor

hoxbro commented Apr 23, 2025

I have added ReconnectEvent, which is triggered when a reconnect is successful.


I have noticed even though, the websocket is reconnected, events are no longer send, an example is the below button which updates the text before the reconnect, but no longer works after the reconnect. @mattpap is looking into this.

Screencast.From.2025-04-23.14-35-28.mp4
Script
from bokeh.plotting import curdoc
from bokeh.models import Div, Button
from bokeh.layouts import column

counter = 0

body = Div()

button = Button(label="Click me to update text", button_type="success")


def update_text():
    global counter
    body.text = f"Clicked {counter} times"
    counter += 1


button.on_click(update_text)

update_text()

curdoc().add_root(column(button, body))


from bokeh.events import ReconnectEvent
curdoc().on_event(ReconnectEvent, lambda event: print("reconnect_event", flush=True))

@nojaf
Copy link
Author

nojaf commented Jun 13, 2025

Hi @mattpap, were you ever able to figure out the reconnect problem?

@mattpap
Copy link
Contributor

mattpap commented Jun 17, 2025

were you ever able to figure out the reconnect problem?

@nojaf, not yet, I'm investigating.

@mattpap
Copy link
Contributor

mattpap commented Jun 17, 2025

@philippjfr, @hoxbro, so I was actually mistaken with my earlier assessment. Fortunately the problem ended up being trivial to fix. What happened was, when document was rebuilt with fresh session state after a reconnect, it would be first cleared, which would trigger root removal events and destroy state on the server side. We already have a mechanism to prevent server sync while allowing event distribution within bokehjs, so it's essentially a two line fix.

@mattpap
Copy link
Contributor

mattpap commented Jun 17, 2025

@nojaf, it's fixed now.

@mattpap mattpap removed this from the 3.x milestone Jun 17, 2025
@mattpap mattpap added this to the 3.8 milestone Jun 17, 2025
@philippjfr
Copy link
Contributor

Ah, that makes sense, but there's no way I could have tracked that down.

@mattpap mattpap force-pushed the reconnect-websocket branch from f8c3e27 to e03305b Compare July 18, 2025 12:22
@mattpap mattpap requested a review from philippjfr July 18, 2025 12:23
@mattpap
Copy link
Contributor

mattpap commented Jul 18, 2025

I finalized this PR. Changes:

  • renamed the new event to ClientReconnected to match the naming pattern
  • made the first reconnect instantaneous and reduced delay
  • added tests and fixed token generation in unit tests

@mattpap mattpap marked this pull request as ready for review July 18, 2025 12:50
@philippjfr
Copy link
Contributor

Okay, I'm quite happy with the reconnect logic and am also happy to merge it in the current state. That said, part of what we had discussed originally was that we wouldn't just implement automatic re-connects but also add a final event that's triggered when all re-connects failed. This event would somehow hold a "reconnect" function, which could then be used to pop up a dialog that would allow the user to trigger another re-connect attempt.

@mattpap
Copy link
Contributor

mattpap commented Jul 18, 2025

@philippjfr, what you are describing is what I intend to do separately from this PR. I will start a new issue for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature request] automatic reconnect to server
5 participants