Durable execution is in technical preview mode. Please contact us on Slack
if you’d like to ask a question or try it out.
Request Replay API
You can use the Request Replay API to restart a failed Tensorlake application request from where it failed, without re-executing the Tensorlake function calls in it that already succeeded.
Application code upgrade
When a request is replayed, it runs the same application code version as in the previous run. You can upgrade it to the latest application code version by passing --json '{ "upgrade_to_latest_version": true }' in the HTTP replay API call or by calling request.replay(upgrade_to_latest_version=True) in Python. This is handy if you fixed
a bug in your application code and want to re-run the request with the fix applied. If you replay with a code upgrade, please ensure that the latest application code
can handle the original request inputs. This typically requires keeping your application function parameters backward compatible.
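As a minimal sketch of parameter-level backward compatibility, the plain Python below adds a new parameter with a default value so that inputs recorded for the older version are still accepted. The function names, the `max_words` parameter, and the URL are hypothetical, and Tensorlake decorators are deliberately left out.

```python
from typing import Optional

# Hypothetical version 1 of an application function: one required parameter.
def summarize_v1(document_url: str) -> str:
    return f"summary of {document_url}"

# Hypothetical version 2: the new parameter has a default value, so inputs
# recorded for a version 1 request (only `document_url`) are still valid.
def summarize_v2(document_url: str, max_words: Optional[int] = None) -> str:
    summary = f"summary of {document_url}"
    if max_words is not None:
        summary = " ".join(summary.split()[:max_words])
    return summary

# Inputs as they might have been recorded for the original request.
original_inputs = {"document_url": "https://example.com/report.pdf"}

# The upgraded code accepts the original inputs unchanged.
replayed_output = summarize_v2(**original_inputs)
```

Adding a required parameter instead would make the upgraded function reject the original inputs, which is the situation the paragraph above warns about.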
Replay modes
Tensorlake detects when a replayed request follows a different execution path compared to the original request run or any of its past replays. For example, a replayed request may execute a new function call if it uses a random number generator to decide whether to make the call. Other common causes include:
- Conditional execution of code depending on the current time, database state, values returned by external APIs, etc.
- Changing the order of Tensorlake function calls depending on the duration of external API calls, LLM calls, etc. (i.e., race conditions).
- Changes to Tensorlake function calls in upgraded application code.
Adaptive replay
By default, Tensorlake uses adaptive replay. In this mode, all new Tensorlake function calls are allowed to execute, even if the replayed request doesn’t run some function calls that were executed in the original request run or in previous replayed runs. This mode is useful when you just want to re-run the request from where it failed without being concerned about potential behavioral changes or non-determinism in your application code. To explicitly enable adaptive replay, pass --json '{ "mode": "adaptive" }' in the HTTP replay API call or call request.replay(mode=ReplayMode.ADAPTIVE) in Python.
This is not necessary, since adaptive replay is the default mode.
Strict replay
In this mode, if a new Tensorlake function call is detected during the request replay and one or more Tensorlake function calls from the original request run or from previous replayed runs are not executed in the current replayed run, then the request replay fails with a ReplayError. This mode is useful when
you want to ensure that the request behavior remains the same during replays, i.e., that all the resources claimed during the original request run are reused
during the replayed run without claiming more resources again (e.g., so that cross-service transactions are not redone).
To enable strict replay, pass --json '{ "mode": "strict" }' in the HTTP replay API call or call request.replay(mode=ReplayMode.STRICT) in Python.
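The difference between the two modes can be illustrated with a toy model. This is a sketch of the rule described above, not Tensorlake's implementation: the `ReplayMode` and `ReplayError` classes defined here are local stand-ins, and `plan_replay` and the call names are hypothetical.

```python
from enum import Enum

class ReplayMode(Enum):  # local stand-in, not the Tensorlake class
    ADAPTIVE = "adaptive"
    STRICT = "strict"

class ReplayError(Exception):  # local stand-in, not the Tensorlake exception
    pass

def plan_replay(previous, replayed, mode=ReplayMode.ADAPTIVE):
    """Decide which calls are reused and which are executed in a replay.

    previous: dict of call id -> True if the call succeeded in an earlier run
              (its output is saved), False if it failed.
    replayed: list of call ids made by the replayed run.
    """
    new_calls = [c for c in replayed if c not in previous]
    missing_calls = [c for c in previous if c not in replayed]
    # Strict mode fails when new calls appear AND earlier calls are missing.
    if mode is ReplayMode.STRICT and new_calls and missing_calls:
        raise ReplayError(f"new: {new_calls}, missing: {missing_calls}")
    reused = [c for c in replayed if previous.get(c) is True]
    executed = [c for c in replayed if previous.get(c) is not True]
    return reused, executed

# Original run: fetch and parse succeeded, store failed.
previous = {"fetch": True, "parse": True, "store": False}

# Same execution path: only the failed call runs again.
same_path = plan_replay(previous, ["fetch", "parse", "store"], ReplayMode.STRICT)

# Changed path: parse was replaced by parse_v2.
changed = ["fetch", "parse_v2", "store"]
adaptive_result = plan_replay(previous, changed, ReplayMode.ADAPTIVE)
try:
    plan_replay(previous, changed, ReplayMode.STRICT)
    strict_failed = False
except ReplayError:
    strict_failed = True
```

In the changed-path case, the adaptive plan reuses `fetch` and executes `parse_v2` and `store`, while the strict plan is rejected.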
How function calls are matched
Tensorlake computes a fingerprint of every Tensorlake function call made in an application request. It then compares the fingerprints of new function calls made during a request replay with the fingerprints of previously executed function calls in the same request to determine whether a function call has been made previously. A function call fingerprint includes:
- Function call type (i.e. “function_call”, “map”, “reduce”).
- Function name.
- Parent function call fingerprint.
- Function call sequence number in the parent function call.
- Other information to ensure that changes in function call tree structures are detected.
Matching by fingerprint has the following consequences:
- Changing function parameters in application code doesn’t affect replay behavior. A new function call with different parameters still matches the previous function call. This enables seamless application code upgrades without affecting replays.
- Passing different values (e.g., random numbers, current time) as function parameters doesn’t affect replay behavior. A function call with a different random number passed into it still matches its previous function call where the random number was different.
- If the sequence of function calls changes in the latest application code, then the replayed function calls will not match the previous function calls. In this case, the replay behavior depends on the selected replay mode (adaptive or strict).
- If function calls are started in an arbitrary order (e.g., with a random delay), then the order of function calls can differ between the original request run and the replayed run even without application code changes. In this case, the replay behavior depends on the selected replay mode (adaptive or strict). Application code should avoid arbitrary function call ordering to ensure consistent behavior during request replays and reuse of previously completed work.
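The matching rules above can be sketched with a toy fingerprint function. This is an illustration under stated assumptions, not Tensorlake's actual scheme: the use of SHA-256, the field encoding, and the function names are all hypothetical, and the "other information" component is omitted.

```python
import hashlib
from typing import Optional

def fingerprint(call_type: str, function_name: str,
                parent: Optional[str], sequence_number: int) -> str:
    """Toy fingerprint built from the listed components.

    Function arguments are intentionally NOT part of the input, which is
    why different parameter values still match on replay.
    """
    payload = "|".join([call_type, function_name, parent or "", str(sequence_number)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical call tree: process_order calls charge_card as its first child.
root = fingerprint("function_call", "process_order", None, 0)
original = fingerprint("function_call", "charge_card", root, 0)

# Replay with different argument values: same position, same fingerprint.
replayed = fingerprint("function_call", "charge_card", root, 0)

# The same function called second instead of first no longer matches.
reordered = fingerprint("function_call", "charge_card", root, 1)
```

Because the parent fingerprint is an input, a change near the root of the call tree also changes the fingerprints of every call beneath it.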
Automatic retries
When a Tensorlake function call is retried automatically, it uses the same durable execution mechanism to reuse the outputs of previously successful Tensorlake function calls from the same request. In this case, adaptive replay mode is always used.
Disabling durable execution
Durable execution is enabled by default for all Tensorlake functions. You can disable it for a function by setting the durable attribute to False in the @function decorator.
Best practices for durable Tensorlake applications
- Wrap every external call (LLM, API, database, etc.) in a Tensorlake function to make these calls durable and avoid repeating work. If a framework is doing these calls then use framework customization points (e.g., callbacks, hooks, decorators, etc.) to wrap the calls in Tensorlake functions.
- Design your application code to be deterministic to ensure that replays follow the same execution path and thus reuse previously finished work.
- If your Tensorlake functions have external side effects (e.g., sending emails, modifying databases), ensure that these side effects are idempotent or can be safely retried without causing issues.
- Disable durability for functions that must always run on request replay or retry.
- If a replay uses both strict mode and an upgrade to the latest application code, the latest code must be fully backward compatible with the code of the original request; otherwise the replay may fail.
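One common way to make a side effect safe to retry is an idempotency key. The sketch below uses an in-memory dictionary as a stand-in for an external email service's deduplication store; `send_email_once`, the key format, and the addresses are hypothetical and not part of any Tensorlake API.

```python
# In-memory stand-in for an external service's deduplication store.
sent_emails = {}

def send_email_once(idempotency_key: str, to: str, body: str) -> bool:
    """Send the email unless this key was already processed.

    Returns True if the email was sent by this call, False if it was a
    duplicate of an earlier attempt.
    """
    if idempotency_key in sent_emails:
        return False
    sent_emails[idempotency_key] = (to, body)
    return True

# The key is derived from stable request data, so a retry or replay of the
# same request produces the same key.
first_attempt = send_email_once("request-123:welcome", "user@example.com", "Welcome!")
retry_attempt = send_email_once("request-123:welcome", "user@example.com", "Welcome!")
```

The retry is recognized as a duplicate, so the recipient gets one email even if the function runs twice.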
Human in the loop and external events
The replay API can be used to resume requests that timed out while waiting for external inputs (e.g., human review, external event). For example, suppose a request calls a create_approval function and then a wait_approval function, and the wait_approval function times out after waiting for 5 minutes. The wait can be resumed once the approval is granted by replaying the request
using the Request Replay API. The replayed request skips the already completed create_approval function call and re-executes the
wait_approval function call, which can now complete successfully.
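The flow described above can be simulated in plain Python. This is a toy model only: `durable_call`, the in-memory dictionaries, and the immediate timeout are assumptions made for illustration, and no Tensorlake API is used.

```python
saved_outputs = {}   # toy stand-in for outputs saved by durable execution
approvals = {}       # toy stand-in for an external approval system
execution_log = []   # records which functions actually executed

def durable_call(name, fn, *args):
    """Reuse a saved output if one exists; otherwise execute and save it."""
    if name in saved_outputs:
        return saved_outputs[name]
    execution_log.append(name)
    result = fn(*args)
    saved_outputs[name] = result  # reached only if fn succeeded
    return result

def create_approval(item):
    approval_id = f"approval-for-{item}"
    approvals[approval_id] = False
    return approval_id

def wait_approval(approval_id):
    # The toy version fails immediately instead of waiting 5 minutes.
    if not approvals[approval_id]:
        raise TimeoutError("approval not granted in time")
    return "approved"

def run_request(item):
    approval_id = durable_call("create_approval", create_approval, item)
    return durable_call("wait_approval", wait_approval, approval_id)

# Original run: the approval has not been granted, so the wait times out.
try:
    run_request("invoice-42")
    first_run_status = "completed"
except TimeoutError:
    first_run_status = "timed out"

# A reviewer grants the approval, then the request is replayed.
approvals["approval-for-invoice-42"] = True
replay_status = run_request("invoice-42")
```

After the replay, `execution_log` shows `create_approval` executed once and `wait_approval` executed twice, matching the behavior described above.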
Comparison with Temporal
Although both Tensorlake and Temporal provide durable execution, they achieve it through different architectures. Temporal relies on event history replay, whereas Tensorlake saves and retrieves function outputs and matches function calls using their fingerprints. This removes many constraints that Temporal imposes on application code.
| Feature | Tensorlake Applications | Temporal |
|---|---|---|
| User Code Constraints | Adaptive. By default, a replayed request can change its function calls. | Strict Determinism. Workflow logic must be perfectly deterministic or replay crashes. |
| Handling Code Updates | Adaptive. By default, Tensorlake adapts to new code. New function calls execute normally, and removed function calls are ignored. No special versioning logic is required. | Complex. Requires explicit “Versioning” logic (workflow.patched()) or creating new task queues to prevent “Non-Determinism Errors” when replays encounter new code. |
| History Limits | Unlimited. There are no event history size limits. You can have infinite loops or long-running applications without resetting execution state. | Limited. Event history has hard size limits (typically 50K events). Large loops or long-running workflows must use “Continue-As-New” to truncate history. |
| Replay Behavior | Adaptive. By default, if the code execution path deviates, Tensorlake simply executes the new path while reusing cached outputs where possible. | Strict. If the code execution path deviates from the saved history, the workflow fails (Block/Retry loop). |
| Code Failures | Fails Fast. If a function fails and runs out of retries, the request fails immediately, allowing you to debug and Replay it later when fixed. | Blocks and Retries. If a workflow task fails (e.g., a bug in logic), it blocks and retries indefinitely until fixed. |
| Code Structure | Flexible. You can structure your application code freely, using any programming constructs without worrying about replay constraints. | Constrained. You must split the code into workflows and activities and use them carefully to avoid non-determinism and ensure replayability. |
| Strict Replay Mode | Available. You can enable strict replay mode to enforce exact function call matching during replays to avoid non-determinism. | Available. Temporal always enforces strict determinism in workflow code. |
| Non-durable Functions | Supported. You can disable durability for specific functions that must always run fresh on replays or retries. | Not Supported. All external data must be recorded in history. Retrieving fresh data during replay is generally forbidden to prevent non-determinism. |