[0.2] Initial batch of 0.2.3 backports#4683
Conversation
Previously, we logged "Persisting LiquidityManager..." on each background processor wakeup, which can be very spammy, even on TRACE level. Here, we opt to only log if something actually needed to be repersisted and we did so (in case of failure we're logging that anyways, too). Backport of 369ea98
When we make an MPP claim we push RAA blockers for each chanel to ensure we don't allow any single channel to make too much progress until all channels have the preimage durably on disk. We don't have to store those RAA blockers on disk in the ChannelManager as there's no point - if the ChannelManager gets to disk with the RAA blockers it also brought with it the pending ChannelMonitorUpdates that contain the preimages and will now be replayed, ensuring the preimage makes it to all ChannelMonitors. However, just because those RAA blockers dissapear on reload doesn't mean the implications of them does too - if a later ChannelMonitorUpdate was blocked in the channel we don't have logic to unblock it on startup. Here we add such logic, simply attempting to unblock all blocked `ChannelMonitorUpdate`s that existed on startup. Code written by Claude. Fixes lightningdevkit#4518 Backport of b0c312d Conflicts resolved in: * lightning/src/ln/channelmanager.rs
Add a characterization test for a claimed MPP payment whose preimage monitor updates are only partially persisted before restart. The test drives both channels through a held fee-update commitment dance, claims with async monitor persistence, reloads one fresh and one stale monitor, and verifies that we don't leave a sender-side HTLC stuck after reconnect. Backport of 01d55dc Conflicts due to different channel sorting and holding cell free timing resolved in: * lightning/src/ln/reload_tests.rs
In a very specific case, forgetting to do so can lead to a debug assertion failure when we see a double-claim of an HTLC (see the included test). Found by @joostjager's work on growing the chanmon_consistency fuzzer. Backport of f14b4b2 Trivial silent conflict resolved in: * lightning/src/ln/functional_tests.rs
It seems we forgot to ensure `OffersMessageHandler::best_block` is consistently updated, leading to us building invalid blinded payment paths for short-lived payment paths after two weeks without restart. Backport of 417b065 Trivial silent conflict resolved in: * lightning/src/offers/flow.rs
Backport of 6bf2352 Conflicts resolved in: * lightning/src/chain/chainmonitor.rs
Incomplete keysend MPPs skipped the receive timeout path, allowing partial payments to hold HTLC slots until CLTV expiry instead of failing after `MPP_TIMEOUT_TICKS`. Apply the existing `total_mpp_amount_msat` completeness check to all MPP receives and add a regression test covering the keysend case. The timeout logic was originally added only for invoice-backed MPPs in 2022, and that invoice-only guard remained when receive-side MPP keysend support landed in 2023, leaving this gap latent until now. Co-Authored-By: HAL 9000 Backport of fd8846b Conflicts resolved in: * lightning/src/ln/channelmanager.rs * lightning/src/ln/payment_tests.rs
`GetHistoryRes::height` from electrum-client is a *signed* integer. Here we first check for `<= 0` *before* casting to `u32`. Signed-off-by: Elias Rohrer <dev@tnull.de> Backport of 8b383bb
`OMDomainResolver` rate-limits in-flight DNSSEC proof builds via a `pending_query_count` counter capped at `MAX_PENDING_RESPONSES` (1024). The counter was only released when the proof build succeeded, so any failure mode -- NXDOMAIN, insecure zones, unreachable resolvers, I/O timeouts, malformed names -- permanently consumed a slot. Because the queried name is attacker-controlled (it travels in over a `DNSSECQuery` onion message from any LN peer, given DNS resolution is an opt-in network-advertised feature), an adversary could exhaust the counter with ~1025 failing queries and persistently DoS the resolver for any subsequent legitimate BIP-353 lookups, until the process is restarted. Always release the slot once the proof build completes, regardless of outcome, and add a regression test which points the resolver at a TCP-refusing local port and asserts the counter returns to zero. Co-Authored-By: HAL 9000 Backport of fb4103d Trivial conflicts resolved in: * lightning-dns-resolver/src/lib.rs
`can_support_additional_anchor_channel` decides whether the wallet has enough on-chain reserve to back another anchor channel by counting the node's existing anchor channels. The classification only checked the `anchors_zero_fee_htlc_tx` feature, so channels negotiated with the `anchor_zero_fee_commitments` (TRUC / 0FC, option 41) variant — which require the same on-chain reserve to fund commitment / HTLC fee bumps on force-close — were silently dropped from the count. A node enabling `negotiate_anchor_zero_fee_commitments` would therefore be green-lit to open more anchor channels than its wallet can actually back, risking unfunded fee bumps and HTLC loss on simultaneous force-closes. Treat both feature flags as marking a channel as an anchor channel for reserve-accounting purposes (factored into a small `is_anchor_channel_type` helper, used in both the chain-monitor and channel-manager loops), and add a regression test that opens a single 0FC channel with reserves sized for exactly one channel and asserts the function refuses to authorize a second. Co-Authored-By: HAL 9000 Backport of 33987e8 Trivial conflicts resolved in: * lightning/src/util/anchor_channel_reserves.rs
`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.
That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.
Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.
Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
Backport of 1a01b5a
The std-only `StaticInvoice::is_offer_expired` accessor delegated to `InvoiceContents::is_expired`, which compares `created_at + relative_expiry` against the current time — that is the *invoice*'s expiry, not the offer's. The `_no_std` sibling and `flow.rs:: enqueue_static_invoice` already treat the two as distinct checks. A payer or forwarder using the std API to decide whether to honor a static invoice would therefore get the wrong answer in either direction: forwarding offers the issuer has already retired (when the invoice is still fresh), or refusing offers that are still valid (when the invoice has aged past its `relative_expiry` but the offer itself has no `absolute_expiry`). Route the std accessor through `InvoiceContents::is_offer_expired` so both the std and no-std paths consult the offer's expiry. Co-Authored-By: HAL 9000 Backport of c005b11
`composite_custom_message_handler!` expanded `peer_connected` to call every sub-handler and remember the last error, but never undo the already-succeeded ones. The `CustomMessageHandler::peer_connected` contract is that `PeerManager` will *not* invoke `peer_disconnected` when `peer_connected` returns `Err` — so any per-peer state allocated by an earlier sub-handler that returned `Ok` was leaked permanently once a later sub-handler returned `Err`. A peer who can elicit `Err` from any sub-handler in the composite (feature-bit gate, banlist, etc.) could repeatedly reconnect to grow that leaked state without bound (slow resource DoS), and "currently connected" predicates in the leaking sub-handler would lie about peers that were actually rejected. Mirror the rollback pattern `PeerManager` already uses for the four built-in handlers (`peer_handler.rs:2149-2188`): record each sub-handler's `peer_connected` result, and if any returned `Err`, call `peer_disconnected` on the ones that succeeded before propagating the failure. Co-Authored-By: HAL 9000 Signed-off-by: Elias Rohrer <dev@tnull.de> Backport of 5455058
`EsploraSyncClient::get_confirmed_tx` parsed the SPV proof returned by
the Esplora server but threw away the security check: the merkle root
computed by `PartialMerkleTree::extract_matches` was discarded
(`let _ = …`), and only the leaf-equality check (`matches[0] == txid`)
remained. Anyone can construct a single-leaf partial tree advertising
an arbitrary txid via `PartialMerkleTree::from_txids(&[txid], &[true])`,
so this gate was vacuous.
A malicious or compromised Esplora server could therefore convince
`EsploraSyncClient` that any transaction was confirmed in any block by
returning `MerkleBlock { header: real_header, txn: forged_partial_tree }`,
causing LDK to feed a synthesized `ConfirmedTx` into `Confirm`
implementations such as `ChannelManager` / `ChainMonitor`. From there,
the channel-funding / closing / HTLC flows would treat the transaction
as confirmed at an attacker-chosen height, with consequences ranging
from premature state transitions to force-close races.
Capture the merkle root returned by `extract_matches` and require it
to equal `block_header.merkle_root`, matching the validation the
Electrum sibling already performs via `validate_merkle_proof`.
Co-Authored-By: HAL 9000
Backport of b64efcd
`LSPS5ServiceHandler::persist` incremented `persistence_in_flight` at the top as a single-runner gate, but only decremented it on the success path: each interior `?` on a `kv_store` future propagated the error out of the function while leaving the counter at >= 1. After one transient I/O failure (disk full, brief unavailability of a remote `KVStore`, EPERM, etc.) every subsequent `persist()` call hit the `fetch_add > 0` short-circuit and silently returned `Ok(false)`. The in-memory `needs_persist` flags then continued to grow without ever reaching disk, so webhook state, removals, and notification cooldowns were lost on the next process restart — including the spec-mandated webhook retention/pruning state — without any error surfaced to the operator. The counter is monotonic, so recovery required a process restart. Adopt the LSPS1 / LSPS2 pattern: split the body into an inner `do_persist` and an outer `persist` that unconditionally clears the counter via `store(0)` after the call returns, regardless of outcome. A failed write now still propagates `Err`, but the next `persist()` attempt actually retries the write instead of no-op'ing. Co-Authored-By: HAL 9000 Backport of b3544de Silent conflicts resolved in: * lightning-liquidity/tests/lsps5_integration_tests.rs
`regenerate_and_broadcast_spend_if_necessary` used `pending_sweep: AtomicBool` as a single-runner gate but only cleared the flag with an unconditional `store(false)` *after* the inner future resolved. If the caller's future was dropped while the inner await was `Pending` — which `tokio::time::timeout`, `futures::select!`, manual `JoinHandle::abort`, etc. all do — the reset never ran, leaving the flag stuck `true` and every subsequent call to the function short-circuiting with `Ok(())`. Because `OutputSweeper` is what claims `SpendableOutputDescriptor`s back to the user's wallet after channel closure (including HTLC outputs with time-bounded recovery deadlines), a stuck flag turns into fund-loss exposure: time-sensitive HTLC sweeps simply stop happening, while every other code path keeps queueing new outputs to sweep, until the process is restarted. Replace the trailing `store(false)` with an RAII `PendingSweepGuard` whose `Drop` impl always releases the flag — covering normal return, error, and cancellation alike. Co-Authored-By: HAL 9000 Backport of 6394d18 Silent conflicts due to API changes resolved in: * lightning/src/util/sweep.rs
1a01b5a added detection of unicode format characters in `PrintableString`, but used a hard-coded table which may eventually become out of date. Here we switch to an auto-generated table, include all `General_Category` `Other` characters, and also ban unallocated code points. Finally, CI validates that the file is kept up to date. Written by Claude Backport of 65e8cc8
The exhaustive filesystem store listing treated leftover temp and trash files as namespace directories after identifying them as non-keys. Skip those artifacts before recursing so migrations can ignore crash leftovers. Backport of 9246d86 Conflicts resolved due to moved/split file in: * lightning-persister/src/fs_store.rs
lnd is preparing to ship a release with opt-in onion messages without support for forwarding onion messages from non-channel peers. This breaks the common BOLT 12 OM flow today where we direct-connect to the blinded path introduction point and send the `invoice_request` without a channel. For CLN it turns out this is fine as they never select a peer for their introduction point at all. However, for LDK this would break existing nodes as nodes might now pick an lnd peer as an introduction node but it won't forward the onion message. For now, we just drop the separate introduction point selection and just always use ourselves as an introduction point (assuming we're an announced node). This should also have the side-effect of making offers marginally more robust, which may be worth it, even if it sucks to drop any pretense of privacy. Backport of 8c08a30 Conflicts resolved in: * lightning/src/ln/offers_tests.rs * lightning/src/onion_message/messenger.rs
If an outbound payment was abandoned with htlcs in-flight and later claimed, we would previously have the PaymentSent::fee_paid_msat be set to None. This contradicted some docs on the event that stated the field would always be Some after 0.0.103. Backport of 3e9e6e9 Silent conflicts resolved in: * lightning/src/ln/payment_tests.rs
An offer advertising Quantity::Bounded expects at least one item, but is_valid_quantity accepted a quantity of 0 since it only checked the upper bound. Require the quantity to be greater than 0 so that an invoice request for 0 items is rejected as an InvalidQuantity. Co-Authored-By: Claude <noreply@anthropic.com> Backport of 7620124
AnchorDescriptor::previous_utxo is used for coin selection and PSBT witness_utxo metadata. For keyed anchors it should describe the on-chain P2WSH anchor output instead of the witness script so wallets can validate and sign the package. Co-Authored-By: HAL 9000 This finding was discovered by Project Loupe Backport of ccf45e4 Trivial conflicts resolved in: * lightning/src/events/bump_transaction/mod.rs
get_supportable_anchor_channels estimates how much each reserve UTXO can contribute after spending fees. Include the base input weight in that fee so UTXOs just below the public per-channel reserve are not counted as supporting another anchor channel. Co-Authored-By: HAL 9000 This finding was discovered by Project Loupe Backport of 5e7b7d3 Trivial conflicts resolved in: * lightning/src/util/anchor_channel_reserves.rs
When a JIT channel open fails, release queued intercepted HTLCs through the intercept API so they are not held until expiry. Keep resetting the LSPS2 state if an intercept has already been released. Co-Authored-By: HAL 9000 This finding was discovered by Project Loupe Backport of 6b75e5a Conflicts resolved in: * lightning-liquidity/tests/lsps2_integration_tests.rs
|
I've assigned @tankyleo as a reviewer! |
| @@ -1816,6 +1815,7 @@ where | |||
| for counterparty_node_id in need_persist.into_iter() { | |||
| debug_assert!(!need_remove.contains(&counterparty_node_id)); | |||
| self.persist_peer_state(counterparty_node_id).await?; | |||
There was a problem hiding this comment.
The ? here (and at the future.await? / persist_peer_state(...).await? below) returns early on an IO error without resetting persistence_in_flight. After one failed persist the counter stays >= 1, so every subsequent persist() call hits the fetch_add(1) > 0 guard at line 1789 and returns Ok(false) immediately — silently dropping all future persistence forever.
This is the identical bug that the LSPS5 part of this PR fixes via do_persist() + persistence_in_flight.store(0, Release) on error (and covers with lsps5_service_persist_resets_in_flight_counter_on_io_error). The LSPS2 persist here has the same structure but was not given the same fix.
|
Posting the top-level summary. SummaryThis is a backports PR (#4683). Most hunks are correct backports (DNS resolver counter leak fix, electrum/esplora
Cross-cutting / minor
|
Backports of #4246, #4517, #4520, #4537, #4543, #4544, #4558, #4590, #4591, #4592, #4593, #4594, #4595, #4596, #4597, #4598, #4605, #4618, #4647, #4651, #4667, #4669, #4670, and #4677.