diff --git a/docs/memories/MEMORY.md b/docs/memories/MEMORY.md index 7d12751..bb05240 100644 --- a/docs/memories/MEMORY.md +++ b/docs/memories/MEMORY.md @@ -45,6 +45,13 @@ go into detail and are loaded on demand. the wrong hint shape. Recipe (real `data[0][17]` from fli + IATA-prefixed flight#, human-readable airline name, space-separated times), empirical proof, and wire-through implementation notes. +- [gf_routing_and_carriers.md](gf_routing_and_carriers.md) — How + `--routing`/`--extension` reach Google Flights: the `fl[15]`/`fl[18]`/`fl[22]` + booking-carrier rule (marketing vs operating), the Tier-1/2/3 classification + (`routing_predicates`), the GF-serve gate + post-filter backstop + (`_gf_postfilter`), the concurrent GF-fast-paint + Matrix-enrich flow + (`_run_enriched_path`), and codeshare-aware display. Read before touching + `routing_predicates.py`, `_gf_postfilter.py`, or `_gflight_ids` carrier parsing. - [legroom_recipe.md](legroom_recipe.md) — Per-leg legroom + amenities + aircraft come back in-band in Google Flights' own response (no travelarrow.io API call needed for the data itself). Index map for diff --git a/docs/memories/gf_routing_and_carriers.md b/docs/memories/gf_routing_and_carriers.md new file mode 100644 index 0000000..1d8808c --- /dev/null +++ b/docs/memories/gf_routing_and_carriers.md @@ -0,0 +1,126 @@ +# GF carrier semantics + routing tiers + progressive enrich + +How `--routing`/`--extension` reach Google Flights, and the carrier-identity +indices that make it correct. Read before touching `routing_predicates.py`, +`_gf_postfilter.py`, `fli_bridge.apply_gf_native_filters`, or +`_gflight_ids._parse_leg_amenities` / `_flight_leg`. + +## Booking carrier: `fl[15]` (marketing) vs `fl[22]` (operating) + +Each Google Flights leg tuple (`data[0][2][i]`) carries two carrier identities: + +- `fl[22]` = `[code, number, _, name]` of the **operating** carrier (the metal). 
+- `fl[15]` = `null`, or a list `[[code, number, _, name], …]` of the + **marketing** (selling / codeshare) carriers. +- `fl[18]` = truthy (`[true]`) when the operating carrier markets the leg under + its own code; falsy/`null` on operated-for (regional feeder) legs. + +The carrier a passenger **books** (and what Matrix surfaces) is: + +``` +booking = fl[15][0] if fl[15] present AND fl[18] falsy # operated-for regional + else fl[22] # self-marketed / mainline +``` + +Ground-truthed 2026-06-13 against GF's own headline labels: + +| Leg | `fl[22]` | `fl[15]` | `fl[18]` | GF headline → booking | +|---|---|---|---|---| +| OS36 JFK→VIE | OS / Austrian | `[UA…]` | `[true]` | **Austrian** (`fl[22]`) | +| EN8858 FRA→FLR | EN / Air Dolomiti | `[LH9498]` | `null` | **Lufthansa LH9498** (`fl[15]`) | +| LX39 SFO→ZRH | LX / SWISS | `null` | `[true]` | **SWISS** (`fl[22]`) | + +`_gflight_ids._flight_leg` sets `FlightLeg.airline`/`flight_number` to the +*booking* carrier so gflight flight numbers match Matrix's (marketing) numbers — +which is what makes the GF↔Matrix reconcile join fire. `_parse_leg_amenities` +also keeps the operating carrier (`operating_carrier`/`_name`), the marketing +codes (`marketing_carriers`), and the full marketing flight #s +(`marketing_flights`, e.g. `LH9407`) for the `O:` filter, `-CODESHARE`, and +codeshare-aware display. All flow through `LegInfo`. + +## Tier model: who honors each constraint + +`routing_predicates.classify(routing, extension)` parses both DSLs into a flat +predicate set, each tagged with a tier: + +- **Tier 1 — native GF filter** (`fli_bridge.apply_gf_native_filters`): marketing + carrier *include* (`LH+`, `AIRLINES`), alliance, connect-at airport + (`F* X:FRA F*`), `MAXCONNECT`, `MAXDUR`, nonstop/`MAXSTOPS`. +- **Tier 2 — post-filter on the result** (`_gf_postfilter`): operating carrier + (`O:`/`OPAIRLINES`), marketing/airport *exclude* (`~UA`, `~DFW`, `-CITIES`, + `-AIRLINES`), `-CODESHARE`, specific flight #/range. 
+- **Tier 3 — Matrix only**: fare construction (`F bc=y`, `aa.lon.yup`), mileage, + `PADCONNECT`, aircraft, and anything the parser can't confidently classify. + +Routing language is **positional**, so it's parsed all-or-nothing: only single +order-independent forms map (one carrier-with-quantifier, nonstop, one flight #, +the `F* X:LHR F*` via-airport idiom). Ordered chains (`BA AA`, `DFW DEN`), bare +single-segment carriers (`LH` without `+`/`*`), country filters, and count +placeholders escalate the whole routing to Tier 3 — never partially honored. + +**The gate** (`_pick_backend` → `_gf_postfilter.gf_can_serve`): GF serves a query +iff it has no Tier-3 predicate AND every Tier-2 predicate is post-filterable. +Native filters are a pure *optimization* — if an fli carrier/airport code doesn't +map, that query dimension is skipped (no under-return) and the post-filter (a +string-based backstop that also enforces marketing-include + connect-at) is the +correctness guarantee. + +Time-based Tier-2 predicates (`MINCONNECT`, `-REDEYES`, `-OVERNIGHTS`) currently +escalate to Matrix — `_gf_postfilter` can't evaluate them yet (no per-segment +times threaded through `LegInfo`). Promote by threading those times, then adding +them to `_SUPPORTED` + `_slice_passes`. + +## Progressive enrich (`_run_enriched_path`) + +For a GF-serveable query (default; `--fast`/`--no-enrich` opts out, JSON output +stays GF-only), GF and Matrix are dispatched **concurrently** under one +`anyio.run`: GF runs in `anyio.to_thread.run_sync` (it's sync curl_cffi) while +the Matrix request progresses on the event loop. GF paints first (~1s); when +Matrix lands (~45s) `_enrich.merge_results` reconciles by flight #+date and +`_render_merged` repaints with both prices attributed (they can differ a lot — +Matrix surfaces cheaper published fares). PP/awards + URLs run on the Matrix +(authoritative) result. Per-backend `try/except` so one failing still shows the +other. 
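The concurrent dispatch described above can be sketched roughly as follows. This is a minimal illustration only: it uses the standard library's `asyncio` (with `asyncio.to_thread` standing in for `anyio.to_thread.run_sync`), and `fetch_gf`, `fetch_matrix`, `run_enriched`, and the `paint` callback are hypothetical stand-ins, not the real `_run_enriched_path` internals. The placeholder rows are made up, and the "merge" is a plain concatenation rather than `_enrich.merge_results`.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable

Paint = Callable[[str, list[str]], None]


def fetch_gf() -> list[str]:
    """Hypothetical stand-in for the blocking (sync) GF call."""
    time.sleep(0.01)
    return ["gf-row"]


async def fetch_matrix() -> list[str]:
    """Hypothetical stand-in for the slower, async Matrix request."""
    await asyncio.sleep(0.05)
    return ["matrix-row"]


async def run_enriched(paint: Paint) -> None:
    async def guarded(name: str, work: Awaitable[list[str]]) -> list[str] | None:
        # Per-backend try/except: one side failing must not hide the other.
        try:
            return await work
        except Exception as exc:
            paint("error", [f"{name}: {exc}"])
            return None

    # Both tasks are created before either is awaited, so they run concurrently:
    # GF in a worker thread, Matrix on the event loop.
    gf_task = asyncio.create_task(guarded("gf", asyncio.to_thread(fetch_gf)))
    matrix_task = asyncio.create_task(guarded("matrix", fetch_matrix()))

    gf_rows = await gf_task
    if gf_rows is not None:
        paint("fast", gf_rows)  # first paint, GF only

    matrix_rows = await matrix_task
    if matrix_rows is not None:
        # Placeholder for the real reconcile step.
        paint("merged", (gf_rows or []) + matrix_rows)


events: list[tuple[str, list[str]]] = []
asyncio.run(run_enriched(lambda stage, rows: events.append((stage, rows))))
```

The point of the shape is ordering: the fast result is painted as soon as it lands, and the slower backend triggers a repaint rather than blocking the first one.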
+ +**Codeshare display**: marketing matching is loose (Matrix-consistent: a flight +sellable as LH matches `LH+` even if its primary number is UA). To keep that +honest, `_leg_display` relabels a codeshare match to the matched identity — +`LH9403 (op UA58)` under `--routing LH+` — using `marketing_flights` + +`_match_carriers` (marketing-include filters only). + +## GF date-grid (calendar) — `fli.search.dates.SearchDates` + +Google's `GetCalendarGraph` RPC returns a whole date window's cheapest-per-date +prices in ONE call, and `DateSearchFilters` carries the full Tier-1 filter set +(airlines, stops, layover, max_duration, cabin, times, price). Verified +2026-06-14: `airlines=LH` / `stops=NON_STOP` change the grid prices, so Tier-1 +filters ARE honored. It returns `{date, price}` only — **no itineraries** — so +Tier-2 (`O:`/`-CODESHARE`/`~UA`/flight#) can't be post-filtered on a grid; those +calendars go to Matrix. This is the throttle-friendly calendar primitive (1 call +vs a per-date fan-out), so we prefer it; **no GF fan-out is needed** (Matrix's +`_calendar_split` already fans out for Tier-2 / multi-airport). + +**fli bug (bd work-bcdex):** `SearchDates.search()` splits windows >61 days into +chunks but rebuilds `DateSearchFilters` per chunk copying only +trip_type/passenger_info/segments/stops/seat_type/airlines/dates/duration — +**dropping `layover_restrictions`/`max_duration`/`price_limit`/`emissions`/`bags` +on chunks 2+.** When wiring the date-grid calendar, don't use fli's chunking: cap +each call to ≤61 days and chunk ourselves with the full filter set (or keep +windows ≤61d). Regression-test it; consider an upstream fix. 
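The self-chunking recommended above can be sketched as below. This is a minimal sketch, assuming a hypothetical helper named `chunk_window`; it is not part of fli, and the caller is assumed to rebuild the full filter set for every chunk it yields.

```python
from __future__ import annotations

from collections.abc import Iterator
from datetime import date, timedelta

MAX_GRID_DAYS = 61  # per-call window cap described above


def chunk_window(
    start: date, end: date, max_days: int = MAX_GRID_DAYS
) -> Iterator[tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs that cover [start, end]
    back to back, each spanning at most `max_days` days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Inclusive end: a 61-day chunk starting on day 0 ends on day 60.
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


# A 123-day window splits into chunks of 61, 61, and 1 days.
chunks = list(chunk_window(date(2026, 7, 1), date(2026, 10, 31)))
```

Because each chunk is built by the caller, nothing forces a reduced filter set on later chunks; every call can carry the same layover, duration, and price limits as the first.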
+ +## GF throttle (per-IP, dynamic) — handle reactively, not with a fixed cap + +Measured 2026-06-14 (instrumented, distinguishing genuine `code-13` from +transport errors): the GF RPC throttle is **per-IP and dynamic**, with two +limits — a per-second burst cap (~3–4 at ~10/s, but it floated as high as 30 in +an earlier run) and a rolling allowance (~25–30 calls per ~2–3 min ≈ 10–12/min) — +and **fast recovery** (the call right after a block often returns data). Because +the ceiling moves, a fixed rate limiter is the wrong tool. The design is a +closed loop: `_classify` detects a real block (HTTP 200 + `ErrorResponse`/code-13 +body, vs a transport exception, vs cold-session empty), and `_one_call_with_retry` +backs off + retries on a genuine block (typed `GfThrottledError` on exhaustion). +One-shot `flight` processes can't share a proactive budget, but they DO share the +`code-13` signal, so per-process reactive backoff self-regulates even across +concurrent invocations. In the woven flow a persistent GF throttle degrades to +Matrix-only rather than erroring. (Datacenter VPN exits — e.g. PIA — are +pre-flagged and blocked on sight; only residential IPs work.) diff --git a/src/flight_cli/_enrich.py b/src/flight_cli/_enrich.py new file mode 100644 index 0000000..2f9cae8 --- /dev/null +++ b/src/flight_cli/_enrich.py @@ -0,0 +1,118 @@ +"""Reconcile a fast Google Flights result with the authoritative Matrix result. + +For a GF-serveable query we render GF immediately (~1s) then run Matrix and +repaint a merged table once it lands. This module is the pure reconcile step: +match itineraries across the two cash results and attribute each side's price. + +Matching is by flight number + departure date per slice — which works now that +the gflight adapter emits marketing flight numbers (work-fjibi.1), the same +identity Matrix uses. 
Matched rows carry both prices (they can differ, so we show +both, attributed); Matrix-only rows are added (its fare coverage is broader), +GF-only rows are kept and flagged (ULCC / codeshare inventory Matrix misses). +The Matrix itinerary is authoritative for a matched row's structure. +""" + +from __future__ import annotations + +import re +from dataclasses import dataclass +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + from .models import Itinerary, SearchResult + +_PRICE_DIGITS = re.compile(r"[\d,]*\d+") +_NO_PRICE = 10**12 # sort key for itineraries with no parseable price (last) + +Source = str # "both" | "matrix" | "gf" + + +@dataclass(frozen=True, slots=True) +class MergedRow: + """One row of the reconciled GF+Matrix view. + + `itinerary` is the structure to display (Matrix-authoritative when matched). + `gf_price` / `matrix_price` are the attributed price strings from each side + (None when that side didn't have this itinerary). `source` records which + backend(s) produced it.""" + + itinerary: Itinerary + gf_price: str | None + matrix_price: str | None + source: Source + + +def _price_int(price: str | None) -> int: + """Leading integer dollars from 'USD877.00' / '$877' / '877 USD'; _NO_PRICE + when absent (sorts such rows last).""" + if not price: + return _NO_PRICE + m = _PRICE_DIGITS.search(price) + if not m: + return _NO_PRICE + try: + return int(m.group(0).replace(",", "").split(".")[0]) + except ValueError: + return _NO_PRICE + + +def _itin_key(it: Itinerary) -> tuple[tuple[tuple[str, ...], str], ...] | None: + """Match key: per slice, (flight numbers, departure date). 
None when the + itinerary lacks the structure to match on (kept as a single-source row).""" + itn = it.itinerary + if itn is None or not itn.slices: + return None + parts: list[tuple[tuple[str, ...], str]] = [] + for s in itn.slices: + if not s.flights or not s.departure: + return None + parts.append((tuple(s.flights), s.departure[:10])) + return tuple(parts) + + +def merge_results(gf: SearchResult, matrix: SearchResult) -> list[MergedRow]: + """Reconcile GF + Matrix cash results into price-sorted merged rows.""" + gf_keyed: dict[object, Itinerary] = {} + gf_unkeyed: list[Itinerary] = [] + for it in gf.solutions: + k = _itin_key(it) + if k is None: + gf_unkeyed.append(it) + else: + gf_keyed.setdefault(k, it) + + matrix_keyed: dict[object, Itinerary] = {} + matrix_unkeyed: list[Itinerary] = [] + for it in matrix.solutions: + k = _itin_key(it) + if k is None: + matrix_unkeyed.append(it) + else: + matrix_keyed.setdefault(k, it) + + rows: list[MergedRow] = [] + # Matrix keys first (authoritative), then GF-only keys. 
+ for k, m in matrix_keyed.items(): + g = gf_keyed.get(k) + rows.append( + MergedRow( + itinerary=m, # Matrix structure authoritative when matched + gf_price=g.price if g else None, + matrix_price=m.price, + source="both" if g else "matrix", + ) + ) + for k, g in gf_keyed.items(): + if k not in matrix_keyed: + rows.append(MergedRow(itinerary=g, gf_price=g.price, matrix_price=None, source="gf")) + rows.extend( + MergedRow(itinerary=it, gf_price=None, matrix_price=it.price, source="matrix") + for it in matrix_unkeyed + ) + rows.extend( + MergedRow(itinerary=it, gf_price=it.price, matrix_price=None, source="gf") + for it in gf_unkeyed + ) + + rows.sort(key=lambda r: _price_int(r.matrix_price or r.gf_price)) + return rows diff --git a/src/flight_cli/_gf_dategrid.py b/src/flight_cli/_gf_dategrid.py new file mode 100644 index 0000000..77e6ecf --- /dev/null +++ b/src/flight_cli/_gf_dategrid.py @@ -0,0 +1,163 @@ +"""Google Flights native date-grid (SearchDates / GetCalendarGraph) for fast, +Tier-1 calendars. + +Returns cheapest-price-per-date for a whole window in ONE call — far faster than +Matrix's calendar, and it sidesteps Matrix's compute-budget under-reporting +(MEMORY quirk #7). `DateSearchFilters` carries the full Tier-1 filter set, so +airlines/stops/layover/max_duration/cabin/times/price are honored server-side +(reusing `apply_gf_native_filters`). It returns `{date: price}` only — **no +itineraries** — so Tier-2 constraints (`O:`/`-CODESHARE`/`~UA`/flight#) can't be +honored on a grid; those calendars go to Matrix. + +We chunk windows to <=61 days OURSELVES with the full filter set, dodging the fli +`SearchDates` >61-day bug that drops filters on later chunks (bd work-bcdex). +Throttle-hardened via the shared `retry_throttled` (same code-13 detection + +backoff as the search path). fli is heavy, so `cli` imports this module lazily — +only when a GF calendar is actually run. 
+""" + +from __future__ import annotations + +import json +from datetime import timedelta +from typing import TYPE_CHECKING, Any + +from fli.models.airport import Airport # pyright: ignore[reportMissingTypeStubs] +from fli.models.google_flights.base import ( # pyright: ignore[reportMissingTypeStubs] + FlightSegment, + SeatType, + TripType, +) +from fli.models.google_flights.dates import ( # pyright: ignore[reportMissingTypeStubs] + DateSearchFilters, +) +from fli.models.google_flights.flights import ( # pyright: ignore[reportMissingTypeStubs] + PassengerInfo, +) +from fli.search.client import get_client # pyright: ignore[reportMissingTypeStubs] +from fli.search.dates import SearchDates # pyright: ignore[reportMissingTypeStubs] + +from ._gflight_ids import ( # shared GF-internal helpers (sibling module) + GfThrottledError, + _is_throttle_block, # pyright: ignore[reportPrivateUsage] + _persist_cookies, # pyright: ignore[reportPrivateUsage] + _seed_cookies_once, # pyright: ignore[reportPrivateUsage] + retry_throttled, +) +from .domain import Cabin +from .fli_bridge import ( + _fli_max_stops, # pyright: ignore[reportPrivateUsage] + apply_gf_native_filters, +) +from .routing_predicates import Tier, classify + +if TYPE_CHECKING: + from .domain import CalendarSearch + from .routing_predicates import Predicate + +_MAX_GRID_DAYS = 61 # GetCalendarGraph's per-request span limit + +_CABIN_TO_SEAT = { + Cabin.COACH: SeatType.ECONOMY, + Cabin.PREMIUM_COACH: SeatType.PREMIUM_ECONOMY, + Cabin.BUSINESS: SeatType.BUSINESS, + Cabin.FIRST: SeatType.FIRST, +} + + +def grid_can_serve(search: CalendarSearch) -> bool: + """Whether the GF date-grid can fully serve this calendar: one-way, + single-airport per leg, and only Tier-1 constraints (the grid has no + itineraries, so even Tier-2 can't be post-filtered — those go to Matrix). 
+ Round-trip is excluded for now: a duration *range* doesn't map to the grid's + single-duration parameter.""" + if len(search.legs) != 1: + return False + leg = search.legs[0] + if len(leg.origins) != 1 or len(leg.destinations) != 1: + return False + constraints = classify(leg.route_language, leg.extension) + return all(p.tier is Tier.GF_NATIVE for p in constraints.predicates) + + +def _grid_filters( + search: CalendarSearch, from_iso: str, to_iso: str, predicates: list[Predicate] +) -> Any: + """Build a one-way DateSearchFilters for a <=61-day sub-window, with the + search's cabin/stops/pax plus the Tier-1 routing predicates applied.""" + leg = search.legs[0] + p = search.options.pax + extra_stops = search.options.max_extra_stops + filters = DateSearchFilters( + passenger_info=PassengerInfo( + adults=(p.adults + p.seniors + p.youth) or 1, children=p.children + ), + flight_segments=[ + FlightSegment( + departure_airport=[[getattr(Airport, leg.origins[0]), 0]], + arrival_airport=[[getattr(Airport, leg.destinations[0]), 0]], + travel_date=from_iso, + ) + ], + stops=_fli_max_stops(extra_stops if extra_stops is not None else 99), + seat_type=_CABIN_TO_SEAT[search.options.cabin], + trip_type=TripType.ONE_WAY, + from_date=from_iso, + to_date=to_iso, + ) + if predicates: + apply_gf_native_filters(filters, predicates) + return filters + + +def _parse_grid(parsed: str) -> dict[str, float]: + """{date: price} from the inner GetCalendarGraph payload. Each item in the + last array is `[date, _, [[_, price], ...], ...]`; bad-shaped items skipped.""" + out: dict[str, float] = {} + for item in json.loads(parsed)[-1]: + try: + day = item[0] + price = item[2][0][1] + except (IndexError, TypeError): + continue + if isinstance(day, str) and price is not None: + out[day] = float(price) + return out + + +def _one_grid_call(filters: Any) -> dict[str, float]: + """One GetCalendarGraph round-trip -> {date: price}. 
Raises GfThrottledError + on a genuine code-13 block; returns {} on a cold-session empty.""" + client = get_client() + _seed_cookies_once(client) + resp = client.post( + url=SearchDates.BASE_URL, + data=f"f.req={filters.encode()}", + impersonate="chrome", + allow_redirects=True, + ) + resp.raise_for_status() + body = resp.text + parsed = json.loads(body.lstrip(")]}'"))[0][2] + if not parsed: + if _is_throttle_block(body): + raise GfThrottledError("Google Flights rate-limited the date-grid request") + return {} + _persist_cookies(client) + return _parse_grid(parsed) + + +def date_grid(search: CalendarSearch) -> dict[str, float]: + """Cheapest price per departure date across the window (caller ensures + `grid_can_serve`). Chunks to <=61 days with the FULL filter set, throttle- + retries each, and merges. Raises GfThrottledError if the throttle persists.""" + leg = search.legs[0] + predicates = list(classify(leg.route_language, leg.extension).predicates) + out: dict[str, float] = {} + cursor = search.window.start + while cursor <= search.window.end: + chunk_end = min(cursor + timedelta(days=_MAX_GRID_DAYS - 1), search.window.end) + filters = _grid_filters(search, cursor.isoformat(), chunk_end.isoformat(), predicates) + out.update(retry_throttled(lambda f=filters: _one_grid_call(f))) + cursor = chunk_end + timedelta(days=1) + return out diff --git a/src/flight_cli/_gf_postfilter.py b/src/flight_cli/_gf_postfilter.py new file mode 100644 index 0000000..3502d6c --- /dev/null +++ b/src/flight_cli/_gf_postfilter.py @@ -0,0 +1,157 @@ +"""Post-filter Google Flights results against Tier-2 predicates the GF query +can't express natively. + +Runs only on the gflight path (Matrix legs don't carry the per-leg carrier +identity these predicates need). 
The gate (`gf_can_serve`) only routes a query +to GF when every Tier-2 predicate here is *supported* — anything this module +can't evaluate (min-layover, red-eyes, overnight stops) escalates the whole +query to Matrix rather than being silently dropped. + +Supported Tier-2 predicates: + - operating carrier include/exclude (`O:LH+`, `OPAIRLINES`, `-OPAIRLINES`) + - marketing-carrier exclude (`~UA+`, `-AIRLINES`) + - connection-airport exclude (`~DFW`, `-CITIES`) + - no codeshare (`-CODESHARE`) + - specific flight # / range (`UA882`, `UA1000-2000`) +""" + +from __future__ import annotations + +import re +from typing import TYPE_CHECKING + +from .routing_predicates import ( + CarrierPred, + ConnectionAirportPred, + ExcludeCodesharePred, + SpecificFlightPred, + Tier, +) + +if TYPE_CHECKING: + from collections.abc import Iterable, Sequence + + from .models import Itinerary, SearchResult, Slice + from .routing_predicates import ClassifiedConstraints, Predicate + +# Tier-2 predicate types this module can evaluate. Other Tier-2 predicates +# (ConnectTimePred min, red-eyes, overnights) need per-segment times we don't +# yet thread through, so they escalate to Matrix via `gf_can_serve`. +_SUPPORTED: tuple[type, ...] 
= ( + CarrierPred, + ConnectionAirportPred, + ExcludeCodesharePred, + SpecificFlightPred, +) + +_FLIGHT_RE = re.compile(r"^([A-Z0-9]{2})(\d+)$", re.IGNORECASE) + + +def can_postfilter(pred: Predicate) -> bool: + """True if this predicate is either not our concern (Tier 1 native / Tier 3 + Matrix-only, handled elsewhere) or a Tier-2 predicate we can evaluate here.""" + if pred.tier is not Tier.GF_POSTFILTER: + return True + return isinstance(pred, _SUPPORTED) + + +def gf_can_serve(constraints: ClassifiedConstraints) -> bool: + """Whether Google Flights alone can honor every predicate: no Tier-3, and + every Tier-2 predicate is post-filterable here.""" + if constraints.requires_matrix: + return False + return all(can_postfilter(p) for p in constraints.predicates) + + +def _parse_flight(flight: str) -> tuple[str, int] | None: + m = _FLIGHT_RE.match(flight) + return (m.group(1).upper(), int(m.group(2))) if m else None + + +def _leg_carriers(slc: Slice) -> list[tuple[set[str], str | None]]: + """Per leg: (marketing carrier set, operating carrier). 
The marketing set is + the booking carrier (from `flights[i]`) plus the codeshare sellers + (`legs[i].marketing_carriers`).""" + out: list[tuple[set[str], str | None]] = [] + n = max(len(slc.flights), len(slc.legs)) + for i in range(n): + flight = slc.flights[i] if i < len(slc.flights) else "" + leg = slc.legs[i] if i < len(slc.legs) else None + marketing: set[str] = {c.upper() for c in leg.marketing_carriers} if leg else set() + if parsed := _parse_flight(flight): + marketing.add(parsed[0]) + operating = leg.operating_carrier.upper() if leg and leg.operating_carrier else None + out.append((marketing, operating)) + return out + + +def _carrier_pred_passes(slc: Slice, pred: CarrierPred) -> bool: + legs = _leg_carriers(slc) + if pred.operating: + if pred.exclude: # -OPAIRLINES — no leg operated by these + return not any(op in pred.codes for _, op in legs) + return all(op in pred.codes for _, op in legs) # O:/OPAIRLINES — all operated by these + if pred.exclude: # ~UA / -AIRLINES — no leg sold by an excluded carrier + return not any(marketing & pred.codes for marketing, _ in legs) + # marketing include (LH+ / AIRLINES) — every leg sold by an allowed carrier. + # Applied natively too; this is the correctness backstop if an fli code didn't map. 
+ return all(marketing & pred.codes for marketing, _ in legs) + + +def _slice_passes(slc: Slice, predicates: Iterable[Predicate]) -> bool: + for p in predicates: + if isinstance(p, CarrierPred): + if not _carrier_pred_passes(slc, p): + return False + elif isinstance(p, ConnectionAirportPred): + stop_codes = {s.code.upper() for s in slc.stops if s.code} + if p.exclude: + if stop_codes & p.codes: # ~DFW / -CITIES — no connection at an excluded airport + return False + elif stop_codes and not (stop_codes <= p.codes): + # connect-at include: every connection must be an allowed airport + return False + elif isinstance(p, ExcludeCodesharePred): + for marketing, op in _leg_carriers(slc): + # codeshare = booked carrier(s) differ from the operating metal + if op is not None and marketing and op not in marketing: + return False + elif isinstance(p, SpecificFlightPred): + flights = [f for fl in slc.flights if (f := _parse_flight(fl))] + if not any(c == p.carrier and p.low <= n <= p.high for c, n in flights): + return False + return True + + +def surviving_indices( + result: SearchResult, per_slice_predicates: Sequence[Sequence[Predicate]] +) -> list[int]: + """Indices of solutions that pass the per-slice predicates — lets a caller + filter a parallel list (e.g. the raw fli results) in lockstep.""" + return [ + i for i, it in enumerate(result.solutions) if _itinerary_passes(it, per_slice_predicates) + ] + + +def _itinerary_passes(it: Itinerary, per_slice_predicates: Sequence[Sequence[Predicate]]) -> bool: + itn = it.itinerary + if itn is None: + return True + for i, slc in enumerate(itn.slices): + preds = per_slice_predicates[i] if i < len(per_slice_predicates) else () + if not _slice_passes(slc, preds): + return False + return True + + +def apply_postfilter( + result: SearchResult, per_slice_predicates: Sequence[Sequence[Predicate]] +) -> SearchResult: + """Drop solutions whose slice `i` violates `per_slice_predicates[i]`. 
Mutates + and returns `result` (`solutions` + `solution_count`).""" + if not any(per_slice_predicates): + return result + kept = [it for it in result.solutions if _itinerary_passes(it, per_slice_predicates)] + result.solutions = kept + result.solution_count = len(kept) + return result diff --git a/src/flight_cli/_gflight_ids.py b/src/flight_cli/_gflight_ids.py index ed1e201..1e2a60a 100644 --- a/src/flight_cli/_gflight_ids.py +++ b/src/flight_cli/_gflight_ids.py @@ -19,10 +19,11 @@ import logging import os import pathlib +import random import time from copy import deepcopy from dataclasses import dataclass -from typing import TYPE_CHECKING, Any +from typing import TYPE_CHECKING, Any, cast from fli.models import ( # pyright: ignore[reportMissingTypeStubs] FlightLeg, @@ -33,6 +34,8 @@ from fli.search.flights import SearchFlights # pyright: ignore[reportMissingTypeStubs] if TYPE_CHECKING: + from collections.abc import Callable + from fli.models.google_flights.flights import ( # pyright: ignore[reportMissingTypeStubs] FlightSearchFilters, ) @@ -52,6 +55,32 @@ _EMPTY_RETRY_ATTEMPTS = 4 _EMPTY_RETRY_BACKOFF_S = 1.0 # multiplied by attempt number: 1s, 2s, 3s between tries +# A genuine throttle is distinct from the cold-session empty above: Google +# answers HTTP 200 with an error envelope (code-13 / `ErrorResponse`) instead of +# data — it's rate-limiting this IP. Measured 2026-06-14, the limit is DYNAMIC +# (the ceiling drifts run-to-run) with FAST recovery, so a fixed rate cap is the +# wrong tool: we back off exponentially and retry, surfacing GfThrottledError +# only when that's exhausted (the caller can then degrade to Matrix). Backoff is +# jittered so concurrent one-shot `flight` processes — which share the per-IP +# signal but can't share a budget — don't all retry in lockstep and re-trip it. 
+_THROTTLE_RETRY_ATTEMPTS = 4 +_THROTTLE_BACKOFF_S = 1.0 # exponential base: ~1, 2, 4, 8s (plus 0-50% jitter) + + +class GfThrottledError(Exception): + """Google Flights rate-limited this IP (HTTP 200 + code-13 ErrorResponse). + + Distinct from a transport error and from a cold-session empty. Recovery is + usually fast; callers may retry shortly or fall back to Matrix.""" + + +def _is_throttle_block(body: str) -> bool: + """True when a non-data GF response is a genuine throttle (error envelope), + not a cold-session / no-results empty. The throttle body carries a + `type.googleapis.com/...ErrorResponse` marker; an empty body does not.""" + return "ErrorResponse" in body or "type.googleapis.com" in body + + # Persisted gflight session cookies. The cold-session empties above are almost # entirely "the session is missing Google's NID cookie" — a long-lived (~6mo) # session cookie a browser keeps across restarts. Empirically, seeding a saved @@ -85,6 +114,20 @@ _LEG_CABIN_IDX = 16 # int enum (see _CABIN) _LEG_AIRCRAFT_IDX = 17 # string +# Carrier identity (distinct from amenities). A leg tuple separates the OPERATING +# carrier (fl[22], the metal) from the MARKETING/booking carrier (fl[15], what a +# passenger books under). fl[18] is truthy when the operating carrier self-markets +# under its own code; falsy on operated-for (regional feeder) legs. Matrix surfaces +# the marketing identity too, so reading the booking carrier here keeps flight +# numbers consistent across backends. +_LEG_MARKETING_IDX = 15 # list[[code, number, _, name]] of selling carriers (None if none) +_LEG_SELF_MARKETED_IDX = 18 # truthy -> operating carrier markets under its own code +_LEG_OPERATING_IDX = 22 # [code, number, _, name] of the operating carrier +# Field layout within a [code, number, _, name] carrier tuple. 
+_CARRIER_CODE_IDX = 0 +_CARRIER_NUMBER_IDX = 1 +_CARRIER_NAME_IDX = 3 + _LEGROOM_CLASS: dict[int, str] = { 1: "AVERAGE", 2: "BELOW", @@ -109,6 +152,13 @@ class LegAmenities: wifi: str | None = None # "free" | "paid" | None (no ground-internet wifi) power: str | None = None video: str | None = None + # Carrier identity beyond fli's FlightLeg (which now carries the booking + # carrier). Operating carrier drives the "operated by" label + the `O:` + # routing filter; the marketing-carrier set drives marketing-carrier matches. + operating_carrier: str | None = None # IATA code of the metal, e.g. "EN" + operating_carrier_name: str | None = None # e.g. "Air Dolomiti" + marketing_carriers: tuple[str, ...] = () # IATA codes from fl[15] (selling carriers) + marketing_flights: tuple[str, ...] = () # full marketing flight #s, e.g. "LH9407" def _decode_power(amenities: Any) -> str | None: @@ -199,16 +249,81 @@ def _parse_pitch(raw: Any) -> int | None: return None +def _carrier_entry(raw: Any) -> tuple[str | None, str | None, str | None]: + """(code, number, name) from a `[code, number, _, name]` carrier tuple. 
+ All-None on any malformed shape — Google's response drifts.""" + if not isinstance(raw, list): + return None, None, None + items = cast("list[Any]", raw) + if len(items) <= _CARRIER_NUMBER_IDX: # a valid entry has at least code + number + return None, None, None + code = items[_CARRIER_CODE_IDX] + number = items[_CARRIER_NUMBER_IDX] + name = items[_CARRIER_NAME_IDX] if len(items) > _CARRIER_NAME_IDX else None + return ( + code if isinstance(code, str) else None, + number if isinstance(number, str) else None, + name if isinstance(name, str) else None, + ) + + +def _marketing_codes(fl: list[Any]) -> tuple[str, ...]: + """IATA codes of the marketing (selling) carriers from fl[15], in order.""" + raw = fl[_LEG_MARKETING_IDX] if len(fl) > _LEG_MARKETING_IDX else None + if not isinstance(raw, list): + return () + entries = cast("list[Any]", raw) + return tuple(code for entry in entries if (code := _carrier_entry(entry)[0])) + + +def _marketing_flights(fl: list[Any]) -> tuple[str, ...]: + """Full marketing flight numbers from fl[15] (e.g. 'LH9407') — the codeshare + identities a flight is also sold under. Used for codeshare-aware display.""" + raw = fl[_LEG_MARKETING_IDX] if len(fl) > _LEG_MARKETING_IDX else None + if not isinstance(raw, list): + return () + out: list[str] = [] + for entry in cast("list[Any]", raw): + code, number, _ = _carrier_entry(entry) + if code and number: + out.append(f"{code}{number}") + return tuple(out) + + +def _resolve_booking(fl: list[Any]) -> tuple[str | None, str | None]: + """The (carrier code, flight number) a passenger books under. + + On an operated-for leg (a regional flies metal sold under a mainline's code: + fl[15] present AND fl[18] falsy) that's the marketing carrier (fl[15][0]). + Otherwise the operating carrier self-markets (fl[18] truthy) or there's no + codeshare (fl[15] empty), so it's the operating carrier (fl[22]). 
Matrix + surfaces this same marketing identity, so aligning here lets the cross-backend + flight#+date join fire on codeshares. Ground-truthed 2026-06-13 vs GF's own + headline: OS36 (fl18=[true] -> Austrian), Air Dolomiti EN8858 (fl18=null -> + Lufthansa LH9498), SWISS LX39 (no fl15 -> SWISS).""" + marketing = fl[_LEG_MARKETING_IDX] if len(fl) > _LEG_MARKETING_IDX else None + self_marketed = bool(fl[_LEG_SELF_MARKETED_IDX]) if len(fl) > _LEG_SELF_MARKETED_IDX else False + if isinstance(marketing, list) and marketing and not self_marketed: + code, number, _ = _carrier_entry(marketing[0]) + if code and number: + return code, number + operating = fl[_LEG_OPERATING_IDX] if len(fl) > _LEG_OPERATING_IDX else None + code, number, _ = _carrier_entry(operating) + return code, number + + def _parse_leg_amenities(fl: list[Any]) -> LegAmenities: - """Defensive read of indices 12-17 from a leg tuple. Returns an - all-None LegAmenities if any single field is missing or wrong type — - Google's response shape drifts and a partial extract is better than - dropping the whole flight.""" + """Defensive read of a leg tuple: amenities (indices 12-17) plus carrier + identity (operating fl[22], marketing fl[15]). 
Returns an all-None/empty + LegAmenities for any field that's missing or the wrong type — Google's + response shape drifts and a partial extract beats dropping the flight.""" amenities = fl[_LEG_AMENITIES_IDX] if len(fl) > _LEG_AMENITIES_IDX else None legroom_raw = fl[_LEG_LEGROOM_CLASS_IDX] if len(fl) > _LEG_LEGROOM_CLASS_IDX else None pitch_raw = fl[_LEG_PITCH_IDX] if len(fl) > _LEG_PITCH_IDX else None cabin_raw = fl[_LEG_CABIN_IDX] if len(fl) > _LEG_CABIN_IDX else None aircraft = fl[_LEG_AIRCRAFT_IDX] if len(fl) > _LEG_AIRCRAFT_IDX else None + operating_raw = fl[_LEG_OPERATING_IDX] if len(fl) > _LEG_OPERATING_IDX else None + op_code, _, op_name = _carrier_entry(operating_raw) return LegAmenities( aircraft=aircraft if isinstance(aircraft, str) and aircraft else None, pitch_inches=_parse_pitch(pitch_raw), @@ -217,6 +332,10 @@ def _parse_leg_amenities(fl: list[Any]) -> LegAmenities: wifi=_decode_wifi(amenities), power=_decode_power(amenities), video=_decode_video(amenities), + operating_carrier=op_code, + operating_carrier_name=op_name, + marketing_carriers=_marketing_codes(fl), + marketing_flights=_marketing_flights(fl), ) @@ -246,23 +365,34 @@ def _parse_flight_with_id(data: list[Any]) -> GFlightWithId: currency=currency, duration=data[0][9], stops=len(leg_tuples) - 1, - legs=[ - FlightLeg( - airline=SearchFlights._parse_airline(fl[22][0]), # pyright: ignore[reportPrivateUsage] - flight_number=fl[22][1], - departure_airport=SearchFlights._parse_airport(fl[3]), # pyright: ignore[reportPrivateUsage] - arrival_airport=SearchFlights._parse_airport(fl[6]), # pyright: ignore[reportPrivateUsage] - departure_datetime=SearchFlights._parse_datetime(fl[20], fl[8]), # pyright: ignore[reportPrivateUsage] - arrival_datetime=SearchFlights._parse_datetime(fl[21], fl[10]), # pyright: ignore[reportPrivateUsage] - duration=fl[11], - ) - for fl in leg_tuples - ], + legs=[_flight_leg(fl) for fl in leg_tuples], ) amenities = [_parse_leg_amenities(fl) for fl in leg_tuples] return 
GFlightWithId(flight=flight, flight_id=flight_id, amenities=amenities) +def _flight_leg(fl: list[Any]) -> FlightLeg: + """Build fli's FlightLeg using the BOOKING carrier (see `_resolve_booking`) + as airline/flight_number — not the operating carrier — so the surfaced flight + matches what a passenger books (and what Matrix returns). The operating + carrier is preserved separately on `LegAmenities`.""" + book_code, book_number = _resolve_booking(fl) + if book_code is None: + # No operating or marketing carrier in the tuple — malformed leg. Raise + # so _one_call's except skips this flight, matching the prior behaviour + # of indexing a missing fl[22][0]. + raise ValueError("leg tuple missing carrier identity") + return FlightLeg( + airline=SearchFlights._parse_airline(book_code), # pyright: ignore[reportPrivateUsage] + flight_number=book_number or "", + departure_airport=SearchFlights._parse_airport(fl[3]), # pyright: ignore[reportPrivateUsage] + arrival_airport=SearchFlights._parse_airport(fl[6]), # pyright: ignore[reportPrivateUsage] + departure_datetime=SearchFlights._parse_datetime(fl[20], fl[8]), # pyright: ignore[reportPrivateUsage] + arrival_datetime=SearchFlights._parse_datetime(fl[21], fl[10]), # pyright: ignore[reportPrivateUsage] + duration=fl[11], + ) + + def _cookie_path() -> pathlib.Path: """Where the warmed gflight session cookies live — the shared CLI cache dir (same `MATRIX_CACHE_DIR` override the response cache honors).""" @@ -347,9 +477,12 @@ def _one_call(filters: FlightSearchFilters) -> list[GFlightWithId]: allow_redirects=True, ) resp.raise_for_status() - parsed = json.loads(resp.text.lstrip(")]}'"))[0][2] + body = resp.text + parsed = json.loads(body.lstrip(")]}'"))[0][2] if not parsed: - return [] + if _is_throttle_block(body): + raise GfThrottledError("Google Flights rate-limited the request") + return [] # cold-session empty, or a genuinely flight-less leg # A truthy `parsed` means Google answered a warm session — save its cookies # (NID) 
so the next one-shot CLI process starts warm instead of cold. _persist_cookies(client) @@ -367,23 +500,48 @@ def _one_call(filters: FlightSearchFilters) -> list[GFlightWithId]: return out -def _one_call_with_retry(filters: FlightSearchFilters) -> list[GFlightWithId]: - """`_one_call`, but retried on an empty result to ride out the cold-session - empties described at `_EMPTY_RETRY_ATTEMPTS`. +def retry_throttled[T](call: Callable[[], T]) -> T: + """Run a GF call under two distinct retry policies (see the constants above); + shared by the search and date-grid paths. - Retries reuse fli's shared (warming) client — that's the whole point; a - fresh session would stay cold. A genuinely flight-less leg pays a few quick - retries of latency, which is rare and preferable to a spurious "no results". - """ - result: list[GFlightWithId] = [] - for attempt in range(1, _EMPTY_RETRY_ATTEMPTS + 1): - result = _one_call(filters) + - cold-session **falsy result** -> a few quick, linearly-spaced retries on the + same (warming) client; a fresh session would stay cold. Returns the falsy + result if it never warms. + - genuine **throttle** (GfThrottledError) -> exponential, jittered backoff; + re-raised when exhausted so the caller can degrade to Matrix. 
+ + Transport errors propagate (fli's client already retried them).""" + empty_attempts = 0 + throttle_attempts = 0 + while True: + try: + result = call() + except GfThrottledError: + throttle_attempts += 1 + if throttle_attempts > _THROTTLE_RETRY_ATTEMPTS: + raise + base = _THROTTLE_BACKOFF_S * (2 ** (throttle_attempts - 1)) + backoff = base * (1 + random.random() * 0.5) # noqa: S311 — jitter, not crypto + log.debug( + "gflight throttled; backoff %.1fs (retry %d/%d)", + backoff, + throttle_attempts, + _THROTTLE_RETRY_ATTEMPTS, + ) + time.sleep(backoff) + continue if result: return result - if attempt < _EMPTY_RETRY_ATTEMPTS: - log.debug("empty gflight response; retry %d/%d", attempt, _EMPTY_RETRY_ATTEMPTS) - time.sleep(_EMPTY_RETRY_BACKOFF_S * attempt) - return result + empty_attempts += 1 + if empty_attempts >= _EMPTY_RETRY_ATTEMPTS: + return result # never warmed, or genuinely empty + log.debug("empty gflight response; retry %d/%d", empty_attempts, _EMPTY_RETRY_ATTEMPTS) + time.sleep(_EMPTY_RETRY_BACKOFF_S * empty_attempts) + + +def _one_call_with_retry(filters: FlightSearchFilters) -> list[GFlightWithId]: + """`_one_call` wrapped in the shared throttle / cold-session retry.""" + return retry_throttled(lambda: _one_call(filters)) def search_with_ids( diff --git a/src/flight_cli/cli.py b/src/flight_cli/cli.py index 2fa1bd9..d89aa3c 100644 --- a/src/flight_cli/cli.py +++ b/src/flight_cli/cli.py @@ -247,23 +247,40 @@ def _pick_backend( ) -> str: """Resolve --backend to a concrete backend. - auto: matrix iff a Matrix-only flag is set, else gflight. - Matrix-only set: --routing/--extension/--slice/--depart-times/--return-times, - any pax type beyond adults+children. PP overlay rides both backends now — - plain `--pp-only` stays on gflight for speed + ULCC inventory. - - Explicit --backend matrix: matrix. 
--backend gflight: gflight, unless a - Matrix-only flag is also set (error — the request is inexpressible on fli).""" - matrix_only = bool(routing or extension or slice_specs or depart_times or return_times) or ( + auto: matrix iff the request needs it, else gflight (~1s vs Matrix's ~45s). + `--routing`/`--extension` no longer force Matrix on their own — they're + parsed and classified, and Google Flights serves them when it can honor + every constraint (native filters + result post-filter; see routing_predicates + + _gf_postfilter). Only fare-construction (fare basis / booking class) or a + constraint GF can't reconstruct sends routing to Matrix. Hard-Matrix flags — + `--slice` (multi-city), `--depart-times`/`--return-times`, any pax type beyond + adults+children — always force Matrix (the GF bridge doesn't map them yet). + + Explicit --backend matrix: matrix. --backend gflight: gflight, unless the + request is inexpressible on GF (error).""" + from ._gf_postfilter import gf_can_serve # noqa: PLC0415 + from .routing_predicates import classify # noqa: PLC0415 + + hard_matrix = bool(slice_specs or depart_times or return_times) or ( seniors > 0 or youth > 0 or inf_seat > 0 or inf_lap > 0 ) + routing_needs_matrix = bool(routing or extension) and not gf_can_serve( + classify(routing, extension) + ) + matrix_only = hard_matrix or routing_needs_matrix + if backend == BACKEND_AUTO: return BACKEND_MATRIX if matrix_only else BACKEND_GFLIGHT if backend == BACKEND_GFLIGHT and matrix_only: + reason = ( + "--slice/--depart-times/--return-times or an extra pax type" + if hard_matrix + else "a --routing/--extension constraint Google Flights can't honor " + "(fare basis, booking class, or similar)" + ) raise typer.BadParameter( - "--backend gflight is incompatible with Matrix-only flags " - "(--routing/--extension/--slice/--depart-times/--return-times/" - "extra pax types). Drop them, or use --backend matrix.", + f"--backend gflight can't serve this request: {reason}. 
" + "Drop it, or use --backend matrix.", ) if backend not in _VALID_BACKENDS: raise typer.BadParameter(f"--backend must be one of {_VALID_BACKENDS}; got {backend!r}") @@ -876,6 +893,36 @@ def _fmt_legroom_lines(s: Slice) -> str: return "\n".join(r for r in rows if r) +def _render_date_grid( + grid: dict[str, float], + *, + origin: tuple[str, ...], + destination: tuple[str, ...], + sd: date, + ed: date, +) -> None: + """Render the GF native date-grid: cheapest fare per departure day (USD), + sorted cheapest-first. One-way only (the grid's shape).""" + if not grid: + return + console.print( + f"[bold]{len(grid)} priced days[/] · cheapest: " + f"[bold cyan]{min(grid.values()):.0f} (USD)[/] · " + f"window {sd.isoformat()} → {ed.isoformat()}" + ) + t = Table( + title=f"{','.join(origin)} → {','.join(destination)}: " + "lowest fare per departure day (Google Flights)", + show_header=True, + header_style="bold green", + ) + t.add_column("departure", justify="right") + t.add_column("min (USD)", justify="right") + for day, price in sorted(grid.items(), key=lambda kv: kv[1]): + t.add_row(day, f"{price:.0f}") + console.print(t) + + def _render_calendar( res: CalendarResult, *, @@ -1000,6 +1047,69 @@ def _run_matrix_path( _emit_urls(search, matrix_url=matrix_url, google_url=google_url, result=res, pick=pick) +def _gflight_results(legs: tuple[Leg, ...], opts: SearchOptions, top_n: int) -> list[Any]: + """Query Google Flights for `legs`, honoring routing/extension: Tier-1 + predicates narrow the fli query natively, the Tier-2 post-filter drops + violating solutions. Returns the (filtered) raw fli result list. + + `search` applies the same routing/extension to every leg, so the first leg's + constraints cover the trip for the native query; the post-filter is per slice. 
+ """ + from ._gf_postfilter import surviving_indices # noqa: PLC0415 + from ._gflight_ids import search_with_ids # noqa: PLC0415 + from .fli_bridge import apply_gf_native_filters, to_fli_filter # noqa: PLC0415 + from .pp.gflight_adapter import fli_results_to_search_result # noqa: PLC0415 + from .routing_predicates import classify # noqa: PLC0415 + + fli_filter = to_fli_filter(SpecificDateSearch(legs=legs, options=opts)) + out_constraints = classify(legs[0].route_language, legs[0].extension) if legs else None + if out_constraints and out_constraints.predicates: + apply_gf_native_filters(fli_filter, out_constraints.predicates) + results: list[Any] = search_with_ids(fli_filter, top_n=top_n) or [] + per_slice_preds = [list(classify(lg.route_language, lg.extension).predicates) for lg in legs] + if results and any(per_slice_preds): + keep = set(surviving_indices(fli_results_to_search_result(results), per_slice_preds)) + results = [r for i, r in enumerate(results) if i in keep] + return results + + +_MERGE_SOURCE_TAG = {"both": "GF+MX", "matrix": "MX", "gf": "GF"} + + +def _render_merged(rows: list[Any], *, legs: tuple[Leg, ...], top_n: int) -> None: + """Render the reconciled GF+Matrix view: one row per itinerary with the GF + and Matrix prices attributed side-by-side and a source tag.""" + origin = legs[0].origins[0] if legs[0].origins else "?" + destination = legs[0].destinations[0] if legs[0].destinations else "?" 
+ has_return = len(legs) >= _ROUND_TRIP_LEGS + t = Table( + title=f"Google Flights + Matrix · {origin}→{destination}" + + (" + return" if has_return else ""), + show_header=True, + header_style="bold green", + ) + t.add_column("#", justify="right") + t.add_column("src") + t.add_column("Matrix", justify="right") + t.add_column("Google", justify="right") + t.add_column("outbound") + t.add_column("return") + for i, row in enumerate(rows[:top_n], 1): + itn = row.itinerary.itinerary + slcs: list[Slice] = itn.slices if itn else [] + out = _fmt_slice_cell(slcs[0]) if slcs else "—" + ret = _fmt_slice_cell(slcs[1]) if len(slcs) > 1 else "—" + t.add_row( + str(i), + _MERGE_SOURCE_TAG.get(row.source, row.source), + _amount(row.matrix_price), + _amount(row.gf_price), + out, + ret, + ) + console.print(t) + + def _run_gflight_path( *, legs: tuple[Leg, ...], @@ -1018,25 +1128,23 @@ def _run_gflight_path( the existing PP matcher + renderer reuse cleanly. PP runs on the same (origin, dest, date) per leg as the matrix path. """ - # fli is heavy (selenium/selectolax); lazy-import so the rest of flight_cli - # doesn't pay the startup cost when not used. `search_with_ids` wraps fli's - # encoder + client but parses the response ourselves to capture the opaque - # per-flight ID (data[0][17]) — that's what PP's enableGoogleFlightMatching - # joins against to produce matchedGoogleFlightId in its response. - from ._gflight_ids import search_with_ids # noqa: PLC0415 - from .fli_bridge import to_fli_filter # noqa: PLC0415 + from ._gflight_ids import GfThrottledError # noqa: PLC0415 + from .pp.gflight_adapter import fli_results_to_search_result # noqa: PLC0415 - search = SpecificDateSearch(legs=legs, options=opts) try: - # Returns GFlightWithId | tuple[GFlightWithId, ...]. The .flight attribute - # exposes fli's FlightResult; .flight_id is Google's opaque ID. 
- results: list[Any] = search_with_ids(to_fli_filter(search), top_n=top_n) or [] + results = _gflight_results(legs, opts, top_n) + except GfThrottledError as e: + err.print( + "[yellow]Google Flights is rate-limiting this IP.[/] Wait a moment and " + "retry, or use [bold]--backend matrix[/]." + ) + raise typer.Exit(1) from e except Exception as e: err.print(f"[red]Google Flights query failed:[/] {e}") raise typer.Exit(1) from e if not results: - console.print("[yellow]Google Flights returned no results.[/]") + console.print("[yellow]Google Flights: no results (or none matched the routing).[/]") return # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType, @@ -1055,12 +1163,10 @@ def _run_gflight_path( awards_only = sel.awards_only if sel is not None else False if not awards_only: - _render_gflight_table(results, legs=legs, top_n=top_n) + _render_gflight_table(results, legs=legs, top_n=top_n, match_carriers=_match_carriers(legs)) # Always adapt to SearchResult shape so the URL emission has segment # info for the pinned link (cheap: just shuffles existing fields). - from .pp.gflight_adapter import fli_results_to_search_result # noqa: PLC0415 - sr = fli_results_to_search_result(results) if run_pp: @@ -1087,6 +1193,107 @@ def _run_gflight_path( ) +def _run_enriched_path( + *, + legs: tuple[Leg, ...], + opts: SearchOptions, + top_n: int, + run_pp: bool, + sel: ProviderSelection | None, + matrix_url: bool, + google_url: bool, + pick: int | None, + rps: float, + impersonate: str, + no_cache: bool, +) -> None: + """GF-serveable query, progressive: dispatch Google Flights + Matrix + concurrently under one event loop, paint GF immediately (~1s), then repaint a + reconciled GF+Matrix table once Matrix lands (~45s). PP/awards + URLs run on + the Matrix (authoritative) result. 
`--fast` skips this for GF-only speed.""" + from ._enrich import merge_results # noqa: PLC0415 + from .pp.gflight_adapter import fli_results_to_search_result # noqa: PLC0415 + + matrix_search = SpecificDateSearch(legs=legs, options=opts) + awards_only = sel.awards_only if sel is not None else False + state: dict[str, Any] = {} + + async def _matrix(c: MatrixClient) -> None: + try: + state["matrix"] = await c.execute(matrix_search, cache=not no_cache) + except MatrixApiError as e: + state["matrix_err"] = e + + async def _go() -> None: + async with ( + MatrixClient(rps=rps, impersonate=impersonate) as c, + anyio.create_task_group() as tg, + ): + tg.start_soon(_matrix, c) + # Google Flights is sync (curl_cffi) — run it in a worker thread so the + # Matrix request progresses concurrently on the event loop. + try: + gf = await anyio.to_thread.run_sync(_gflight_results, legs, opts, top_n) + except Exception as e: # noqa: BLE001 - reported below; Matrix may still succeed + state["gf_err"] = e + gf = [] + state["gf"] = gf + # First paint, while Matrix is still in flight. + if gf and not awards_only: + _render_gflight_table( + gf, legs=legs, top_n=top_n, match_carriers=_match_carriers(legs) + ) + console.print("[dim]…refining with Matrix (authoritative fares)…[/]") + elif not gf and "gf_err" not in state: + console.print("[yellow]Google Flights: no results; awaiting Matrix…[/]") + + anyio.run(_go) + + gf: list[Any] = state.get("gf") or [] + if "gf_err" in state: + from ._gflight_ids import GfThrottledError # noqa: PLC0415 + + e = state["gf_err"] + if isinstance(e, GfThrottledError): + console.print("[dim]Google Flights rate-limited — showing Matrix only.[/]") + else: + err.print(f"[yellow]Google Flights query failed:[/] {e}") + matrix_res = state.get("matrix") + if matrix_res is None: + # Matrix failed; the GF table (if any) was already painted. 
+ e = state.get("matrix_err") + if e is not None: + err.print(f"[red]Matrix returned an error ({e.kind}):[/] {e.message}") + if not gf: + raise typer.Exit(1) + return + matrix_res = cast("SearchResult", matrix_res) + + # Repaint: reconciled GF + Matrix, prices attributed. + if not awards_only: + merged = merge_results(fli_results_to_search_result(gf), matrix_res) + _render_merged(merged, legs=legs, top_n=top_n) + + if run_pp: + p = opts.pax + run_pp_for_search( + matrix_res, + legs=_build_pp_legs(legs), + num_passengers=p.adults + p.children + p.seniors + p.youth, + airlines=sel.pp_airlines() if sel is not None else None, + cabins=sel.pp_cabins() if sel is not None else None, + pp_only=awards_only, + json_out=False, + provider_filter=sel.provider_filter if sel is not None else None, + seats_sources=sel.seats_sources() if sel is not None else None, + cash_per_cabin=_cash_per_cabin_single(matrix_res, opts.cabin), + ) + + _emit_urls( + matrix_search, matrix_url=matrix_url, google_url=google_url, result=matrix_res, pick=pick + ) + + # ─────────────────────────── multi-cabin orchestration ───────────────────── # When --cabin selects multiple cabins, each cabin's per-query top-N is bumped @@ -1478,11 +1685,49 @@ def _merge_results_into_one( ) -def _render_gflight_table(results: list[Any], *, legs: tuple[Leg, ...], top_n: int) -> None: +def _match_carriers(legs: tuple[Leg, ...]) -> frozenset[str]: + """Marketing carrier codes the user filtered on (for codeshare-aware display). 
+ Empty when there's no marketing-carrier include filter — operating (`O:`) and + exclude filters don't trigger codeshare relabeling.""" + from .routing_predicates import CarrierPred, classify # noqa: PLC0415 + + codes: set[str] = set() + for lg in legs: + for p in classify(lg.route_language, lg.extension).predicates: + if isinstance(p, CarrierPred) and not p.operating and not p.exclude: + codes |= p.codes + return frozenset(codes) + + +def _leg_display(leg: Any, amenity: Any, match_carriers: frozenset[str]) -> str: + """Per-leg label 'CODE NUMBER' (booking carrier). If the booking carrier isn't in the user's + carrier filter but the leg is sold under a codeshare that IS (e.g. UA58 sold as + LH9407 under `--routing LH+`), show the matched identity: 'LH9407 (op UA58)'.""" + code = getattr(leg.airline, "name", "") or "" + number = getattr(leg, "flight_number", "?") + booking = f"{code} {number}" + if not match_carriers or code in match_carriers: + return booking + raw_mf = getattr(amenity, "marketing_flights", ()) if amenity else () + mflights: tuple[str, ...] = tuple(raw_mf or ()) + for mf in mflights: + if mf[:2].upper() in match_carriers: + return f"{mf} (op {code}{number})" + return booking + + +def _render_gflight_table( + results: list[Any], + *, + legs: tuple[Leg, ...], + top_n: int, + match_carriers: frozenset[str] = frozenset(), +) -> None: """Render fli results as a rich table. Duck-typed: fli has no type stubs. Accepts our `GFlightWithId` wrappers — `.flight` is fli's FlightResult, - `.amenities` is per-leg legroom data parsed from Google's response.""" + `.amenities` is per-leg legroom data parsed from Google's response. + `match_carriers` enables codeshare-aware leg labels (see `_leg_display`).""" origin = legs[0].origins[0] if legs[0].origins else "?" destination = legs[0].destinations[0] if legs[0].destinations else "?" 
has_return = len(legs) >= _ROUND_TRIP_LEGS @@ -1505,8 +1750,8 @@ def _render_gflight_table(results: list[Any], *, legs: tuple[Leg, ...], top_n: i amenities = getattr(g, "amenities", []) or [] label = f"{i}{'a' if j == 0 else 'b'}" if len(items) > 1 else str(i) legs_str = " → ".join( - f"{getattr(leg.airline, 'name', leg.airline)} {getattr(leg, 'flight_number', '?')}" - for leg in fr.legs + _leg_display(leg, amenities[k] if k < len(amenities) else None, match_carriers) + for k, leg in enumerate(fr.legs) ) mins = fr.duration dur = f"{mins // 60}h{mins % 60:02d}m" @@ -1898,6 +2143,15 @@ def search( rich_help_panel=_GROUP_OUTPUT, ), no_cache: bool = _NO_CACHE_OPT, + fast: bool = typer.Option( + False, + "--fast/--enrich", + "--no-enrich/--no-fast", + help="Skip Matrix enrichment: show only the fast Google Flights result " + "(~1s) instead of also reconciling against Matrix. Default: enrich when " + "Google Flights can serve the query.", + rich_help_panel=_GROUP_BACKEND, + ), providers: str | None = typer.Option( None, "--providers", @@ -2036,7 +2290,10 @@ def search( run_awards = _should_run_awards(sel) if len(cabins_tuple) > 1: - if resolved == BACKEND_GFLIGHT: + # The multi-cabin gflight path doesn't apply the routing/extension + # filters yet, so a constrained multi-cabin search goes to Matrix to + # honor the routing correctly (the single-cabin gflight path filters). + if resolved == BACKEND_GFLIGHT and not (routing or extension): _run_gflight_path_multi( legs=legs, opts=opts, @@ -2066,6 +2323,24 @@ def search( return if resolved == BACKEND_GFLIGHT: + # GF can serve this query — paint it fast (~1s), then enrich against + # Matrix (authoritative) and repaint a merged table. `--fast` (or JSON + # output, which wants a single stable shape) takes the GF-only path. 
+ if not fast and not json_out: + _run_enriched_path( + legs=legs, + opts=opts, + top_n=page_size, + run_pp=run_awards, + sel=sel, + matrix_url=matrix_url, + google_url=google_url, + pick=pick, + rps=_resolve_rps(rps), + impersonate=_resolve_impersonate(impersonate), + no_cache=_resolve_no_cache(no_cache), + ) + return _run_gflight_path( legs=legs, opts=opts, @@ -2387,6 +2662,15 @@ def calendar( rich_help_panel=_GROUP_OUTPUT, ), no_cache: bool = _NO_CACHE_OPT, + fast: bool = typer.Option( + False, + "--fast/--enrich", + "--no-enrich/--no-fast", + help="Skip the Matrix enrichment: show only the fast Google Flights " + "date-grid (one-way, single-airport, Tier-1 filters) instead of also " + "running the authoritative Matrix calendar.", + rich_help_panel=_GROUP_BACKEND, + ), max_per_query: int = typer.Option( 1, "--max-per-query", @@ -2443,6 +2727,38 @@ def calendar( ) window = CalendarWindow(start=sd, end=ed, duration_min=dmin, duration_max=dmax) search = CalendarSearch(legs=legs, options=opts, window=window) + + # Fast layer: the GF native date-grid (~1s, throttle-friendly, dodges Matrix's + # compute-budget under-reporting) for one-way / single-airport / Tier-1-only + # windows. Paint it first, then enrich with the authoritative Matrix calendar + # (full per-duration grid). `--fast` stops after the grid. The cheap pre-check + # avoids importing the fli-heavy module for the Matrix-only cases. 
+ if not json_out and one_way and len(origins) == 1 and len(dests) == 1: + from ._gf_dategrid import date_grid, grid_can_serve # noqa: PLC0415 + + if grid_can_serve(search): + from ._gflight_ids import GfThrottledError # noqa: PLC0415 + + grid: dict[str, float] = {} + try: + grid = date_grid(search) + except GfThrottledError: + console.print("[dim]Google Flights rate-limited — Matrix only.[/]") + except Exception as e: # noqa: BLE001 — GF is the optional fast layer; Matrix still runs + err.print(f"[yellow]Google Flights date-grid failed:[/] {e}") + if grid: + _render_date_grid(grid, origin=origins, destination=dests, sd=sd, ed=ed) + if fast: + if grid: + _emit_urls(search, matrix_url=matrix_url, google_url=google_url) + else: + console.print("[yellow]No Google Flights grid; drop --fast for Matrix.[/]") + return + if grid: + console.print("[dim]…refining with Matrix (full grid + durations)…[/]") + + # Matrix (authoritative; also the only path for round-trip, multi-airport, + # Tier-2/3 routing, or when the grid was empty/throttled). # CalendarSearch → CalendarResult by client._parse_response dispatch. # On a multi-airport brownout, _run_calendar splits per-destination + merges. 
res, n_split = _run_calendar( diff --git a/src/flight_cli/fli_bridge.py b/src/flight_cli/fli_bridge.py index c38084e..2f7a4e0 100644 --- a/src/flight_cli/fli_bridge.py +++ b/src/flight_cli/fli_bridge.py @@ -8,7 +8,7 @@ from __future__ import annotations from datetime import timedelta -from typing import Any, assert_never +from typing import TYPE_CHECKING, Any, assert_never from .domain import ( Cabin, @@ -17,6 +17,19 @@ Search, SpecificDateSearch, ) +from .routing_predicates import ( + AlliancePred, + CarrierPred, + ConnectionAirportPred, + ConnectTimePred, + MaxDurationPred, + StopsPred, +) + +if TYPE_CHECKING: + from collections.abc import Iterable + + from .routing_predicates import Predicate _ROUND_TRIP_LEGS = 2 # 2 legs = round-trip; 1 = one-way @@ -117,3 +130,88 @@ def run_gflight_search(s: Search, *, top_n: int = 5) -> Any: ) return SearchFlights().search(to_fli_filter(s), top_n=top_n) + + +def _fli_max_stops(max_stops: int) -> Any: + """Map a stop count to fli's MaxStops enum (it tops out at 'two or fewer').""" + from fli.models.google_flights.base import ( # noqa: PLC0415 # pyright: ignore[reportMissingTypeStubs] + MaxStops, + ) + + return { + 0: MaxStops.NON_STOP, + 1: MaxStops.ONE_STOP_OR_FEWER, + 2: MaxStops.TWO_OR_FEWER_STOPS, + }.get(max_stops, MaxStops.ANY) + + +def apply_gf_native_filters(filters: Any, predicates: Iterable[Predicate]) -> bool: # noqa: PLR0912 - flat predicate dispatch + """Apply Tier-1 (GF-native) predicates onto an fli FlightSearchFilters in + place: marketing-carrier include -> airlines; alliance -> airlines; connect- + at airport -> layover_restrictions.airports; max layover -> layover max; + MAXDUR -> max_duration; nonstop / MAXSTOPS -> stops. + + Returns False if a predicate names a carrier or airport code fli can't map — + the caller falls back to Matrix rather than silently dropping the constraint. 
+ Tier-2 / Tier-3 predicates are ignored here (the post-filter and gate own + those).""" + from fli.models.airline import ( # noqa: PLC0415 # pyright: ignore[reportMissingTypeStubs] + Airline, + ) + from fli.models.airport import ( # noqa: PLC0415 # pyright: ignore[reportMissingTypeStubs] + Airport, + ) + from fli.models.google_flights.base import ( # noqa: PLC0415 # pyright: ignore[reportMissingTypeStubs] + LayoverRestrictions, + ) + from fli.search.flights import ( # noqa: PLC0415 # pyright: ignore[reportMissingTypeStubs] + SearchFlights, + ) + + airlines: list[Any] = [] + layover_airports: list[Any] = [] + layover_max: int | None = None + airlines_ok = True + airports_ok = True + + for p in predicates: + if isinstance(p, CarrierPred): + if p.operating or p.exclude: + continue # Tier-2 — honored by the result post-filter + for code in sorted(p.codes): + try: + airlines.append(SearchFlights._parse_airline(code)) # pyright: ignore[reportPrivateUsage] + except AttributeError: + airlines_ok = False # unknown to fli; the post-filter enforces it instead + elif isinstance(p, AlliancePred): + for token in sorted(p.codes): + try: + airlines.append(Airline[token.upper().replace("-", "_")]) + except KeyError: + airlines_ok = False + elif isinstance(p, ConnectionAirportPred): + if p.exclude: + continue # Tier-2 + for code in sorted(p.codes): + try: + layover_airports.append(getattr(Airport, code)) + except AttributeError: + airports_ok = False + elif isinstance(p, StopsPred): + filters.stops = _fli_max_stops(p.max_stops) + elif isinstance(p, MaxDurationPred): + filters.max_duration = p.minutes + elif isinstance(p, ConnectTimePred) and p.max_minutes is not None: + layover_max = p.max_minutes + + # Apply a code-list dimension only if EVERY code mapped — a partial list + # would narrow the query and drop the unmapped carrier's/airport's flights + # (under-return). When skipped, the post-filter enforces it from result data. 
+ if airlines and airlines_ok: + filters.airlines = airlines + use_airports = bool(layover_airports) and airports_ok + if use_airports or layover_max is not None: + filters.layover_restrictions = LayoverRestrictions( + airports=layover_airports if use_airports else None, max_duration=layover_max + ) + return airlines_ok and airports_ok diff --git a/src/flight_cli/models.py b/src/flight_cli/models.py index 8c52f32..c021401 100644 --- a/src/flight_cli/models.py +++ b/src/flight_cli/models.py @@ -98,6 +98,13 @@ class LegInfo(_Loose): wifi: str | None = None # "free" | "paid" | None (no ground-internet wifi) power: str | None = None # "plug" | "usb" | None video: str | None = None # "stream" | "ondemand" | None + # Carrier identity for the routing post-filter (gflight only; Matrix legs + # leave these empty). `Slice.flights[i]` carries the booking carrier+number; + # these add the operating carrier (fl[22]) and the marketing/codeshare set + # (fl[15]) so `O:` / `-CODESHARE` / marketing-exclude can be evaluated. + operating_carrier: str | None = None # IATA code of the metal, e.g. 
"EN" + marketing_carriers: list[str] = Field(default_factory=list[str]) # selling carrier codes + marketing_flights: list[str] = Field(default_factory=list[str]) # full codeshare flight #s class Slice(_Loose): diff --git a/src/flight_cli/pp/gflight_adapter.py b/src/flight_cli/pp/gflight_adapter.py index 8d36d05..2c52baa 100644 --- a/src/flight_cli/pp/gflight_adapter.py +++ b/src/flight_cli/pp/gflight_adapter.py @@ -59,6 +59,9 @@ def _leg_info(a: LegAmenities) -> LegInfo: wifi=a.wifi, power=a.power, video=a.video, + operating_carrier=a.operating_carrier, + marketing_carriers=list(a.marketing_carriers), + marketing_flights=list(a.marketing_flights), ) diff --git a/src/flight_cli/routing_predicates.py b/src/flight_cli/routing_predicates.py new file mode 100644 index 0000000..98cffa7 --- /dev/null +++ b/src/flight_cli/routing_predicates.py @@ -0,0 +1,362 @@ +"""Parse Matrix routing-language (`--routing`) and extension codes +(`--extension`) into a flat predicate set, classified by how Google Flights can +honor each one: + + - Tier 1 (GF_NATIVE): GF filters it server-side (airlines, alliances, + connecting airports, stops, max duration, max layover). + - Tier 2 (GF_POSTFILTER): not a GF query knob, but evaluable on the base + response payload (operating carrier, -CODESHARE, + min layover, redeyes/overnights, specific flight #). + - Tier 3 (MATRIX_ONLY): fare-construction (fare basis, booking class), + anything GF can neither request nor reconstruct, and + any token this parser doesn't confidently recognize. + +The Tier-3 bucket is the safety net: a query is served from GF alone only when +it carries NO Tier-3 predicate (see `ClassifiedConstraints.requires_matrix`). +We never honor part of a constraint on GF and silently drop the rest — an +unrecognized token escalates the whole query to Matrix. 
+ +Routing language is *positional* (`BA AA` = BA then AA), so it's parsed +all-or-nothing per string: only single order-independent intents (one carrier +with a `+`/`*` quantifier, one connection-airport token, nonstop, one flight +number — placeholders ignored) are recognized; any ordered sequence, bare +single-segment carrier, country filter, or unknown token sends the whole +routing string to Matrix. Extension codes are order-independent and classified +per directive. Grammar: docs/memories/routing_language.md + extension_codes.md. +""" + +from __future__ import annotations + +import re +from dataclasses import dataclass, field +from enum import IntEnum + + +class Tier(IntEnum): + """How Google Flights can honor a predicate (higher = harder).""" + + GF_NATIVE = 1 + GF_POSTFILTER = 2 + MATRIX_ONLY = 3 + + +# ─────────────────────────── predicate types ─────────────────────────── + + +@dataclass(frozen=True, slots=True) +class CarrierPred: + """Marketing or operating carrier inclusion/exclusion. Marketing *include* is a + native GF filter; marketing exclude and operating (checked on fl[22]) have + no GF query knob, so both are post-filters.""" + + codes: frozenset[str] + exclude: bool + operating: bool + + @property + def tier(self) -> Tier: + # Marketing *include* is a native GF filter. Exclude would need the + # route's carrier set to complement, and operating has no GF query knob — + # both are reliable post-filters on the result instead. 
+ return Tier.GF_NATIVE if not (self.operating or self.exclude) else Tier.GF_POSTFILTER + + +@dataclass(frozen=True, slots=True) +class AlliancePred: + """One or more of oneworld/skyteam/star-alliance — native GF filter.""" + + codes: frozenset[str] + tier: Tier = field(default=Tier.GF_NATIVE, init=False) + + +@dataclass(frozen=True, slots=True) +class ConnectionAirportPred: + """Connect only at / never at these airports. Include is a native GF filter + (allow-list); exclude is a post-filter on the result.""" + + codes: frozenset[str] + exclude: bool + + @property + def tier(self) -> Tier: + # Include = native connecting-airports filter; exclude is a post-filter + # (GF's connecting-airports control is an allow-list, not a deny-list). + return Tier.GF_POSTFILTER if self.exclude else Tier.GF_NATIVE + + +@dataclass(frozen=True, slots=True) +class StopsPred: + """Max connecting stops (0 = nonstop) — native GF filter.""" + + max_stops: int + tier: Tier = field(default=Tier.GF_NATIVE, init=False) + + +@dataclass(frozen=True, slots=True) +class MaxDurationPred: + """Max itinerary duration in minutes — native GF filter.""" + + minutes: int + tier: Tier = field(default=Tier.GF_NATIVE, init=False) + + +@dataclass(frozen=True, slots=True) +class ConnectTimePred: + """Layover-time bound in minutes. 
GF natively filters the *max* layover; a + *min* layover has no query knob, so a min bound is a post-filter.""" + + min_minutes: int | None + max_minutes: int | None + + @property + def tier(self) -> Tier: + return Tier.GF_POSTFILTER if self.min_minutes is not None else Tier.GF_NATIVE + + +@dataclass(frozen=True, slots=True) +class ExcludeRedeyesPred: + """No overnight (red-eye) flights — post-filter on segment times.""" + + tier: Tier = field(default=Tier.GF_POSTFILTER, init=False) + + +@dataclass(frozen=True, slots=True) +class ExcludeOvernightsPred: + """No overnight stops at hubs — post-filter on layover spans.""" + + tier: Tier = field(default=Tier.GF_POSTFILTER, init=False) + + +@dataclass(frozen=True, slots=True) +class ExcludeCodesharePred: + """No codeshare flights — post-filter (a marketing code != the operating + carrier on any leg).""" + + tier: Tier = field(default=Tier.GF_POSTFILTER, init=False) + + +@dataclass(frozen=True, slots=True) +class SpecificFlightPred: + """A specific flight number or range must appear — post-filter on flight #. + A single number has low == high.""" + + carrier: str + low: int + high: int + tier: Tier = field(default=Tier.GF_POSTFILTER, init=False) + + +@dataclass(frozen=True, slots=True) +class UnsupportedPred: + """A token we can neither request on GF nor reconstruct from its payload + (fare basis, mileage, country filter, ordered routing, unknown). Forces + Matrix. 
`reason` is for diagnostics / the GF-preview caveat.""" + + token: str + reason: str + tier: Tier = field(default=Tier.MATRIX_ONLY, init=False) + + +Predicate = ( + CarrierPred + | AlliancePred + | ConnectionAirportPred + | StopsPred + | MaxDurationPred + | ConnectTimePred + | ExcludeRedeyesPred + | ExcludeOvernightsPred + | ExcludeCodesharePred + | SpecificFlightPred + | UnsupportedPred +) + + +@dataclass(frozen=True, slots=True) +class ClassifiedConstraints: + """The full predicate set parsed from a slice's routing + extension.""" + + predicates: tuple[Predicate, ...] + + @property + def tier1(self) -> tuple[Predicate, ...]: + return tuple(p for p in self.predicates if p.tier is Tier.GF_NATIVE) + + @property + def tier2(self) -> tuple[Predicate, ...]: + return tuple(p for p in self.predicates if p.tier is Tier.GF_POSTFILTER) + + @property + def matrix_only(self) -> tuple[Predicate, ...]: + return tuple(p for p in self.predicates if p.tier is Tier.MATRIX_ONLY) + + @property + def requires_matrix(self) -> bool: + """True iff any predicate can't be honored by GF (native or post-filter). + When False, the query can be served from Google Flights alone.""" + return any(p.tier is Tier.MATRIX_ONLY for p in self.predicates) + + @property + def matrix_reasons(self) -> tuple[str, ...]: + return tuple(p.reason for p in self.predicates if isinstance(p, UnsupportedPred)) + + +# ─────────────────────────── routing parser ──────────────────────────── + +_ALLIANCES = frozenset({"oneworld", "skyteam", "star-alliance"}) + +# Only the *unconstrained* flank placeholders (F* = 0+, F+ = 1+ segments). The +# count-bearing ones (F, F?, X, X+, X?) change the itinerary shape, so a routing +# using them isn't reduced to a flat GF filter — it escalates to Matrix. 
+_RE_PLACEHOLDER = re.compile(r"^F[+*]$", re.IGNORECASE) +_RE_NONSTOP = re.compile(r"^N(?::([A-Za-z]{2}))?$", re.IGNORECASE) +_RE_CARRIER = re.compile(r"^(~?)(O:|C:)?([A-Za-z]{2})([+*])$", re.IGNORECASE) +_RE_FLIGHTNUM = re.compile(r"^(~?)([A-Za-z]{2})(\d+)(?:-(\d+))?[+*?]?$", re.IGNORECASE) +_RE_AIRPORT = re.compile(r"^(~?)(?:X:)?([A-Za-z]{3}(?:,[A-Za-z]{3})*)$", re.IGNORECASE) + + +def _airport_codes(group: str) -> frozenset[str]: + return frozenset(c.upper() for c in group.split(",")) + + +def _carrier_pred(tok: str) -> CarrierPred | None: + m = _RE_CARRIER.match(tok) + if not m: + return None + return CarrierPred( + frozenset({m.group(3).upper()}), + exclude=m.group(1) == "~", + operating=(m.group(2) or "").upper() == "O:", + ) + + +def _airport_pred(tok: str) -> ConnectionAirportPred | None: + m = _RE_AIRPORT.match(tok) + if not m: + return None + return ConnectionAirportPred(_airport_codes(m.group(2)), exclude=m.group(1) == "~") + + +def _parse_single_routing_token(tok: str) -> list[Predicate] | None: + """Predicates for a one-token routing, or None if it's not a recognized + single form (carrier with quantifier, nonstop, or a specific flight #).""" + if carrier := _carrier_pred(tok): + return [carrier] + if m := _RE_NONSTOP.match(tok): + preds: list[Predicate] = [StopsPred(max_stops=0)] + if cc := m.group(1): + preds.append(CarrierPred(frozenset({cc.upper()}), exclude=False, operating=False)) + return preds + if (m := _RE_FLIGHTNUM.match(tok)) and m.group(1) != "~": + low = int(m.group(3)) + high = int(m.group(4)) if m.group(4) else low + return [SpecificFlightPred(carrier=m.group(2).upper(), low=low, high=high)] + return None + + +def parse_routing(routing: str) -> list[Predicate]: + """Parse one slice's routing-language string into flat predicates. 
+ + Routing language is positional, so only unambiguous order-independent forms + map to GF; anything else (ordered chains like `BA AA` / `DFW DEN`, bare + single-segment carriers, country filters, count placeholders, unknowns) + becomes a single Tier-3 UnsupportedPred — the whole routing goes to Matrix, + never partially honored. + + Mapped forms (case-insensitive): + - single token `LH+` / `~UA+` / `O:LH+` -> carrier include/exclude + - single token `N` / `N:UA` -> nonstop (+ carrier) + - single token `UA882` / `UA1000-2000` -> specific flight # + - `F* X:LHR F*` / `F* ~DFW F*` / `F* DFW,DEN F*` -> connect at / not at + """ + tokens = routing.split() + if not tokens: + return [] + if len(tokens) == 1 and (single := _parse_single_routing_token(tokens[0])) is not None: + return single + # Multi-token: only the canonical flanked-airport idiom maps — F*/F+ + # placeholders around exactly one connection-airport token (`F* X:LHR F*`). + # A flanked carrier would mean "at least one" (not "all"), so it's excluded. + if len(tokens) > 1: + core = [t for t in tokens if not _RE_PLACEHOLDER.match(t)] + if len(core) == 1 and (airport := _airport_pred(core[0])): + return [airport] + return [UnsupportedPred(token=routing, reason=f"routing {routing!r} not GF-expressible")] + + +# ─────────────────────────── extension parser ────────────────────────── + +_RE_HHMM = re.compile(r"^(\d{1,2}):([0-5]\d)$") + + +def _parse_hhmm(arg: str) -> int | None: + if m := _RE_HHMM.match(arg.strip()): + return int(m.group(1)) * 60 + int(m.group(2)) + return None + + +def _carrier_codes(args: list[str]) -> frozenset[str]: + return frozenset(a.upper() for a in args) + + +def _parse_extension_code(directive: str) -> Predicate | None: # noqa: PLR0911, PLR0912 - flat keyword dispatch over the extension grammar + """Parse one extension directive (already split on ';'). 
None for an empty + directive.""" + parts = directive.split() + if not parts: + return None + keyword = parts[0].upper() + args = parts[1:] + raw = directive.strip() + + match keyword: + case "MAXSTOPS" if args and args[0].isdigit(): + return StopsPred(max_stops=int(args[0])) + case "MAXDUR" if args and (mins := _parse_hhmm(args[0])) is not None: + return MaxDurationPred(minutes=mins) + case "MAXCONNECT" if args and (mins := _parse_hhmm(args[0])) is not None: + return ConnectTimePred(min_minutes=None, max_minutes=mins) + case "MINCONNECT" if args and (mins := _parse_hhmm(args[0])) is not None: + return ConnectTimePred(min_minutes=mins, max_minutes=None) + case "-OVERNIGHTS": + return ExcludeOvernightsPred() + case "-REDEYES": + return ExcludeRedeyesPred() + case "-CODESHARE": + return ExcludeCodesharePred() + case "ALLIANCE" if args: + codes = frozenset(c.lower() for c in " ".join(args).split("|") if c.strip()) + if codes <= _ALLIANCES: + return AlliancePred(codes=codes) + return UnsupportedPred(token=raw, reason=f"unknown alliance in {raw!r}") + case "AIRLINES" if args: + return CarrierPred(_carrier_codes(args), exclude=False, operating=False) + case "-AIRLINES" if args: + return CarrierPred(_carrier_codes(args), exclude=True, operating=False) + case "OPAIRLINES" if args: + return CarrierPred(_carrier_codes(args), exclude=False, operating=True) + case "-OPAIRLINES" if args: + return CarrierPred(_carrier_codes(args), exclude=True, operating=True) + case "-CITIES" if args: + return ConnectionAirportPred(_carrier_codes(args), exclude=True) + case _: + return UnsupportedPred(token=raw, reason=f"extension {raw!r} not expressible on GF") + + +def parse_extension(extension: str) -> list[Predicate]: + """Parse one slice's extension-codes string (semicolon-separated).""" + out: list[Predicate] = [] + for directive in extension.split(";"): + if pred := _parse_extension_code(directive): + out.append(pred) + return out + + +def classify(routing: str | None, extension: str | 
None) -> ClassifiedConstraints: + """Parse and classify a slice's routing + extension into a predicate set.""" + preds: list[Predicate] = [] + if routing: + preds.extend(parse_routing(routing)) + if extension: + preds.extend(parse_extension(extension)) + return ClassifiedConstraints(predicates=tuple(preds)) diff --git a/tests/test_backend_dispatch.py b/tests/test_backend_dispatch.py index c20d088..0a1e862 100644 --- a/tests/test_backend_dispatch.py +++ b/tests/test_backend_dispatch.py @@ -47,9 +47,7 @@ def test_auto_plain_search_picks_gflight() -> None: @pytest.mark.parametrize( "flag,value", [ - ("routing", "LH+"), - ("extension", "MAXCONNECT 2:00"), - ("slice_specs", ["JFK-LHR:2026-08-15"]), + ("slice_specs", ["JFK-LHR:2026-08-15"]), # multi-city ("depart_times", "morning"), ("return_times", "evening"), ("seniors", 1), @@ -58,9 +56,40 @@ def test_auto_plain_search_picks_gflight() -> None: ("inf_lap", 1), ], ) -def test_auto_matrix_only_flag_picks_matrix(flag: str, value: object) -> None: - # pyright: ignore[reportArgumentType] — `flag` is a parametrize key; types are - # heterogeneous (str/int/list/bool). _call's kwargs accept object. 
+def test_auto_hard_matrix_flag_picks_matrix(flag: str, value: object) -> None: + """Flags the GF bridge can't map at all always force Matrix.""" + assert _call(**{flag: value}) == BACKEND_MATRIX # pyright: ignore[reportArgumentType] + + +@pytest.mark.parametrize( + "flag,value", + [ + ("routing", "LH+"), # marketing carrier (native) + ("routing", "F* X:FRA F*"), # via airport (native) + ("routing", "O:LH+"), # operating carrier (Tier-2 post-filter) + ("extension", "MAXCONNECT 2:00"), # layover max (native) + ("extension", "ALLIANCE star-alliance; MAXSTOPS 1"), # native + ("extension", "-CODESHARE"), # Tier-2 post-filter + ], +) +def test_auto_gf_serveable_routing_picks_gflight(flag: str, value: object) -> None: + """Routing/extension GF can honor (native + post-filter) stays on gflight.""" + assert _call(**{flag: value}) == BACKEND_GFLIGHT # pyright: ignore[reportArgumentType] + + +@pytest.mark.parametrize( + "flag,value", + [ + ("extension", "F bc=y"), # fare basis (Tier 3) + ("extension", "MAXMILES 8000"), # mileage (Tier 3) + ("routing", "BA AA"), # ordered carrier chain — not GF-expressible + ("extension", "MINCONNECT 1:00"), # min layover — unsupported Tier-2 + ("extension", "-REDEYES"), # red-eyes — unsupported Tier-2 (no per-seg times) + ], +) +def test_auto_non_serveable_routing_picks_matrix(flag: str, value: object) -> None: + """Routing GF can't honor (fare construction, ordered chains, unsupported + Tier-2) falls back to Matrix.""" assert _call(**{flag: value}) == BACKEND_MATRIX # pyright: ignore[reportArgumentType] @@ -75,15 +104,23 @@ def test_auto_matrix_only_flag_picks_matrix(flag: str, value: object) -> None: def test_explicit_matrix_always_wins() -> None: assert _call(BACKEND_MATRIX) == BACKEND_MATRIX assert _call(BACKEND_MATRIX, routing="LH+") == BACKEND_MATRIX + assert _call(BACKEND_MATRIX, extension="F bc=y") == BACKEND_MATRIX def test_explicit_gflight_with_plain_search() -> None: assert _call(BACKEND_GFLIGHT) == BACKEND_GFLIGHT -def 
test_explicit_gflight_rejects_matrix_only_flags() -> None: - with pytest.raises(typer.BadParameter, match="incompatible"): - _call(BACKEND_GFLIGHT, routing="LH+") +def test_explicit_gflight_allows_gf_serveable_routing() -> None: + assert _call(BACKEND_GFLIGHT, routing="LH+") == BACKEND_GFLIGHT + assert _call(BACKEND_GFLIGHT, extension="MAXCONNECT 2:00") == BACKEND_GFLIGHT + + +def test_explicit_gflight_rejects_unserveable_request() -> None: + with pytest.raises(typer.BadParameter, match="can't serve"): + _call(BACKEND_GFLIGHT, extension="F bc=y") + with pytest.raises(typer.BadParameter, match="can't serve"): + _call(BACKEND_GFLIGHT, slice_specs=["JFK-LHR:2026-08-15"]) def test_unknown_backend_rejected() -> None: diff --git a/tests/test_codeshare_display.py b/tests/test_codeshare_display.py new file mode 100644 index 0000000..51d820f --- /dev/null +++ b/tests/test_codeshare_display.py @@ -0,0 +1,61 @@ +# pyright: reportPrivateUsage=false +"""Tests for codeshare-aware leg labels (_leg_display) and _match_carriers.""" + +from __future__ import annotations + +from types import SimpleNamespace +from typing import Any + +from flight_cli.cli import _leg_display, _match_carriers +from flight_cli.domain import Leg + + +def _leg(code: str, number: str, marketing_flights: tuple[str, ...] 
= ()) -> tuple[Any, Any]: + leg = SimpleNamespace(airline=SimpleNamespace(name=code), flight_number=number) + amenity = SimpleNamespace(marketing_flights=marketing_flights) + return leg, amenity + + +def test_relabels_codeshare_to_matched_identity() -> None: + """UA58 sold as LH9407 under `--routing LH+` -> show the LH identity.""" + leg, amenity = _leg("UA", "58", marketing_flights=("LH9407",)) + assert _leg_display(leg, amenity, frozenset({"LH"})) == "LH9407 (op UA58)" + + +def test_passthrough_when_booking_carrier_already_matches() -> None: + leg, amenity = _leg("LH", "455", marketing_flights=()) + assert _leg_display(leg, amenity, frozenset({"LH"})) == "LH 455" + + +def test_passthrough_when_no_carrier_filter() -> None: + leg, amenity = _leg("UA", "58", marketing_flights=("LH9407",)) + assert _leg_display(leg, amenity, frozenset()) == "UA 58" + + +def test_passthrough_when_no_matching_codeshare() -> None: + # Booking UA, filter BA, no BA codeshare -> leave it as the booking identity. + leg, amenity = _leg("UA", "58", marketing_flights=("LH9407",)) + assert _leg_display(leg, amenity, frozenset({"BA"})) == "UA 58" + + +def test_handles_missing_amenity() -> None: + leg, _ = _leg("UA", "58") + assert _leg_display(leg, None, frozenset({"LH"})) == "UA 58" + + +# ─────────────────────────── _match_carriers ─────────────────────────── + + +def test_match_carriers_from_marketing_include() -> None: + legs = (Leg.of(["SFO"], ["FRA"], None, route_language="LH+"),) + assert _match_carriers(legs) == frozenset({"LH"}) + + +def test_match_carriers_empty_for_operating_filter() -> None: + # O:LH is an operating filter — codeshare relabeling doesn't apply. 
+ legs = (Leg.of(["SFO"], ["FRA"], None, route_language="O:LH+"),) + assert _match_carriers(legs) == frozenset() + + +def test_match_carriers_empty_without_routing() -> None: + assert _match_carriers((Leg.of(["SFO"], ["FRA"], None),)) == frozenset() diff --git a/tests/test_enrich.py b/tests/test_enrich.py new file mode 100644 index 0000000..c532460 --- /dev/null +++ b/tests/test_enrich.py @@ -0,0 +1,80 @@ +# pyright: reportCallIssue=false +# DIVERGE: pydantic Field(alias=...) on _Loose models trips basedpyright into +# treating alias names as required kwargs. Same posture as tests/pp/test_match.py. +"""Tests for reconciling GF + Matrix cash results (_enrich.merge_results).""" + +from __future__ import annotations + +from flight_cli._enrich import merge_results +from flight_cli.models import ( + Itinerary, + ItineraryDetails, + ItineraryExt, + SearchResult, + Slice, +) + + +def _it(price: str, flights: list[str], dep: str = "2026-08-15T08:00") -> Itinerary: + return Itinerary( + ext=ItineraryExt(price=price), + itinerary=ItineraryDetails(slices=[Slice(flights=flights, departure=dep)]), + ) + + +def _sr(*its: Itinerary) -> SearchResult: + return SearchResult(solutionCount=len(its), solutions=list(its)) + + +def _first_flight(it: Itinerary) -> str | None: + itn = it.itinerary + return itn.slices[0].flights[0] if itn and itn.slices else None + + +def test_matched_itinerary_carries_both_prices_matrix_authoritative() -> None: + gf = _sr(_it("USD500.00", ["LH455"], dep="2026-08-15T08:00")) + # Same flight + date, different time + price -> still a match (flight# + date). 
+ matrix = _sr(_it("USD505.00", ["LH455"], dep="2026-08-15T09:30")) + (row,) = merge_results(gf, matrix) + assert row.source == "both" + assert row.gf_price == "USD500.00" + assert row.matrix_price == "USD505.00" + assert row.itinerary.price == "USD505.00" # Matrix structure is authoritative + + +def test_matrix_only_row() -> None: + (row,) = merge_results(_sr(), _sr(_it("USD600.00", ["AF83"]))) + assert row.source == "matrix" + assert row.matrix_price == "USD600.00" + assert row.gf_price is None + + +def test_gf_only_row() -> None: + (row,) = merge_results(_sr(_it("USD380.00", ["UA58"])), _sr()) + assert row.source == "gf" + assert row.gf_price == "USD380.00" + assert row.matrix_price is None + + +def test_merge_sorts_by_best_price_and_tags_sources() -> None: + gf = _sr(_it("USD380.00", ["UA58"]), _it("USD500.00", ["LH455"])) + matrix = _sr(_it("USD505.00", ["LH455"]), _it("USD900.00", ["AF83"])) + rows = merge_results(gf, matrix) + assert [(r.source, _first_flight(r.itinerary)) for r in rows] == [ + ("gf", "UA58"), # 380 — GF-only (ULCC/codeshare) + ("both", "LH455"), # 500/505 — matched + ("matrix", "AF83"), # 900 — Matrix-only + ] + + +def test_unkeyed_itineraries_stay_single_source() -> None: + # No flights -> unmatchable -> kept as a single-source row, not merged. 
+ gf = _sr(_it("USD100.00", [])) + matrix = _sr(_it("USD100.00", [])) + rows = merge_results(gf, matrix) + assert len(rows) == 2 + assert {r.source for r in rows} == {"gf", "matrix"} + + +def test_empty_inputs() -> None: + assert merge_results(_sr(), _sr()) == [] diff --git a/tests/test_gf_dategrid.py b/tests/test_gf_dategrid.py new file mode 100644 index 0000000..94ef261 --- /dev/null +++ b/tests/test_gf_dategrid.py @@ -0,0 +1,111 @@ +# pyright: reportPrivateUsage=false +"""Tests for the GF native date-grid foundation (gate, parse, chunking).""" + +from __future__ import annotations + +import json +from datetime import date +from typing import TYPE_CHECKING, Any + +from flight_cli import _gf_dategrid +from flight_cli._gf_dategrid import _parse_grid, date_grid, grid_can_serve +from flight_cli.domain import CalendarSearch, CalendarWindow, Leg + +if TYPE_CHECKING: + import pytest + + +def _cal( + *, + legs: tuple[Leg, ...] | None = None, + routing: str | None = None, + ext: str | None = None, + start: date = date(2026, 8, 10), + end: date = date(2026, 8, 25), +) -> CalendarSearch: + return CalendarSearch( + legs=legs or (Leg.of(["SFO"], ["FRA"], route_language=routing, extension=ext),), + window=CalendarWindow(start=start, end=end, duration_min=5, duration_max=7), + ) + + +# ─────────────────────────── gate ────────────────────────────────────── + + +def test_grid_serves_oneway_single_airport_tier1() -> None: + assert grid_can_serve(_cal()) # no routing + assert grid_can_serve(_cal(routing="LH+")) # Tier-1 marketing carrier + assert grid_can_serve(_cal(routing="F* X:FRA F*")) # Tier-1 via-airport + + +def test_grid_declines_tier2_and_tier3() -> None: + assert not grid_can_serve(_cal(routing="O:LH+")) # operating -> Tier-2 (no itineraries) + assert not grid_can_serve(_cal(ext="F bc=y")) # fare basis -> Tier-3 + + +def test_grid_declines_multi_airport_and_round_trip() -> None: + round_trip = (Leg.of(["SFO"], ["FRA"]), Leg.of(["FRA"], ["SFO"])) + assert not 
grid_can_serve(_cal(legs=(Leg.of(["SFO", "OAK"], ["FRA"]),))) # multi-airport + assert not grid_can_serve(_cal(legs=round_trip)) + + +# ─────────────────────────── parse ───────────────────────────────────── + + +def test_parse_grid_extracts_date_price() -> None: + payload = json.dumps( + [None, [["2026-08-10", None, [["x", 524]]], ["2026-08-11", None, [["x", 530.0]]]]] + ) + assert _parse_grid(payload) == {"2026-08-10": 524.0, "2026-08-11": 530.0} + + +def test_parse_grid_skips_malformed_items() -> None: + payload = json.dumps([None, [["2026-08-10", None, [["x", 600]]], ["bad"], [None, None, None]]]) + assert _parse_grid(payload) == {"2026-08-10": 600.0} + + +# ─────────────────────────── chunking (work-bcdex) ───────────────────── + + +def test_date_grid_chunks_over_61_days_and_merges(monkeypatch: pytest.MonkeyPatch) -> None: + """A >61-day window is split into ≤61-day chunks we drive ourselves (with the + full filter set), not fli's filter-dropping chunker.""" + chunks: list[tuple[str, str]] = [] + + def fake_filters(_search: Any, from_iso: str, to_iso: str, _preds: Any) -> Any: + chunks.append((from_iso, to_iso)) + return (from_iso, to_iso) + + def fake_call(filters: Any) -> dict[str, float]: + from_iso, _to = filters + return {from_iso: 100.0} + + monkeypatch.setattr(_gf_dategrid, "_grid_filters", fake_filters) + monkeypatch.setattr(_gf_dategrid, "_one_grid_call", fake_call) + + # 2026-08-10 .. 2026-10-20 = 72 days -> two chunks (61 + 11). 
+ out = date_grid(_cal(start=date(2026, 8, 10), end=date(2026, 10, 20))) + + assert len(chunks) == 2 + assert chunks[0] == ("2026-08-10", "2026-10-09") # first 61 days + assert chunks[1] == ("2026-10-10", "2026-10-20") # remainder + for from_iso, to_iso in chunks: + span = date.fromisoformat(to_iso).toordinal() - date.fromisoformat(from_iso).toordinal() + 1 + assert span <= _gf_dategrid._MAX_GRID_DAYS + assert out == {"2026-08-10": 100.0, "2026-10-10": 100.0} + + +def test_date_grid_single_chunk_under_61_days(monkeypatch: pytest.MonkeyPatch) -> None: + calls = {"n": 0} + + def fake_call(_f: Any) -> dict[str, float]: + calls["n"] += 1 + return {"d": 1.0} + + def fake_filters(*_a: object) -> object: + return None + + monkeypatch.setattr(_gf_dategrid, "_grid_filters", fake_filters) + monkeypatch.setattr(_gf_dategrid, "_one_grid_call", fake_call) + date_grid(_cal()) # 16-day window -> single chunk + assert calls["n"] == 1 diff --git a/tests/test_gf_native_filters.py b/tests/test_gf_native_filters.py new file mode 100644 index 0000000..cae7538 --- /dev/null +++ b/tests/test_gf_native_filters.py @@ -0,0 +1,114 @@ +# pyright: reportMissingTypeStubs=false, reportUnknownMemberType=false, reportUnknownVariableType=false, reportUnknownArgumentType=false +"""Tests for mapping Tier-1 predicates onto fli's native FlightSearchFilters.""" + +from __future__ import annotations + +from typing import Any + +from fli.models.airport import Airport +from fli.models.google_flights.base import MaxStops, SeatType, TripType +from fli.models.google_flights.flights import ( + FlightSearchFilters, + FlightSegment, + PassengerInfo, +) + +from flight_cli.fli_bridge import apply_gf_native_filters +from flight_cli.routing_predicates import ( + AlliancePred, + CarrierPred, + ConnectionAirportPred, + ConnectTimePred, + MaxDurationPred, + StopsPred, + classify, +) + + +def _filters() -> Any: + return FlightSearchFilters( + passenger_info=PassengerInfo(adults=1), + flight_segments=[ + FlightSegment( + 
departure_airport=[[Airport.JFK, 0]], + arrival_airport=[[Airport.LHR, 0]], + travel_date="2026-08-15", + ) + ], + stops=MaxStops.ANY, + seat_type=SeatType.ECONOMY, + trip_type=TripType.ONE_WAY, + ) + + +def _names(airlines: Any) -> list[str]: + return sorted(a.name for a in airlines) + + +def test_marketing_carrier_include_maps_to_airlines() -> None: + f = _filters() + lh = CarrierPred(frozenset({"LH"}), exclude=False, operating=False) + assert apply_gf_native_filters(f, [lh]) + assert _names(f.airlines) == ["LH"] + assert f.encode() # the mapped filter still serializes to a valid TFS request + + +def test_alliance_maps_to_airline_token() -> None: + f = _filters() + assert apply_gf_native_filters(f, [AlliancePred(frozenset({"star-alliance"}))]) + assert _names(f.airlines) == ["STAR_ALLIANCE"] + assert f.encode() + + +def test_connect_airport_and_max_layover_map_to_layover_restrictions() -> None: + f = _filters() + preds = [ + ConnectionAirportPred(frozenset({"FRA"}), exclude=False), + ConnectTimePred(min_minutes=None, max_minutes=120), + ] + assert apply_gf_native_filters(f, preds) + assert [a.name for a in f.layover_restrictions.airports] == ["FRA"] + assert f.layover_restrictions.max_duration == 120 + assert f.encode() + + +def test_nonstop_and_maxdur_map_to_stops_and_duration() -> None: + f = _filters() + assert apply_gf_native_filters(f, [StopsPred(max_stops=0), MaxDurationPred(minutes=600)]) + assert f.stops is MaxStops.NON_STOP + assert f.max_duration == 600 + + +def test_unknown_carrier_code_returns_false() -> None: + f = _filters() + # 'XX' isn't a real IATA carrier in fli's enum -> can't map -> escalate. 
+ xx = CarrierPred(frozenset({"XX"}), exclude=False, operating=False) + assert not apply_gf_native_filters(f, [xx]) + + +def test_unknown_airport_code_returns_false() -> None: + f = _filters() + zzz = ConnectionAirportPred(frozenset({"ZZZ"}), exclude=False) + assert not apply_gf_native_filters(f, [zzz]) + + +def test_tier2_predicates_are_ignored_by_native_mapper() -> None: + f = _filters() + preds = [ + CarrierPred(frozenset({"UA"}), exclude=True, operating=False), # exclude -> Tier 2 + CarrierPred(frozenset({"LH"}), exclude=False, operating=True), # operating -> Tier 2 + ConnectionAirportPred(frozenset({"DFW"}), exclude=True), # exclude -> Tier 2 + ] + assert apply_gf_native_filters(f, preds) + assert f.airlines is None # nothing native applied + assert f.layover_restrictions is None + + +def test_integration_classify_then_apply() -> None: + c = classify("LH+", "MAXSTOPS 1; MAXCONNECT 2:00") + f = _filters() + assert apply_gf_native_filters(f, c.predicates) + assert _names(f.airlines) == ["LH"] + assert f.stops is MaxStops.ONE_STOP_OR_FEWER + assert f.layover_restrictions.max_duration == 120 + assert f.encode() diff --git a/tests/test_gf_postfilter.py b/tests/test_gf_postfilter.py new file mode 100644 index 0000000..aa28661 --- /dev/null +++ b/tests/test_gf_postfilter.py @@ -0,0 +1,176 @@ +# pyright: reportCallIssue=false +# DIVERGE: pydantic Field(alias=...) on _Loose models trips basedpyright into +# treating alias names as required kwargs even though populate_by_name=True is +# set. Same posture as tests/pp/test_match.py + pp/gflight_adapter.py. 
+"""Tests for the Tier-2 Google Flights post-filter.""" + +from __future__ import annotations + +from typing import TYPE_CHECKING + +from flight_cli._gf_postfilter import apply_postfilter, can_postfilter, gf_can_serve +from flight_cli.models import ( + Itinerary, + ItineraryDetails, + ItineraryExt, + LegInfo, + SearchResult, + Slice, + SliceEndpoint, +) +from flight_cli.routing_predicates import ( + CarrierPred, + ConnectionAirportPred, + ConnectTimePred, + ExcludeCodesharePred, + ExcludeRedeyesPred, + Predicate, + SpecificFlightPred, + classify, +) + +if TYPE_CHECKING: + from collections.abc import Sequence + + +def _slice(legs: Sequence[tuple[str, str | None, list[str]]], stops: Sequence[str] = ()) -> Slice: + """legs = [(flight_number, operating_carrier, marketing_carriers)].""" + return Slice( + flights=[f for f, _, _ in legs], + stops=[SliceEndpoint(code=s) for s in stops], + legs=[LegInfo(operating_carrier=op, marketing_carriers=list(mk)) for _, op, mk in legs], + ) + + +def _result(*slices: Slice) -> SearchResult: + sols = [ + Itinerary(ext=ItineraryExt(price="USD100.00"), itinerary=ItineraryDetails(slices=[s])) + for s in slices + ] + return SearchResult(solutionCount=len(sols), solutions=sols) + + +def _filter(result: SearchResult, *preds: Predicate) -> list[str]: + """Apply preds to slice 0 and return the surviving first-flight numbers.""" + out = apply_postfilter(result, [list(preds)]) + flights: list[str] = [] + for it in out.solutions: + itn = it.itinerary + if itn is not None: + flights.append(itn.slices[0].flights[0]) + return flights + + +# ─────────────────────────── operating carrier ───────────────────────── + + +def test_operating_include_keeps_only_all_matching() -> None: + res = _result( + _slice([("LH400", "LH", ["LH"])]), # operated by LH + _slice([("UA100", "UA", ["UA"])]), # operated by UA + ) + kept = _filter(res, CarrierPred(frozenset({"LH"}), exclude=False, operating=True)) + assert kept == ["LH400"] + + +def 
test_operating_exclude_drops_matching() -> None: + res = _result(_slice([("LH400", "LH", ["LH"])]), _slice([("UA100", "UA", ["UA"])])) + kept = _filter(res, CarrierPred(frozenset({"UA"}), exclude=True, operating=True)) + assert kept == ["LH400"] + + +# ─────────────────────────── marketing exclude ───────────────────────── + + +def test_marketing_exclude_drops_by_booking_or_codeshare() -> None: + res = _result( + _slice([("UA100", "UA", ["UA"])]), # booked UA -> excluded + _slice([("LH9498", "EN", ["LH"])]), # LH/Air Dolomiti, no UA -> kept + _slice([("LH900", "LH", ["UA"])]), # UA codeshare in marketing set -> excluded + ) + kept = _filter(res, CarrierPred(frozenset({"UA"}), exclude=True, operating=False)) + assert kept == ["LH9498"] + + +# ─────────────────────────── connection airport ──────────────────────── + + +def test_connection_exclude_drops_via_airport() -> None: + res = _result( + _slice([("AA1", "AA", ["AA"]), ("AA2", "AA", ["AA"])], stops=["DFW"]), + _slice([("AA3", "AA", ["AA"]), ("AA4", "AA", ["AA"])], stops=["ORD"]), + ) + kept = _filter(res, ConnectionAirportPred(frozenset({"DFW"}), exclude=True)) + assert kept == ["AA3"] + + +# ─────────────────────────── codeshare ───────────────────────────────── + + +def test_codeshare_exclude_drops_operated_for_legs() -> None: + res = _result( + _slice([("LH9498", "EN", ["LH"])]), # LH marketed, EN operated -> codeshare + _slice([("LH400", "LH", ["LH"])]), # LH on LH metal -> not codeshare + ) + kept = _filter(res, ExcludeCodesharePred()) + assert kept == ["LH400"] + + +# ─────────────────────────── specific flight ─────────────────────────── + + +def test_specific_flight_number_and_range() -> None: + res = _result(_slice([("UA882", "UA", ["UA"])]), _slice([("UA999", "UA", ["UA"])])) + assert _filter(res, SpecificFlightPred("UA", 882, 882)) == ["UA882"] + res2 = _result(_slice([("UA882", "UA", ["UA"])]), _slice([("UA3000", "UA", ["UA"])])) + assert _filter(res2, SpecificFlightPred("UA", 1000, 2000)) == [] + + 
+# ─────────────────────────── per-slice scoping ─────────────────────────
+
+
+def test_predicates_apply_only_to_their_slice() -> None:
+    """Outbound `~UA` must not filter on the return slice."""
+    out_clean_ret_ua = Itinerary(
+        ext=ItineraryExt(price="USD1.00"),
+        itinerary=ItineraryDetails(
+            slices=[_slice([("LH1", "LH", ["LH"])]), _slice([("UA9", "UA", ["UA"])])]
+        ),
+    )
+    out_ua = Itinerary(
+        ext=ItineraryExt(price="USD2.00"),
+        itinerary=ItineraryDetails(
+            slices=[_slice([("UA1", "UA", ["UA"])]), _slice([("LH9", "LH", ["LH"])])]
+        ),
+    )
+    res = SearchResult(solutionCount=2, solutions=[out_clean_ret_ua, out_ua])
+    out = apply_postfilter(res, [[CarrierPred(frozenset({"UA"}), exclude=True, operating=False)]])
+    # only the itinerary with UA on the OUTBOUND slice is dropped
+    assert len(out.solutions) == 1
+    assert out.solution_count == 1
+    itn = out.solutions[0].itinerary
+    assert itn is not None
+    assert itn.slices[0].flights == ["LH1"]
+
+
+# ─────────────────────────── gate helpers ──────────────────────────────
+
+
+def test_can_postfilter_supported_vs_unsupported() -> None:
+    assert can_postfilter(CarrierPred(frozenset({"LH"}), exclude=False, operating=True))
+    assert can_postfilter(ExcludeCodesharePred())
+    assert not can_postfilter(ConnectTimePred(min_minutes=60, max_minutes=None))  # min layover
+    assert not can_postfilter(ExcludeRedeyesPred())
+
+
+def test_gf_can_serve() -> None:
+    assert gf_can_serve(classify("O:LH+", "AIRLINES BA AF; MAXSTOPS 1"))
+    assert not gf_can_serve(classify("LH+", "F bc=y"))  # Tier-3 fare basis
+    assert not gf_can_serve(classify("LH+", "MINCONNECT 1:00"))  # unsupported Tier-2
+    assert not gf_can_serve(classify("LH+", "-REDEYES"))
+
+
+def test_apply_postfilter_no_predicates_is_noop() -> None:
+    res = _result(_slice([("UA1", "UA", ["UA"])]))
+    out = apply_postfilter(res, [[]])
+    assert out.solution_count == 1
diff --git a/tests/test_gflight_carrier.py b/tests/test_gflight_carrier.py
new file mode 100644
index 0000000..2612140
--- /dev/null
+++ b/tests/test_gflight_carrier.py
@@ -0,0 +1,137 @@
+# pyright: reportPrivateUsage=false
+"""Tests for marketing/operating carrier resolution in _gflight_ids.
+
+A leg tuple carries the OPERATING carrier at fl[22] and the MARKETING (selling)
+carriers at fl[15]; fl[18] is truthy when the operating carrier self-markets.
+The booking carrier a passenger sees (and what Matrix returns) is the marketing
+carrier on operated-for regional legs, else the operating carrier. Ground-truthed
+2026-06-13 against Google Flights' own headline labels (see _resolve_booking)."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from flight_cli._gflight_ids import (
+    LegAmenities,
+    _carrier_entry,
+    _marketing_codes,
+    _parse_leg_amenities,
+    _resolve_booking,
+)
+
+
+def _leg(
+    *,
+    operating: list[Any] | None,
+    marketing: list[list[Any]] | None,
+    self_marketed: object,
+) -> list[Any]:
+    """33-element leg tuple with only the carrier-identity indices populated."""
+    leg: list[Any] = [None] * 33
+    leg[15] = marketing
+    leg[18] = self_marketed
+    leg[22] = operating
+    return leg
+
+
+# Ground-truth shapes captured 2026-06-13.
+_EN8858 = _leg(  # Air Dolomiti operating for Lufthansa (operated-for)
+    operating=["EN", "8858", None, "Air Dolomiti"],
+    marketing=[["LH", "9498", None, "Lufthansa"]],
+    self_marketed=None,
+)
+_OS36 = _leg(  # Austrian self-marketing, United codeshare
+    operating=["OS", "36", None, "Austrian"],
+    marketing=[["UA", "9820", None, "United"]],
+    self_marketed=[True],
+)
+_LX39 = _leg(  # SWISS mainline, no codeshare
+    operating=["LX", "39", None, "SWISS"],
+    marketing=None,
+    self_marketed=[True],
+)
+_AF83 = _leg(  # Air France self-marketing, multiple codeshares
+    operating=["AF", "83", None, "Air France"],
+    marketing=[["DL", "83", None, "Delta"], ["KL", "1066", None, "KLM"]],
+    self_marketed=[True],
+)
+
+
+# ─────────────────────────── booking carrier ───────────────────────────
+
+
+def test_booking_operated_for_uses_marketing() -> None:
+    """Air Dolomiti operating for LH -> book as Lufthansa LH9498, not EN8858."""
+    assert _resolve_booking(_EN8858) == ("LH", "9498")
+
+
+def test_booking_self_marketed_uses_operating() -> None:
+    """Austrian self-sells OS36 (UA is a codeshare) -> book as Austrian."""
+    assert _resolve_booking(_OS36) == ("OS", "36")
+
+
+def test_booking_mainline_uses_operating() -> None:
+    """No codeshare -> the operating carrier is the booking carrier."""
+    assert _resolve_booking(_LX39) == ("LX", "39")
+
+
+def test_booking_multi_codeshare_self_marketed_uses_operating() -> None:
+    """AF83 self-marketed with DL+KL codeshares -> book as Air France."""
+    assert _resolve_booking(_AF83) == ("AF", "83")
+
+
+def test_booking_falls_back_to_operating_on_malformed_marketing() -> None:
+    leg = _leg(operating=["EN", "8858"], marketing=[["LH"]], self_marketed=None)
+    assert _resolve_booking(leg) == ("EN", "8858")
+
+
+def test_booking_short_tuple_no_indexerror() -> None:
+    assert _resolve_booking([None, None]) == (None, None)
+
+
+# ─────────────────────────── carrier extraction ────────────────────────
+
+
+def test_amenities_capture_operating_and_marketing_codeshare() -> None:
+    a = _parse_leg_amenities(_EN8858)
+    assert a.operating_carrier == "EN"
+    assert a.operating_carrier_name == "Air Dolomiti"
+    assert a.marketing_carriers == ("LH",)
+    assert a.marketing_flights == ("LH9498",)
+
+
+def test_amenities_mainline_has_no_marketing_partners() -> None:
+    a = _parse_leg_amenities(_LX39)
+    assert a.operating_carrier == "LX"
+    assert a.operating_carrier_name == "SWISS"
+    assert a.marketing_carriers == ()
+
+
+def test_amenities_multi_codeshare_collects_all_marketing_codes() -> None:
+    a = _parse_leg_amenities(_AF83)
+    assert a.marketing_carriers == ("DL", "KL")
+    assert a.marketing_flights == ("DL83", "KL1066")
+
+
+def test_amenities_carrier_fields_none_on_empty_leg() -> None:
+    a = _parse_leg_amenities([None] * 33)
+    assert a == LegAmenities()  # all defaults — no carrier identity, no amenities
+
+
+# ─────────────────────────── helpers ───────────────────────────────────
+
+
+def test_carrier_entry_full_and_malformed() -> None:
+    assert _carrier_entry(["LH", "9498", None, "Lufthansa"]) == ("LH", "9498", "Lufthansa")
+    assert _carrier_entry(["EN", "8858"]) == ("EN", "8858", None)
+    assert _carrier_entry(["LH"]) == (None, None, None)
+    assert _carrier_entry(None) == (None, None, None)
+
+
+def test_marketing_codes_skips_malformed_entries() -> None:
+    leg = _leg(
+        operating=["AF", "83", None, "Air France"],
+        marketing=[["DL", "83"], ["bad"], ["KL", "1"]],
+        self_marketed=[True],
+    )
+    assert _marketing_codes(leg) == ("DL", "KL")
diff --git a/tests/test_gflight_throttle.py b/tests/test_gflight_throttle.py
new file mode 100644
index 0000000..e35afad
--- /dev/null
+++ b/tests/test_gflight_throttle.py
@@ -0,0 +1,121 @@
+# pyright: reportPrivateUsage=false
+"""Tests for GF throttle detection + the two-policy retry in _gflight_ids."""
+
+from __future__ import annotations
+
+from typing import Any, cast
+
+import pytest
+
+from flight_cli import _gflight_ids
+from flight_cli._gflight_ids import GfThrottledError, _is_throttle_block
+
+# A genuine throttle body: HTTP 200 wrapper with a code-13 ErrorResponse.
+_BLOCK_BODY = (
+    ')]}\'\n\n[["wrb.fr",null,null,null,null,[13,null,'
+    '[["type.googleapis.com/travel.frontend.flights.ErrorResponse",[[null]]]]]]]'
+)
+_FILTERS = cast("Any", None)  # patched _one_call ignores its arg
+
+
+@pytest.fixture(autouse=True)
+def _no_sleep_no_jitter(  # pyright: ignore[reportUnusedFunction] - autouse pytest fixture
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    def _noop(*_a: object) -> None:
+        return None
+
+    def _zero() -> float:
+        return 0.0
+
+    monkeypatch.setattr(_gflight_ids.time, "sleep", _noop)
+    monkeypatch.setattr(_gflight_ids.random, "random", _zero)
+
+
+# ─────────────────────────── detection ─────────────────────────────────
+
+
+def test_is_throttle_block_true_on_error_envelope() -> None:
+    assert _is_throttle_block(_BLOCK_BODY)
+
+
+def test_is_throttle_block_false_on_empty_or_data() -> None:
+    assert not _is_throttle_block("")
+    assert not _is_throttle_block(')]}\'\n[["wrb.fr",null,"realpayloadhere"]]')
+
+
+# ─────────────────────────── throttle retry ────────────────────────────
+
+
+def test_retry_recovers_from_transient_throttle(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls = {"n": 0}
+    data: list[Any] = [object()]
+
+    def fake(_f: Any) -> list[Any]:
+        calls["n"] += 1
+        if calls["n"] <= 2:
+            raise GfThrottledError("throttled")
+        return data
+
+    monkeypatch.setattr(_gflight_ids, "_one_call", fake)
+    assert _gflight_ids._one_call_with_retry(_FILTERS) is data
+    assert calls["n"] == 3  # two blocks (backoff+retry) then success
+
+
+def test_retry_raises_when_throttle_persists(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls = {"n": 0}
+
+    def fake(_f: Any) -> list[Any]:
+        calls["n"] += 1
+        raise GfThrottledError("throttled")
+
+    monkeypatch.setattr(_gflight_ids, "_one_call", fake)
+    with pytest.raises(GfThrottledError):
+        _gflight_ids._one_call_with_retry(_FILTERS)
+    assert calls["n"] == _gflight_ids._THROTTLE_RETRY_ATTEMPTS + 1
+
+
+# ─────────────────────────── cold-session empty retry ──────────────────
+
+
+def test_retry_returns_empty_after_cold_retries(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls = {"n": 0}
+
+    def fake(_f: Any) -> list[Any]:
+        calls["n"] += 1
+        return []
+
+    monkeypatch.setattr(_gflight_ids, "_one_call", fake)
+    assert _gflight_ids._one_call_with_retry(_FILTERS) == []
+    assert calls["n"] == _gflight_ids._EMPTY_RETRY_ATTEMPTS  # bounded, no raise
+
+
+def test_retry_recovers_from_cold_empty(monkeypatch: pytest.MonkeyPatch) -> None:
+    calls = {"n": 0}
+    data: list[Any] = [object()]
+
+    def fake(_f: Any) -> list[Any]:
+        calls["n"] += 1
+        return [] if calls["n"] == 1 else data
+
+    monkeypatch.setattr(_gflight_ids, "_one_call", fake)
+    assert _gflight_ids._one_call_with_retry(_FILTERS) is data
+    assert calls["n"] == 2
+
+
+def test_throttle_and_empty_policies_are_independent(monkeypatch: pytest.MonkeyPatch) -> None:
+    # A throttle, then a cold empty, then data — both retry paths cooperate.
+    seq: list[Any] = ["throttle", [], [object()]]
+    calls = {"n": 0}
+
+    def fake(_f: Any) -> list[Any]:
+        item = seq[calls["n"]]
+        calls["n"] += 1
+        if item == "throttle":
+            raise GfThrottledError("throttled")
+        return item
+
+    monkeypatch.setattr(_gflight_ids, "_one_call", fake)
+    out = _gflight_ids._one_call_with_retry(_FILTERS)
+    assert out == seq[2]
+    assert calls["n"] == 3
diff --git a/tests/test_routing_predicates.py b/tests/test_routing_predicates.py
new file mode 100644
index 0000000..3ead7dd
--- /dev/null
+++ b/tests/test_routing_predicates.py
@@ -0,0 +1,246 @@
+"""Tests for the routing-language / extension-code parser + tier classifier."""
+
+from __future__ import annotations
+
+from flight_cli.routing_predicates import (
+    AlliancePred,
+    CarrierPred,
+    ConnectionAirportPred,
+    ConnectTimePred,
+    ExcludeCodesharePred,
+    ExcludeOvernightsPred,
+    ExcludeRedeyesPred,
+    MaxDurationPred,
+    SpecificFlightPred,
+    StopsPred,
+    Tier,
+    UnsupportedPred,
+    classify,
+    parse_extension,
+    parse_routing,
+)
+
+# ─────────────────────────── routing: carriers ─────────────────────────
+
+
+def test_routing_carrier_include() -> None:
+    (p,) = parse_routing("LH+")
+    assert p == CarrierPred(frozenset({"LH"}), exclude=False, operating=False)
+    assert p.tier is Tier.GF_NATIVE
+
+
+def test_routing_carrier_exclude_is_postfilter() -> None:
+    """Exclude has no GF allow-list knob (it'd need the route's carrier set to
+    complement), so it's honored as a reliable post-filter."""
+    (p,) = parse_routing("~UA+")
+    assert p == CarrierPred(frozenset({"UA"}), exclude=True, operating=False)
+    assert p.tier is Tier.GF_POSTFILTER
+
+
+def test_routing_operating_carrier_is_postfilter() -> None:
+    (p,) = parse_routing("O:LH+")
+    assert p == CarrierPred(frozenset({"LH"}), exclude=False, operating=True)
+    assert p.tier is Tier.GF_POSTFILTER
+
+
+def test_routing_is_case_insensitive() -> None:
+    assert parse_routing("lh+") == parse_routing("LH+")
+
+
+def test_routing_bare_carrier_without_quantifier_escalates() -> None:
+    """`LH` alone = exactly one direct LH segment — segment-count semantics GF
+    can't honor, so it goes to Matrix (use `LH+` for all-LH)."""
+    (p,) = parse_routing("LH")
+    assert isinstance(p, UnsupportedPred)
+    assert p.tier is Tier.MATRIX_ONLY
+
+
+# ─────────────────────────── routing: stops / flights ──────────────────
+
+
+def test_routing_nonstop() -> None:
+    assert parse_routing("N") == [StopsPred(max_stops=0)]
+
+
+def test_routing_nonstop_on_carrier() -> None:
+    preds = parse_routing("N:UA")
+    assert StopsPred(max_stops=0) in preds
+    assert CarrierPred(frozenset({"UA"}), exclude=False, operating=False) in preds
+
+
+def test_routing_specific_flight() -> None:
+    (p,) = parse_routing("UA882")
+    assert p == SpecificFlightPred(carrier="UA", low=882, high=882)
+    assert p.tier is Tier.GF_POSTFILTER
+
+
+def test_routing_flight_range() -> None:
+    (p,) = parse_routing("UA1000-2000+")
+    assert p == SpecificFlightPred(carrier="UA", low=1000, high=2000)
+
+
+def test_routing_flight_exclusion_escalates() -> None:
+    (p,) = parse_routing("~UA882+")
+    assert isinstance(p, UnsupportedPred)
+
+
+# ─────────────────────────── routing: connection airports ──────────────
+
+
+def test_routing_via_airport_idiom() -> None:
+    (p,) = parse_routing("F* X:LHR F*")
+    assert p == ConnectionAirportPred(frozenset({"LHR"}), exclude=False)
+    assert p.tier is Tier.GF_NATIVE
+
+
+def test_routing_via_airport_alternatives() -> None:
+    (p,) = parse_routing("F* DFW,DEN F*")
+    assert p == ConnectionAirportPred(frozenset({"DFW", "DEN"}), exclude=False)
+
+
+def test_routing_avoid_airport() -> None:
+    (p,) = parse_routing("F* ~DFW F*")
+    assert p == ConnectionAirportPred(frozenset({"DFW"}), exclude=True)
+
+
+def test_routing_bare_single_airport_escalates() -> None:
+    """`X:DFW` alone = exactly one connection at DFW; GF can only do via-DFW
+    (any stops), a superset — so escalate rather than over-return."""
+    (p,) = parse_routing("X:DFW")
+    assert isinstance(p, UnsupportedPred)
+
+
+# ─────────────────────────── routing: escalation ───────────────────────
+
+
+def test_routing_ordered_carrier_chain_escalates() -> None:
+    (p,) = parse_routing("BA AA")
+    assert isinstance(p, UnsupportedPred)
+    assert p.tier is Tier.MATRIX_ONLY
+
+
+def test_routing_ordered_airport_chain_escalates() -> None:
+    (p,) = parse_routing("DFW DEN")
+    assert isinstance(p, UnsupportedPred)
+
+
+def test_routing_country_filter_escalates() -> None:
+    (p,) = parse_routing("~l:nUS+")
+    assert isinstance(p, UnsupportedPred)
+
+
+def test_routing_flanked_carrier_escalates() -> None:
+    """`F* LH+ F*` means at-least-one-LH (not all-LH); GF airlines=LH would
+    under-return, so escalate."""
+    (p,) = parse_routing("F* LH+ F*")
+    assert isinstance(p, UnsupportedPred)
+
+
+def test_routing_empty_is_no_predicates() -> None:
+    assert parse_routing("") == []
+    assert parse_routing(" ") == []
+
+
+# ─────────────────────────── extension codes ───────────────────────────
+
+
+def test_extension_maxstops() -> None:
+    assert parse_extension("MAXSTOPS 1") == [StopsPred(max_stops=1)]
+
+
+def test_extension_maxdur_hhmm() -> None:
+    (p,) = parse_extension("MAXDUR 18:00")
+    assert p == MaxDurationPred(minutes=1080)
+
+
+def test_extension_maxconnect_is_native_minconnect_is_postfilter() -> None:
+    (mx,) = parse_extension("MAXCONNECT 2:00")
+    assert mx == ConnectTimePred(min_minutes=None, max_minutes=120)
+    assert mx.tier is Tier.GF_NATIVE
+    (mn,) = parse_extension("MINCONNECT 1:30")
+    assert mn == ConnectTimePred(min_minutes=90, max_minutes=None)
+    assert mn.tier is Tier.GF_POSTFILTER
+
+
+def test_extension_alliance() -> None:
+    (p,) = parse_extension("ALLIANCE star-alliance")
+    assert p == AlliancePred(codes=frozenset({"star-alliance"}))
+    assert p.tier is Tier.GF_NATIVE
+
+
+def test_extension_alliance_multiple_and_unknown() -> None:
+    (ok,) = parse_extension("ALLIANCE oneworld|skyteam")
+    assert ok == AlliancePred(codes=frozenset({"oneworld", "skyteam"}))
+    (bad,) = parse_extension("ALLIANCE galactic")
+    assert isinstance(bad, UnsupportedPred)
+
+
+def test_extension_airlines_include_exclude_operating() -> None:
+    assert parse_extension("AIRLINES BA AF") == [
+        CarrierPred(frozenset({"BA", "AF"}), exclude=False, operating=False)
+    ]
+    assert parse_extension("-AIRLINES AA") == [
+        CarrierPred(frozenset({"AA"}), exclude=True, operating=False)
+    ]
+    (op,) = parse_extension("OPAIRLINES UA")
+    assert op == CarrierPred(frozenset({"UA"}), exclude=False, operating=True)
+    assert op.tier is Tier.GF_POSTFILTER
+
+
+def test_extension_cities_exclude() -> None:
+    (p,) = parse_extension("-CITIES DFW ORD")
+    assert p == ConnectionAirportPred(frozenset({"DFW", "ORD"}), exclude=True)
+
+
+def test_extension_exclusion_flags() -> None:
+    assert parse_extension("-REDEYES") == [ExcludeRedeyesPred()]
+    assert parse_extension("-OVERNIGHTS") == [ExcludeOvernightsPred()]
+    assert parse_extension("-CODESHARE") == [ExcludeCodesharePred()]
+
+
+def test_extension_fare_basis_and_mileage_are_matrix_only() -> None:
+    for code in ("F bc=y", "MAXMILES 8000", "PADCONNECT 0:30", "-NOFIRSTCLASS", "AIRCRAFT T:359"):
+        (p,) = parse_extension(code)
+        assert isinstance(p, UnsupportedPred), code
+        assert p.tier is Tier.MATRIX_ONLY
+
+
+def test_extension_multiple_semicolon_separated() -> None:
+    preds = parse_extension("ALLIANCE star-alliance; -REDEYES; MAXSTOPS 1")
+    assert len(preds) == 3
+    assert AlliancePred(codes=frozenset({"star-alliance"})) in preds
+    assert ExcludeRedeyesPred() in preds
+    assert StopsPred(max_stops=1) in preds
+
+
+def test_extension_malformed_args_escalate() -> None:
+    (p,) = parse_extension("MAXSTOPS notanumber")
+    assert isinstance(p, UnsupportedPred)
+
+
+# ─────────────────────────── classify + gate ───────────────────────────
+
+
+def test_classify_all_gf_expressible_does_not_require_matrix() -> None:
+    c = classify("LH+", "MAXSTOPS 1; -REDEYES; MAXCONNECT 2:00")
+    assert not c.requires_matrix
+    assert {p.tier for p in c.predicates} == {Tier.GF_NATIVE, Tier.GF_POSTFILTER}
+
+
+def test_classify_fare_basis_requires_matrix() -> None:
+    c = classify("LH+", "F bc=y")
+    assert c.requires_matrix
+    assert c.matrix_reasons  # carries a human-readable reason for the caveat
+
+
+def test_classify_partitions_tiers() -> None:
+    c = classify("O:LH+", "AIRLINES BA AF; MINCONNECT 1:00")
+    assert any(isinstance(p, CarrierPred) and p.operating for p in c.tier2)
+    assert any(isinstance(p, CarrierPred) and not p.operating for p in c.tier1)
+    assert not c.requires_matrix
+
+
+def test_classify_empty_is_empty() -> None:
+    c = classify(None, None)
+    assert c.predicates == ()
+    assert not c.requires_matrix
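
Reviewer note: the booking-carrier rule that `tests/test_gflight_carrier.py` pins down (marketing carrier on operated-for legs, operating carrier otherwise, with a fallback when the marketing entry is malformed) can be sketched as below. This is an illustrative reconstruction from the tests and the rule quoted in `gf_routing_and_carriers.md`, not the real `_resolve_booking`; the names `resolve_booking_sketch`, `_at`, and `_code_and_number` are hypothetical and do not exist in `flight_cli`.

```python
from typing import Any, List, Optional, Tuple

Booking = Tuple[Optional[str], Optional[str]]


def _at(leg: List[Any], index: int) -> Any:
    """Bounds-safe read so short leg tuples yield None instead of IndexError."""
    return leg[index] if len(leg) > index else None


def _code_and_number(entry: Any) -> Booking:
    """Return (code, number) from a carrier entry, or (None, None) if malformed."""
    if isinstance(entry, list) and len(entry) >= 2 and entry[0] and entry[1]:
        return str(entry[0]), str(entry[1])
    return None, None


def resolve_booking_sketch(leg: List[Any]) -> Booking:
    """Hypothetical stand-in for the rule the tests describe.

    fl[15] = marketing carriers, fl[18] = self-marketed flag, fl[22] = operating.
    """
    marketing = _at(leg, 15)
    self_marketed = _at(leg, 18)
    if isinstance(marketing, list) and marketing and not self_marketed:
        # Operated-for leg: prefer the first marketing carrier if it is usable.
        booking = _code_and_number(marketing[0])
        if booking != (None, None):
            return booking
    # Self-marketed, mainline, or malformed marketing entry: use the operating carrier.
    return _code_and_number(_at(leg, 22))
```

Fed the EN8858 shape from the tests (`fl[15]` holding LH9498, `fl[18]` null), the sketch returns the Lufthansa marketing pair; fed the OS36 shape (`fl[18]` set to `[True]`), it returns the Austrian operating pair. The bounds-safe `_at` helper is one assumed way to satisfy the short-tuple test; the actual implementation may guard differently.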