feat: OTel context sharing#3970
Conversation
|
Benchmarks [ profiler ]Benchmark execution time: 2026-06-30 21:14:11 Comparing candidate commit 110757e in PR branch Found 0 performance improvements and 3 performance regressions! Performance is the same for 23 metrics, 10 unstable metrics.
|
Benchmarks [ tracer ]Benchmark execution time: 2026-06-30 22:15:16 Comparing candidate commit 110757e in PR branch Found 0 performance improvements and 10 performance regressions! Performance is the same for 184 metrics, 0 unstable metrics.
|
…ntext # Conflicts: # Cargo.lock # Makefile
…trace-php into levi/otel-thread-context
Benchmarks [ appsec ]Benchmark execution time: 2026-06-30 21:38:45 Comparing candidate commit 110757e in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 259658d91b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| span->root = DDTRACE_G(active_stack)->root_span; | ||
|
|
||
| ddtrace_set_global_span_properties(span); | ||
| ddtrace_update_otel_thread_context(); |
There was a problem hiding this comment.
Refresh OTel context after trace ID changes
Updating the TLS record only on span open/close/switch leaves it stale when an existing root span's trace ID is replaced later, e.g. by DDTrace\consume_distributed_tracing_headers() or DDTrace\set_distributed_tracing_context(), which mutate root_span->trace_id and only call ddtrace_update_root_id_properties(). In requests that consume distributed headers after the autoroot span already exists and do not open another span, OTel readers keep seeing the originally generated trace ID until the next span transition.
Useful? React with 👍 / 👎.
| listener->finish_user_req(listener, &span->std); | ||
| } | ||
| #ifdef __linux__ | ||
| ddtrace_clear_otel_thread_context_root_span(); |
There was a problem hiding this comment.
Only clear the matching user-request override
This clears the global OTel root override regardless of which user-request span is finishing. If a second notify_start() begins before the first span is closed, the AppSec listener finishes/replaces the previous request but the first span still has notify_user_req_end; closing that older span later calls this path and detaches the override for the newer active user request, so subsequent OTel/profiling context falls back to the outer stack.
Useful? React with 👍 / 👎.
bwoebi
left a comment
There was a problem hiding this comment.
I think the approach proposed by this PR is flawed / more expensive than needed.
The PR tries to synchronize state between otel context and root span.
The otel context is, at it's core, a pointer to some struct.
All we need is making the otel context part of the root_span_data struct.
Any updates to the trace_id / the roots span id -> update the local root span id / trace id part of the context, any updates to active -> update the span_id of the context.
And if there's no root span data, we just write NULL to the TLS variable.
I'm very not fond of "synchronizing" with otel context, any updates to the otel context should in my opinion simply happen in-place. There's also no reason for a "root span override".
It feels also way heavier than it should be. Updating a span id due to open/close/drop should be one 64 bit write. That's it.
Switching the span stack should cause one pointer write of the span stacks root span otel context to the TLS pointer. That's it, too.
Further, this PR at the very least forgot to handle ddtrace_update_root_id_properties().
|
I also don't understand why this functionality depends on libdatadog. The code should be IMHO simpler and less LoC if we just manage this manually. |
The OTel thread-context record is now owned and maintained by the PHP tracer itself rather than going through libdatadog's OTel thread-context record/update helpers. The tracer keeps its own local record and drives all updates through granular helpers. Storage and layout: - Store the record in ddtrace_root_span_data, placed before the embedded ddtrace_span_data, because of requirements of Zend object layout. - Move the record to sit after sampling_rule to reduce struct padding in the root prefix. - Make the record naturally 8-byte aligned: trace_id is stored as uint64_t[2], while span_id and valid are C11 atomics. Compile-time assertions guard record size, field offsets, and root placement. Update paths: - Drive context updates through granular helpers: stack attach, trace-id update, active span-id update, root-scoped attributes, and detach. - Keep root-specific detach-if-current behavior: freeing a root span clears the TLS pointer only when it currently points at that root's embedded record. Stack, fiber, close, drop, trace-id, and attribute mutation paths call the granular API directly. Tests and dependencies: - Update the affected appsec integration tests. - Restore the libdatadog gitlink and Cargo.lock to the merge-base / origin-master state, since this implementation no longer depends on the upstream libdatadog OTel thread-context record/update APIs.
ce8e5c8 to
158be36
Compare
|
The failure in dd-gitlab/check-big-regressions is a direct consequence of this change (7 root spans x 1000 revs x 640 bytes = 4 480 000 bytes = 4.49 MB). This is by design |
| ddtrace_span_stack *stack = DDTRACE_G(active_stack); | ||
| if (stack && stack->root_span) { | ||
| ddtrace_otel_update_attribute_values(stack->root_span); | ||
| } |
There was a problem hiding this comment.
extract that snippet into dd_alter_prop, it's only used for service/env/version anyway.
| } | ||
|
|
||
| // Match tracer/serializer.c hostname publishing: DD_HOSTNAME wins, then gethostname(). | ||
| #ifdef __linux__ |
There was a problem hiding this comment.
Did you mean to put this #ifdef to the top of the function? There's no point in half the function.
There was a problem hiding this comment.
the idea was to call gethostname only on linux, but i've now matched what's tracer/serializer.c, let's hope it doesn't blow up on windows where the function is apparently slightly different and depends on a separate header
|
A call to datadog_publish_otel_process_context() is missing in |
139376b to
8b1c9f4
Compare
5b6fa67 to
7b3e832
Compare
|
Dep on libdatadog: DataDog/libdatadog#2176 |
| bool runtime_id_changed = false; | ||
| datadog_sidecar_handle_fork(&runtime_id_changed); | ||
| if (runtime_id_changed) { | ||
| datadog_publish_configured_otel_process_context(); | ||
| } |
There was a problem hiding this comment.
You need to do it unconditionally, the libdatadog code uses libc::MFD_CLOEXEC.
There was a problem hiding this comment.
If you want to make it dependent on datadog_sidecar_instance_id (not sure why you would), please check this in datadog_publish_configured_otel_process_context, not here.
| } | ||
|
|
||
| *runtime_id_changed = datadog_sidecar_instance_id != NULL; | ||
| datadog_force_new_instance_id(); |
There was a problem hiding this comment.
Feel free to move this to the start of the function so that it happens unconditionally.
|
FYI: there's a bug in the libdatadog implementation of publishing the threadlocal attributes to the process context, where thread-context-related attributes are stored in the wrong place. DataDog/libdatadog#2167 fixes it and is being merged, but until you can depend on a version that includes it, an external context reader might not be able to properly resolve the thread context records. |
Description
This adds OTel context sharing using libdatadog. It adds it to both tracing (producer) and profiling (consumer).
This is mostly AI-written with a little human review as I went. It is not ready for merge, but it did pass some local end-to-end experimentation.
Reviewer checklist