Skip to content

Expand attribute value type to support complex values everywhere and cleanup surrounding code#5266

Open
DylanRussell wants to merge 36 commits into
open-telemetry:mainfrom
DylanRussell:extended_attributes
Open

Expand attribute value type to support complex values everywhere and cleanup surrounding code#5266
DylanRussell wants to merge 36 commits into
open-telemetry:mainfrom
DylanRussell:extended_attributes

Conversation

@DylanRussell

@DylanRussell DylanRussell commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Description

This PR updates the code to support complex values everywhere as described in this OTEP -- basically it adds support for None type, heterogeneous arrays, and maps as attribute values.

I cleaned up the surrounding code, including refactoring BoundedAttributes.. this was in part possible because we no longer have to support 2 sets of attribute values.

BoundedAttributes is used for span attributes (and span event attributes and span link attributes), log attributes, instrumentation scope attributes, and resource attributes, but i think it's completely optional for metric points -- all the various metric point classes just accept a generic attributes value, and I think it's up to instrumentations to decide on whether to use it or not.. I don't see any guidance requiring people do it, and I don't see any usages of BoundedAttributes in contrib.

For the bytes type I preserve the existing behavior by decoding it to a string via utf-8. If that fails, then I just pass it through as bytes, which is one of the new extended attribute fields.. It is valid to pass non-utf-8 bytes as an attribute value now.

I updated the encoding of None so that it is always encoded as an empty AnyValue.. I believe this is in-line with the spec: open-telemetry/opentelemetry-specification#4392 - previously we would do this for Arrays but not Maps and not when None is encoded as a primitive..

I added a warning log when we cast types to str() as a fallback (#4808 has some context on why we do that) -- I think instrumentations should be passing a well formed AnyValue type into set_attribute, we don't know the best way to cast these unknown types..

One thing that's missing from the spec is if leaf nodes should count towards the max attribute total count, I think that is the intent but it was not clearly documented (see #4587 and #4587) -- I can clarify and clean that up in a follow up PR though, as this one is already pretty big..

The failing public symbols check is due to the expanded attribute value, it's an additive change..

Type of change

Please delete options that are not relevant.

  • [ x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Unit tests

Does This PR Require a Contrib Repo Change?

Yes I had to submit open-telemetry/opentelemetry-python-contrib#4646 -- if someone attempts to use an old (anything less than the next release of) opentelemetry-instrumentation-logging with the next release of opentelemetry-api it will not work.. I'm confused about what to do to resolve that issue though..

Checklist:

  • [ x] Followed the style guidelines of this project
  • [ x] Changelogs have been updated
  • [ x] Unit tests have been added
  • [ x] Documentation has been updated

@DylanRussell DylanRussell requested a review from a team as a code owner June 1, 2026 20:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands OpenTelemetry Python’s attribute value model across the API, SDK, and OTLP exporters to support “complex” values everywhere (e.g., None, heterogeneous arrays, and maps), aligning behavior with the referenced OTEP and updating surrounding utilities/tests accordingly.

Changes:

  • Broadens AnyValue/AttributeValue/Attributes typing and updates public API surfaces (tracing/logs/events/metrics) to accept complex attribute values.
  • Refactors attribute cleaning/storage (BoundedAttributes) and metric attribute-keying (_hash_attributes) to handle complex values consistently.
  • Updates OTLP proto/json encoding to represent None as an empty AnyValue and adjusts unit tests/benchmarks for the new semantics.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
opentelemetry-sdk/tests/trace/test_trace.py Updates span attribute/event tests for complex values and bytes behavior.
opentelemetry-sdk/tests/resources/test_resources.py Updates resource attribute validation tests for new cleaning/stringify behavior.
opentelemetry-sdk/tests/metrics/test_view_instrument_match.py Updates metrics tests to use _hash_attributes aggregation keys.
opentelemetry-sdk/tests/metrics/test_measurement_consumer.py Adjusts a concurrency test to pass attributes explicitly.
opentelemetry-sdk/src/opentelemetry/sdk/util/instrumentation.py Switches instrumentation scope attributes typing to Attributes.
opentelemetry-sdk/src/opentelemetry/sdk/resources/init.py Aligns resource attribute typing/imports with new Attributes definition.
opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py Introduces _hash_attributes and deep-copies measurement attributes prior to aggregation.
opentelemetry-sdk/src/opentelemetry/sdk/environment_variables/init.py Clarifies docs for attribute value length limits with string/bytes focus.
opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/_exceptions.py Updates exception-attribute merging to the unified Attributes type.
opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/init.py Updates logging translation checks/warnings for AnyValue casting.
opentelemetry-sdk/src/opentelemetry/sdk/_events/init.py Updates event logger provider attribute typing to Attributes.
opentelemetry-sdk/benchmarks/metrics/test_benchmark_metrics_histogram.py Expands benchmarks to cover complex/mapping/array attribute shapes.
opentelemetry-api/tests/logs/test_proxy.py Updates logs proxy tests to Attributes.
opentelemetry-api/tests/events/test_proxy_event.py Updates events proxy tests to Attributes.
opentelemetry-api/tests/attributes/test_attributes.py Reworks attribute tests around new _clean_attribute_value/BoundedAttributes.
opentelemetry-api/src/opentelemetry/util/types.py Redefines AnyValue/AttributeValue and widens Attributes accordingly; removes _ExtendedAttributes.
opentelemetry-api/src/opentelemetry/trace/span.py Updates span API type hints to use collections.abc.Mapping and types.Attributes in NoOp implementation.
opentelemetry-api/src/opentelemetry/attributes/init.py Major refactor: new recursive cleaner and BoundedAttributes as a dict subclass.
opentelemetry-api/src/opentelemetry/_logs/_internal/init.py Updates log API typing from _ExtendedAttributes to Attributes.
opentelemetry-api/src/opentelemetry/_events/init.py Updates event API typing from _ExtendedAttributes to Attributes.
exporter/opentelemetry-exporter-otlp-proto-grpc/benchmarks/test_benchmark_trace_exporter.py Updates patch target to TraceServiceStub.
exporter/opentelemetry-exporter-otlp-proto-common/tests/test_log_encoder.py Refactors OTLP proto log encoder tests around new null/AnyValue behavior.
exporter/opentelemetry-exporter-otlp-proto-common/tests/test_attribute_encoder.py Adjusts encoder failure tests to use an unencodable object.
exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/init.py Stops using allow_null; encodes null as empty AnyValue.
exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/init.py Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue.
exporter/opentelemetry-exporter-otlp-json-common/tests/test_log_encoder.py Updates JSON log encoder tests for null-as-empty-AnyValue body.
exporter/opentelemetry-exporter-otlp-json-common/tests/test_common_encoder.py Updates JSON common encoder tests for null and unencodable values.
exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/_log_encoder/init.py Removes allow_null usage in JSON log encoding.
exporter/opentelemetry-exporter-otlp-json-common/src/opentelemetry/exporter/otlp/json/common/_internal/init.py Simplifies _encode_value and _encode_attributes around null-as-empty-AnyValue.
docs/conf.py Updates Sphinx type-hint resolution aliasing; now also touches metrics internals.
.changelog/5266.added Adds changelog entry for extended/complex attribute support across packages.
Comments suppressed due to low confidence (1)

opentelemetry-api/src/opentelemetry/trace/span.py:96

  • The note in Span.set_attributes says the behavior of None value attributes is undefined, but this PR explicitly adds support for None/null attribute values across the API/SDK. Please update the docstring to reflect the new supported semantics (even if None remains discouraged).
    def set_attributes(
        self, attributes: Mapping[str, types.AttributeValue]
    ) -> None:
        """Sets Attributes.

        Sets Attributes with the key and value passed as arguments dict.

        Note: The behavior of `None` value attributes is undefined, and hence
        strongly discouraged. It is also preferred to set attributes at span
        creation, instead of calling this method later since samplers can only
        consider information already present during span creation.
        """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py
Comment on lines +107 to +111
attributes = {}
if measurement.attributes:
# Deepcopy to prevent the user from modifying the attributes after the
# measurement has been consumed.
attributes = copy.deepcopy(measurement.attributes)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using BoundAttributes for metrics appears to be completely optional, so we can't rely on that to ensure attribute values (and the attributes Map as a whole) are immutable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes is always just a map which can be mutated.... I don't see how a shallow copy is better than a deep copy

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do share this concern though.. We would save a lot of copying by just using the mutable types passed through

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't this code preexisting anyway?

@DylanRussell DylanRussell Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned up this code a tiny bit..

I'm pretty sure this was the intent of the previous code (because there was a simple test that failed) which was:

 elif measurement.attributes is not None:
            attributes = dict(measurement.attributes)

But that doesn't copy lists/dicts

Comment thread docs/conf.py Outdated
Comment thread opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py

@MikeGoldsmith MikeGoldsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I am concerned this is a fairly big change so we need to make sure it's done safely.

Comment thread opentelemetry-api/src/opentelemetry/util/types.py
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/attributes/__init__.py Outdated
Comment thread opentelemetry-api/src/opentelemetry/util/types.py Outdated
Comment thread docs/conf.py Outdated

# Provide AnyValue in opentelemetry.attributes module's namespace so the
# "AnyValue" forward reference in opentelemetry.util.types._ExtendedAttributes
# TODO: Is there a better way to do this ? I have to do this re-import thing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment about, can we remove it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raajheshkannaa do you know a better way to resolve this issue ? I tried copying this approach but I'm still getting failures and it's a pain

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That block is the workaround I added in #5017, expanded. AnyValue is a recursive alias, so it carries the string "AnyValue", and get_type_hints() evaluates that against the globals of whichever module defined each documented object. Most of those modules import it only under TYPE_CHECKING, so the name is missing at docs-build time and you get a NameError. The chained assignment patches each of those namespaces by hand.

You can collapse the whole thing. eval() falls back to builtins after module globals, so one binding covers every module and the per-module list (and the TODO) goes away:

# Docs-only: resolves the "AnyValue" forward reference wherever
# get_type_hints() evaluates it. AnyValue is a recursive alias, so
# modules that import it only under TYPE_CHECKING hit a NameError.
import builtins

from opentelemetry.util.types import AnyValue

builtins.AnyValue = AnyValue

I built your branch head with this swapped in, Python 3.11, pinned sphinx-autodoc-typehints==1.25.2, sphinx-build -W: passes clean, same as your current version. So this is a drop-in simplification, not a behavior change. It's also less brittle, since it doesn't depend on enumerating which module defines each object (the _internal ones are easy to miss).

The properly clean fix is importing AnyValue at runtime in the modules that annotate with it, but that touches a lot of files and is better as a follow-up.



_logger = logging.getLogger(__name__)
class _DuplicateFilter(logging.Filter):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like duplicating this duplicate filter here, and having to add it to the logger here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leave it out ? These logs could be noisy. I can't import the other DuplicateFilter without taking a dependency on opentelemetry-sdk. There isn't a shared code package I can use either.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be more in favor of leaving it out, the 60 second deduplication window seems arbitrary here too. Furthermore, for these logs in particular is there even a possibility of creating a recursive loop with a log exporter?

"""
if isinstance(value, (type(None), bool, int, float)):
return value
if isinstance(value, bytes):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what the spec says should happen? In my view, if bytes are passed in, we should leave them as-is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec doesn't say to do it. The current code converts bytes to str this way though, if we change it could break people.

I suppose an alternative is to do it like this temporarily (and log a warning), saying that in the future we will leave the field as bytes and then switch over after a few releases.. WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we just implement it exactly as the spec expects. Yes, it would technically be a breaking change, but the original behavior can be classified as a bug.

# converting to string raises.
) != _InvalidAttributeValue.INVALID_VALUE:
cleaned_mapping[key] = cleaned_value
return MappingProxyType(cleaned_mapping)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is MappingProxyType needed here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's needed TBH.. It's immutable.. Since Mappings/Sequences can be mutable, do we want to allow people to mutate these outside of __setitem__ and have the attribute value be mutated? My thinking was no because that bypasses the validation / cleaning of attributes.. But the user would have to reach into this data structure and get the Mapping/Sequence and then mutate it, and if they do that maybe it's intentional.. This also introduces some overhead obviously because we make a copy.. WDYT?

Comment on lines +114 to +117
try:
return str(value)
except Exception:
raise TypeError(
f"Invalid type {type(value).__name__} for attribute value. "
f"Expected one of {[_type_name(valid_type) for valid_type in _VALID_ANY_VALUE_TYPES]} or a "
"sequence of those types",
)


def _clean_extended_attribute(
key: str, value: types.AnyValue, max_len: int | None
) -> types.AnyValue:
"""Checks if attribute value is valid and cleans it if required.

The function returns the cleaned value or None if the value is not valid.
except Exception: # pylint: disable=broad-exception-caught
return _InvalidAttributeValue.INVALID_VALUE

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec states that if we can't convert to string, we should fallback to an empty any value

If the source data cannot be serialized neither to a string nor to a byte sequence then it SHOULD by converted to an empty AnyValue.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so in accordance with that I switched the logic to fallback to None which will ultimately get converted to an empty AnyValue..

type(value),
)
try:
return str(value)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably define a helper type here, and use an isinstance() check before serializing.

@runtime_checkable
class Stringable(Protocol):
    def __str__(self) -> str: ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't work.. a float for example doesn't have it's own __str__ method, but can be casted to a string somehow.. I think the default object str must know how to produce a valid string

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't float be handled earlier up though, this is the fallback case right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this some more.. if a method doesn't have a __str__, Python will then use the objects __repr__ method (this is what happens with float).. If a method doesn't have a __repr__ either, it then does a fallback to object types __repr__ method which returns a string like __main__.obj at memory address 10289585859.. so it's pretty much guaranteed we get a string when calling str. That being said the default object __repr__ string is not great, so I'm going to update the code to look for a custom __str__/__repr__ method and drop the value if that isn't there

Comment thread opentelemetry-api/src/opentelemetry/util/types.py Outdated
# For more background, see: https://github.com/open-telemetry/opentelemetry-python/pull/4216
if not record.args and not isinstance(record.msg, str):
# if record.msg is not a value we can export, cast it to string
if not isinstance(record.msg, _VALID_ANY_VALUE_TYPES):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing the behavior of logging bodies here? I'd prefer to scope the changes to just attributes.

if isinstance(value, Sequence):
return tuple(_hash_attributes(v) for v in value)
if isinstance(value, Mapping):
return tuple((k, _hash_attributes(value[k])) for k in sorted(value))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we handle the case where value is not comparable?

_encode_key_value(str(k), v, allow_null=allow_null)
for k, v in value.items()
]
values=[_encode_key_value(str(k), v) for k, v in value.items()]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your other comment, we shouldn't massage any values here right ??

So a custom processor can send bad data here.. In this PR we are cleaning attribute values in the SDK, but obviously that isn't the only path that can feed into the OTLP exporters.. It is the only path we have strict control over...

@herin049 herin049 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a very significant change impacting all signal types, I'd recommend breaking this into 3 PRs:

  1. Encoding logic for converting from the new AnyValue to Protobuf/ProtoJSON object with extensive testing.
  2. Add support for logs/traces
  3. Add support for metrics (optional)

Note that the original OTEP mentions that SDKs MAY perform 3. so it's best to do this last.

@github-project-automation github-project-automation Bot moved this to Reviewed PRs that need fixes in Python PR digest Jun 10, 2026
@DylanRussell

Copy link
Copy Markdown
Contributor Author

I'm open to breaking up the PR a bit and taking some stuff out, but i'd like to remove the old way of encoding attributes in the same PR as we add the new way, and move all attributes to the new way..

@herin049 herin049 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI it might be beneficial to also look over #4587



_logger = logging.getLogger(__name__)
class _DuplicateFilter(logging.Filter):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be more in favor of leaving it out, the 60 second deduplication window seems arbitrary here too. Furthermore, for these logs in particular is there even a possibility of creating a recursive loop with a log exporter?

"""
if isinstance(value, (type(None), bool, int, float)):
return value
if isinstance(value, bytes):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer we just implement it exactly as the spec expects. Yes, it would technically be a breaking change, but the original behavior can be classified as a bug.

return None
if max_len is not None and isinstance(value, str):
value = value[:max_len]
if isinstance(value, (type(None), bool, int, float)):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
if isinstance(value, (type(None), bool, int, float)):
if isinstance(value, (NoneType, bool, int, float)):

@DylanRussell

Copy link
Copy Markdown
Contributor Author

Yeah I have read through #4587 already..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Reviewed PRs that need fixes

Development

Successfully merging this pull request may close these issues.

6 participants