
Flow PHP - SEAL Adapter #2408

Merged
norberttech merged 14 commits into flow-php:1.x from MrHDOLEK:2168-proposal-replace-meilisearch-elasticsearch-with-seal
Jun 17, 2026

@MrHDOLEK MrHDOLEK commented May 31, 2026

Contributor

The Elasticsearch and Meilisearch adapters are replaced by a single SEAL adapter
(flow-php/etl-adapter-seal) built on top of SEAL
— a PHP Search Engine Abstraction Layer. One adapter now covers Elasticsearch, OpenSearch,
Meilisearch, Algolia, Solr, Typesense, RediSearch and Loupe: the user builds a
CmsIg\Seal\EngineInterface and passes it to the DSL, exactly like the PostgreSQL/Doctrine
adapters take a client. Tests are backend-agnostic (Memory adapter) and prove every Flow
Entry type survives a round-trip through the adapter.

Resolves: #2168

Change Log


Added

  • SEAL adapter (flow-php/etl-adapter-seal) — a single, strongly typed integration for SEAL search engines (Elasticsearch, OpenSearch, Meilisearch, Algolia, Solr, Typesense, RediSearch, Loupe)
  • from_seal() extractor and to_seal() loader, working with any SEAL EngineInterface
  • to_seal_schema() and seal_schema_to_flow() DSL — recursive, bi-directional Flow Schema ↔ SEAL Schema conversion (nested structures, lists and maps)
  • seal_create_index(), seal_drop_index(), seal_create_schema() and seal_drop_schema() DSL index-lifecycle helpers

Fixed

Changed

Removed

  • Elasticsearch adapter (flow-php/etl-adapter-elasticsearch) — superseded by the SEAL adapter; use SEAL with the Elasticsearch backend (cmsig/seal-elasticsearch-adapter)

Deprecated

Security

codecov Bot commented May 31, 2026


Codecov Report

❌ Patch coverage is 85.42714% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.12%. Comparing base (5408caf) to head (0f4398e).
⚠️ Report is 1 commits behind head on 1.x.

Additional details and impacted files
@@             Coverage Diff              @@
##                1.x    #2408      +/-   ##
============================================
- Coverage     85.15%   85.12%   -0.03%     
- Complexity    21207    21217      +10     
============================================
  Files          1601     1598       -3     
  Lines         65488    65485       -3     
============================================
- Hits          55764    55747      -17     
- Misses         9724     9738      +14     
Components Coverage Δ
etl 88.41% <ø> (ø)
cli 89.40% <ø> (ø)
lib-array-dot 81.44% <ø> (ø)
lib-azure-sdk 64.44% <ø> (ø)
lib-doctrine-dbal-bulk 93.61% <ø> (ø)
lib-filesystem 85.03% <ø> (ø)
lib-types 90.06% <ø> (ø)
lib-parquet 70.10% <ø> (ø)
lib-parquet-viewer 82.26% <ø> (ø)
lib-snappy 89.38% <ø> (-0.45%) ⬇️
lib-dremel 0.00% <ø> (ø)
lib-postgresql 88.59% <ø> (ø)
lib-telemetry 85.95% <ø> (ø)
bridge-filesystem-async-aws 92.74% <ø> (ø)
bridge-filesystem-azure 90.45% <ø> (ø)
bridge-monolog-http 96.82% <ø> (ø)
bridge-monolog-telemetry 94.11% <ø> (ø)
bridge-openapi-specification 92.07% <ø> (ø)
symfony-http-foundation 78.57% <ø> (ø)
bridge-psr18-telemetry 100.00% <ø> (ø)
bridge-psr3-telemetry 97.84% <ø> (ø)
bridge-psr7-telemetry 100.00% <ø> (ø)
bridge-telemetry-otlp 89.89% <ø> (ø)
bridge-symfony-http-foundation-telemetry 89.47% <ø> (ø)
bridge-symfony-filesystem-bundle 90.66% <ø> (ø)
bridge-symfony-filesystem-cache 98.14% <ø> (ø)
bridge-symfony-postgresql-bundle 93.83% <ø> (ø)
bridge-symfony-postgresql-cache 94.41% <ø> (ø)
bridge-symfony-postgresql-messenger 98.80% <ø> (ø)
bridge-symfony-postgresql-session 93.65% <ø> (ø)
bridge-symfony-telemetry-bundle 80.80% <ø> (ø)
adapter-chartjs 84.05% <ø> (ø)
adapter-csv 91.16% <ø> (ø)
adapter-doctrine 90.79% <ø> (ø)
adapter-google-sheet 99.18% <ø> (ø)
adapter-http 72.34% <ø> (ø)
adapter-json 88.63% <ø> (ø)
adapter-logger 50.00% <ø> (ø)
adapter-parquet 77.70% <ø> (ø)
adapter-text 74.13% <ø> (ø)
adapter-xml 83.40% <ø> (ø)
adapter-avro 0.00% <ø> (ø)
adapter-excel 94.21% <ø> (ø)
adapter-postgresql 91.11% <ø> (ø)
adapter-seal 85.42% <85.42%> (∅)
bridge-phpunit-postgresql 75.30% <ø> (ø)
bridge-phpunit-telemetry 80.08% <ø> (ø)
bridge-phpstan-types 0.00% <ø> (ø)
bridge-postgresql-valinor 100.00% <ø> (ø)

@norberttech

Member

@MrHDOLEK as we discussed offline, please find some feedback and guidance below.

Extractor / Loader

In general I like the idea of having one extractor that accepts a SEAL EngineInterface, which is aligned with how the Doctrine adapter works right now. The same goes for the loaders.

It would also work very well with frameworks, as SEAL provides a bundle for Symfony that configures the engine, so using it with Flow would be a simple matter of dependency injection.

What to test

I had to think about it a bit more, but I don't think we should retest every backend that SEAL is testing. The Doctrine adapter tests MySQL, PostgreSQL and SQLite because flow-php/doctrine-dbal-bulk extends Doctrine's default behaviors, which need to be tested.

At this time seal provides following adapters:

I would choose one of them (the one that requires the least configuration and is the easiest to set up in Docker) and cover it with proper integration tests that confirm that moving data from Flow to SEAL works as expected.
Heck, you can even use MemoryAdapter in tests, unless other adapters have features that require some custom configuration.
The only thing you might want to research first is whether there are any differences between SEAL adapters. Flow needs to be flexible enough to use all of them, so if one requires something custom, the Flow extractor/loader needs to be able to pass it through (but I don't think that's the case).

How to test

So in general we want to cover in tests that seal adapter can handle all types of flow Entries. You can achieve that by using one of the two extractors:

They both expose static method schema() : Schema but here is where the tricky part starts. I believe you need to be able to first create an index in tests.
And that brings us to a missing piece of this PR - SchemaConverter.

SchemaConverter

Schema converters are recursive algorithms that can convert a schema in both directions, Flow to SEAL and SEAL to Flow. You can find some inspiration here:

It might be the easiest to use LLM to help you create one based on those 3 examples for Seal (it's a recursive brain damaging exercise that might not be worth spending time on).

Of course schema converters would need a DSL method.
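
To illustrate only the recursion, here is a minimal sketch that uses plain PHP arrays in place of the real Flow and SEAL schema objects. The array shapes, the type names in SCALAR_MAP and the functions flow_to_seal_sketch() and seal_to_flow_sketch() are all hypothetical; they are not the adapter's API.

```php
<?php

// Hypothetical shapes, for illustration only:
//   flow-like: ['title' => 'string', 'author' => ['structure' => [...]], 'tags' => ['list' => 'string']]
//   seal-like: ['title' => ['type' => 'text'], 'author' => ['type' => 'object', 'fields' => [...]]]
const SCALAR_MAP = ['string' => 'text', 'integer' => 'integer', 'float' => 'float', 'boolean' => 'boolean'];

function flow_to_seal_sketch(array $flow): array
{
    $seal = [];

    foreach ($flow as $name => $type) {
        if (\is_string($type)) {
            // Scalars map one to one
            $seal[$name] = ['type' => SCALAR_MAP[$type]];
        } elseif (isset($type['structure'])) {
            // Nested structures recurse into their own fields
            $seal[$name] = ['type' => 'object', 'fields' => flow_to_seal_sketch($type['structure'])];
        } elseif (isset($type['list'])) {
            // A list is its element definition, flagged as multiple
            $element = flow_to_seal_sketch(['element' => $type['list']])['element'];
            $seal[$name] = $element + ['multiple' => true];
        } else {
            throw new \InvalidArgumentException("Unsupported type for field {$name}");
        }
    }

    return $seal;
}

function seal_to_flow_sketch(array $seal): array
{
    $scalars = \array_flip(SCALAR_MAP);
    $flow = [];

    foreach ($seal as $name => $field) {
        $multiple = $field['multiple'] ?? false;
        $type = $field['type'] === 'object'
            ? ['structure' => seal_to_flow_sketch($field['fields'])]
            : $scalars[$field['type']];
        $flow[$name] = $multiple ? ['list' => $type] : $type;
    }

    return $flow;
}
```

A useful property to test on the real converter is the round trip: converting a schema to the other representation and back should return the original schema.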

Search Engine in Tests

So in this PR you are using traits to configure backends in tests, which is fine but there is a different pattern which I found cleaner and easier to maintain.

Contexts.

Here is a good example of DatabaseContext that is used in DatabaseTableListCommandTest

If there is more than one integration test, you can extract an abstract SealTestCase extends FlowTestCase and set up SealContext in the setUp method, making it available through a sealContext() : SealContext method.
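
A rough sketch of that shape, kept runnable by avoiding PHPUnit and SEAL entirely. FakeEngine and every method on SealContext are hypothetical placeholders; in the real suite the base class would extend FlowTestCase and the context would wrap a SEAL engine.

```php
<?php

// Hypothetical stand-in for a search engine; a real test would wrap a SEAL engine instead.
final class FakeEngine
{
    /** @var array<string, list<array<string, mixed>>> */
    public array $indexes = [];
}

// The context owns all test-only helpers, so test cases stay free of setup details.
final class SealContext
{
    public function __construct(private FakeEngine $engine)
    {
    }

    public function createIndex(string $name): void
    {
        $this->engine->indexes[$name] = [];
    }

    public function save(string $index, array $document): void
    {
        $this->engine->indexes[$index][] = $document;
    }

    public function count(string $index): int
    {
        return \count($this->engine->indexes[$index] ?? []);
    }
}

// In the real suite this would extend FlowTestCase; a plain class keeps the sketch runnable.
abstract class SealTestCase
{
    private SealContext $context;

    protected function setUp(): void
    {
        // A fresh context per test keeps tests isolated from each other
        $this->context = new SealContext(new FakeEngine());
    }

    protected function sealContext(): SealContext
    {
        return $this->context;
    }
}
```

Compared with traits, the context is a single object that can be replaced or extended without touching the test classes that use it.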

In case of any questions, you know where to find me 😁

@MrHDOLEK MrHDOLEK marked this pull request as ready for review June 9, 2026 21:00
@MrHDOLEK MrHDOLEK requested a review from norberttech as a code owner June 9, 2026 21:00
Comment thread documentation/upgrading.md Outdated
Comment thread phpunit.xml.dist Outdated
Comment thread src/adapter/etl-adapter-seal/.github/workflows/readonly.yaml Outdated
Comment thread src/adapter/etl-adapter-seal/src/Flow/ETL/Adapter/Seal/SealExtractor.php Outdated

@norberttech norberttech left a comment

Member


Looks much better! I left some comments, nothing critical, but there is one gap related to Delete operation.

Please let me know if you have any questions about those comments!

Comment thread src/adapter/etl-adapter-seal/tests/Flow/ETL/Adapter/Seal/Tests/SealTestCase.php Outdated
Comment thread documentation/components/adapters/seal.md Outdated
Comment thread src/adapter/etl-adapter-seal/src/Flow/ETL/Adapter/Seal/functions.php Outdated
Comment thread src/adapter/etl-adapter-seal/src/Flow/ETL/Adapter/Seal/SealLoader.php Outdated
Comment thread compose.yml.dist
Comment thread src/adapter/etl-adapter-seal/src/Flow/ETL/Adapter/Seal/SealExtractor.php Outdated
@MrHDOLEK MrHDOLEK force-pushed the 2168-proposal-replace-meilisearch-elasticsearch-with-seal branch from 0cd33ec to 7bcf17c Compare June 16, 2026 06:11

@norberttech norberttech left a comment

Member


Looks awesome @MrHDOLEK !

If you feel that work is completed, and this adapter is now usable I'm happy to merge it 🚀

@norberttech norberttech changed the title from "#2168 - proposal replace meilisearch elasticsearch with seal" to "Flow PHP - SEAL Adapter" Jun 16, 2026
@norberttech norberttech added this to the 0.40.0 milestone Jun 16, 2026
@norberttech norberttech moved this from Todo to In Progress in Roadmap Jun 16, 2026
@MrHDOLEK

Contributor Author

@norberttech I found an issue while testing with the 20k dataset.

SEAL cannot read data beyond 10K from elastic because it only supports offset-based pagination, and Elasticsearch blocks offsets outside the 10K result window. To work around this problem, I added cursor-based pagination.
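
The limit can be illustrated with a little arithmetic. The sketch below assumes Elasticsearch's default index.max_result_window of 10,000, under which a request is rejected once offset + limit exceeds the window. The function names are hypothetical and nothing here talks to a real engine.

```php
<?php

// Default value of Elasticsearch's index.max_result_window setting.
const MAX_RESULT_WINDOW = 10_000;

// True when an offset-based page can still be served,
// i.e. when offset + limit stays within the result window.
function page_is_reachable(int $offset, int $limit, int $window = MAX_RESULT_WINDOW): bool
{
    return $offset + $limit <= $window;
}

// Counts how many documents an offset-based reader gets before the first rejected page.
function documents_readable_with_offsets(int $total, int $pageSize, int $window = MAX_RESULT_WINDOW): int
{
    $read = 0;

    for ($offset = 0; $offset < $total; $offset += $pageSize) {
        if (!page_is_reachable($offset, $pageSize, $window)) {
            break;
        }

        $read += \min($pageSize, $total - $offset);
    }

    return $read;
}
```

With a 20k dataset and a page size of 1,000, the page at offset 9,000 is the last one served; the page at offset 10,000 is rejected, so half of the dataset is never read.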

@norberttech

Member

> SEAL cannot read data beyond 10K from elastic because it only supports offset-based pagination, and Elasticsearch blocks offsets outside the 10K result window. To work around this problem, I added cursor-based pagination.

That's an excellent finding!

Do other engines also come with such limitations? Is this keyset pagination going to work only with Elasticsearch or with all SEAL adapters?
When someone tries to read more than 10k values without keyset, is the exception going to point them to the keyset? Are the docs updated?
What is the cost of moving to such pagination? Is it really keyset, or maybe cursor pagination? Are there other types of pagination supported?

@alexander-schranz if by any chance you could take a look at this integration, your feedback would be greatly appreciated 🙏

Comment on lines +116 to +122
$search = $this->engine->createSearchBuilder($this->index);

if ($this->searchBuilder !== null) {
    ($this->searchBuilder)($search);
}

$search->addSortBy($cursorField, $direction->value)->limit($this->pageSize)->offset(0);

Member


Should be out of the loop


foreach ($search->getResult() as $document) {
    $documents[] = $document;
    $cursor = $this->cursorValue($document, $cursorField);

Member


Shouldn't this value only be checked when the count equals the page size? Otherwise it will not be used anyway, yet with a page limit of 1k and only 999 results it will read the value 999 times, no?
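
One way to read this suggestion, sketched over an in-memory array rather than a real SEAL engine: yield the page, stop as soon as a page comes back short, and read the cursor once from the last document of a full page. fetch_page() and read_all() are hypothetical helpers, not code from this PR.

```php
<?php

// Hypothetical in-memory stand-in for one search request: returns up to $limit
// documents whose cursor field is greater than $after, sorted ascending.
function fetch_page(array $documents, string $field, int|float|null $after, int $limit): array
{
    $matching = \array_filter(
        $documents,
        static fn (array $d): bool => $after === null || $d[$field] > $after
    );
    \usort($matching, static fn (array $a, array $b): int => $a[$field] <=> $b[$field]);

    return \array_slice($matching, 0, $limit);
}

function read_all(array $documents, string $field, int $pageSize): \Generator
{
    $cursor = null;

    while (true) {
        $page = fetch_page($documents, $field, $cursor, $pageSize);

        yield from $page;

        // A short page means there is nothing left, so the cursor is never needed.
        if (\count($page) < $pageSize) {
            return;
        }

        // Only a full page can have a successor; read the cursor once, from its last document.
        $cursor = $page[$pageSize - 1][$field];
    }
}
```

Because yield from reuses the numeric keys of each page, collect the results with iterator_to_array($generator, false) so that later pages do not overwrite earlier ones.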

@MrHDOLEK

Contributor Author

>> SEAL cannot read data beyond 10K from elastic because it only supports offset-based pagination, and Elasticsearch blocks offsets outside the 10K result window. To work around this problem, I added cursor-based pagination.
>
> That's an excellent finding!
>
> Do other engines also come with such limitations? Is this keyset pagination going to work only with Elasticsearch or with all SEAL adapters? When someone tries to read more than 10k values without keyset, is the exception going to point them to the keyset? Are the docs updated? What is the cost of moving to such pagination? Is it really keyset, or maybe cursor pagination? Are there other types of pagination supported?
>
> @alexander-schranz if by any chance you could take a look at this integration, your feedback would be greatly appreciated 🙏

1. Algolia: 1,000 (paginationLimitedTo, max 20k)
   Meilisearch: 1,000 (maxTotalHits)
   Typesense: max_per_page 250 + limit_hits ~500
2. A raw Elasticsearch error is returned. There is currently no error handling (catch) for this exception on the SEAL side.
3. I would not worry about performance here. The real problem is its pagination limitations.
4. This could be solved by adding some primitive scroller in SEAL.

@norberttech

Member

> 4. This could be solved by adding some primitive scroller in SEAL.

In general, pagination is a complex problem, especially over large datasets.
There are 3 patterns I'm using in the PostgreSQL and Doctrine adapters:

  • Limit Offset Pagination: usually works up to 5-10k rows
  • Cursor Pagination: you need to understand cursors first, as they might expire while you are paginating and they can heavily impact performance (cursors should be configurable)
  • KeySet Pagination: my usual default, but the storage must support it.

I guess the biggest question is whether SEAL would let us build something now that leverages KeySet pagination without any changes in SEAL, or whether some adjustments are required there first?

alexander-schranz commented Jun 16, 2026


@norberttech I can just give a quick response here without having a deeper look at the Pull Request yet. Limitations on Elasticsearch can be controlled by Elasticsearch settings, so if somebody wants to fetch more, they can increase the limit in the Elasticsearch instance config. There are some discussions about allowing more settings to be configured via DSN params, but that will take longer.

While there is an issue about adding a Scroller (PHP-CMSIG/search#535), it won't solve the Elasticsearch 10k issue: as @MrHDOLEK mentioned, the offset is blocked by Elasticsearch, and the Scroller in that issue is designed so that it does not need to be implemented per adapter, meaning it uses Limit / Offset for scrolling. Maybe that is something we need to rethink.

KeySet Pagination

Are you in control of the Schema and the Data? If I understand KeySet Pagination correctly, you would need an additional field or a value that increases. If the document id is an integer, maybe sort by id; otherwise use a datetime field or something else that can be sorted by and is possibly unique, or at least has no more than 250 documents sharing the same value (the Typesense default limit).
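
When the sort field is not unique, a common remedy is a composite key with a unique tie-breaker, so that documents sharing the same value are not skipped at a page boundary. The sketch below shows the comparison over an in-memory array, assuming hypothetical createdAt and id fields; whether every SEAL backend can express the equivalent compound filter is not established in this thread.

```php
<?php

// Keyset predicate over the composite key (createdAt, id): true when $document
// sorts strictly after the last document seen on the previous page.
function is_after(array $document, ?array $last): bool
{
    if ($last === null) {
        return true;
    }

    $byTime = $document['createdAt'] <=> $last['createdAt'];

    // Same timestamp: fall back to the unique id as a tie-breaker
    return $byTime > 0 || ($byTime === 0 && $document['id'] > $last['id']);
}

// Hypothetical in-memory stand-in for one search request.
function next_page(array $documents, ?array $last, int $limit): array
{
    $rest = \array_filter($documents, static fn (array $d): bool => is_after($d, $last));
    \usort(
        $rest,
        static fn (array $a, array $b): int => ($a['createdAt'] <=> $b['createdAt']) ?: ($a['id'] <=> $b['id'])
    );

    return \array_slice($rest, 0, $limit);
}
```

With a page size of 2 and four documents sharing the same createdAt, filtering on createdAt alone would drop the documents left over after the first page; the tie-breaker keeps them.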

@norberttech How are you handling KeySet Pagination in Doctrine if the identifiers are not auto-incremented, like UUIDs?

norberttech commented Jun 16, 2026

Member

> @norberttech I can just give a quick response here without having a deeper look at the Pull Request yet.

Thank you!

> Limitations on Elasticsearch can be controlled by Elasticsearch settings,

That's unfortunately not an acceptable solution in the data processing world: it does not scale, it's not hard to hit that limit even when the scale is not massive, and there is a limit to how far you can increase those settings. In fact, 10k rows is a micro dataset, and even increasing it to 100k is nothing.

> @norberttech How are you handling KeySet Pagination in Doctrine if the identifiers are not auto-incremented, like UUIDs?

Yes, KeySet pagination requires something you can sort by: it can be an auto-incremented id, a UUID v7, or an additional timestamp column.
I don't think I ever had to deal with a dataset that wouldn't have at least one of those, and even if not, adding something as small as a createdAt timestamp usually solves the problem.

> Are you in control of the Schema and the Data?

Of course, if you don't have control over it, you can't build a reliable ETL pipeline.

@norberttech

Member

I took a walk and tried to think about it from a different perspective :D

Instead of trying to solve the pagination problem, maybe we should abandon the idea of having a SealExtractor in the first place?

ETL stands for Extract / Transform / Load, but what use case could one have for extracting an entire index?

Search engines are rarely the source where data lives (and if they are, it's a bigger architectural problem), so streaming data from them (which is what Extractors do) rarely makes sense.

So how about we keep SealLoader, which is the only part with a real use case and a strong justification I can think of right now, drop the idea of SealExtractor, and only tackle it once we get a real and solid use case for it?

@stloyd @MrHDOLEK - what do you guys think about it?

@norberttech norberttech modified the milestones: 0.40.0, 0.41.0 Jun 16, 2026

alexander-schranz commented Jun 17, 2026


> Yes, KeySet pagination requires something you can sort by, it can be either auto-incremented id, Uuid V7, additional timestamp column.

Do you configure in Flow Doctrine which column is then used for the KeySet pagination? If you do, you can do the same for the SEAL Extractor; it would work at least for integer and datetime fields. For string fields the support differs: Meilisearch and Algolia have no support for lexicographical string comparison such as > 'ABC', so UUIDs would not work.

So the code for such a case would look like:

$indexName = 'blog'; // From Flow Config
$keySetPaginationField = 'createdAt'; // From Flow Config
$keySetValue = 0; // From Flow Config; ?

do {
    $result = $this->engine->createSearchBuilder($indexName)
        ->addFilter(Condition::greaterThan($keySetPaginationField, $keySetValue))
        ->addSortBy($keySetPaginationField, 'asc')
        ->limit(100) // limit() as used in the PR's extractor
        ->getResult();

    $fetched = 0; // counted by hand, the result is only iterated once

    foreach ($result as $document) {
        // Remember the last seen value so the next page starts after it
        $keySetValue = $document[$keySetPaginationField];
        $fetched++;

        yield $document;
    }
} while ($fetched > 0);

This way you should never hit the limits by the search engines.

@norberttech

Member

> So the code for such a case would look like:
>
> $indexName = 'blog'; // From Flow Config
> $keySetPaginationField = 'createdAt'; // From Flow Config
> $keySetValue = 0; // From Flow Config; ?
>
> do {
>     $result = $this->engine->createSearchBuilder($indexName)
>         ->addFilter(Condition::greaterThan($keySetPaginationField, $keySetValue))
>         ->addSortBy($keySetPaginationField, 'asc')
>         ->limit(100)
>         ->getResult();
>
>     $fetched = 0;
>
>     foreach ($result as $document) {
>         $keySetValue = $document[$keySetPaginationField];
>         $fetched++;
>
>         yield $document;
>     }
> } while ($fetched > 0);
>
> This way you should never hit the limits by the search engines.

Yes, that should work just fine! In postgresql/doctrine we do something like this:

<?php

df()
    ->read(from_pgsql_key_set(
        $client,
        "SELECT id, name, email FROM users",
        pgsql_pagination_key_set(pgsql_pagination_key_asc('id')),
        pageSize: 1000
    ))
    ->write(to_output())
    ->run();

So the extractor expects a pagination keyset, pgsql_pagination_key_set(), which expects a list of columns with their order, e.g. pgsql_pagination_key_asc('id').

But after some internal conversations we decided to put the Extractor aside for now, until someone comes with a legit use case for it. This should also reduce the maintenance and code volume of this adapter.

Thanks for your input @alexander-schranz !

@alexander-schranz


@norberttech while it makes me a little bit sad, I totally understand that decision if there is currently no need for a Search Index based Adapter in Flow.

If you need something, always feel free to ping me. I added the findings of this discussion to the Scroller issue of SEAL: PHP-CMSIG/search#535 (comment) and will surely keep them in mind when implementing such a thing.

If there is any other feedback you or the others have around SEAL it is always welcome.

@norberttech

Member

> @norberttech while it makes me a little bit sad, I totally understand that decision if there is currently no need for a Search Index based Adapter in Flow.

Oh, we are not dropping the adapter: it's extremely valuable and I have wanted to have it since I found your project. We are just reducing the scope of this PR slightly by dropping the Extract part of ETL; Transform and Load stay!

If you have legit use cases for traversing over search engine results in an ETL pipeline, we are more than happy to implement that extractor with keyset pagination embedded, just say the word 😊

Even without the Extractor, this whole adapter will bring massive value. I just feel that at this point the Extractor might increase the complexity of the adapter, which might not be worth the effort without a valid use case first.

SealLoader is unquestionably a valuable addition to the framework 💪 Thank you for building SEAL!

@alexander-schranz


> Oh, we are not dropping the adapter: it's extremely valuable and I have wanted to have it since I found your project. We are just reducing the scope of this PR slightly by dropping the Extract part of ETL; Transform and Load stay!

I see, that makes sense. 😃

> If you have legit use cases for traversing over search engine results in an ETL pipeline, we are more than happy to implement that extractor with keyset pagination embedded, just say the word 😊

I will! If there are any changes in SEAL around pagination, I will keep you up to date when we add KeySet Pagination / Scroller to SEAL Core.

> SealLoader is unquestionably a valuable addition to the framework 💪 Thank you for building SEAL!

Nice to hear, looking forward to it. Let me know when it is released; I'm thinking of adding a section to the SEAL docs mentioning the SEAL Adapter in Flow PHP ETL.

@norberttech norberttech merged commit b59e45c into flow-php:1.x Jun 17, 2026
36 of 39 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Roadmap Jun 17, 2026
@norberttech

Member

@alexander-schranz the PR was merged, the SEAL Adapter is now available as 1.x-dev and will become available in the 0.41.0 release within the next 2 weeks!

Documentation is available here: https://flow-php.com/documentation/components/adapters/seal/
Packagist: https://packagist.org/packages/flow-php/etl-adapter-seal

I'm going to spread the word on my social media in a moment!

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Proposal]: Replace Meilisearch & Elasticsearch with SEAL

4 participants