
[Bug]: Redis AuthenticationError when calling /crawl endpoint #2040

Description

@choozm

crawl4ai version

0.9.0 (Docker server image unclecode/crawl4ai:latest)

Expected Behavior

  • After setting the CRAWL4AI_API_TOKEN and SECRET_KEY environment variables, as 0.9.0 requires, calling the /crawl endpoint with a simple crawler_config should return the website's HTML source.
  • I call the /crawl endpoint with the test Python script included below.

Current Behavior

  • Calling the /crawl endpoint produces this error in the crawl4ai server:

redis.exceptions.AuthenticationError: HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time

  • See the log file: crawl4ai.log

  • Note: if I switch to crawl4ai 0.8.9 by setting the Docker environment variable TAG=0.8.9 and keep everything else the same, the /crawl endpoint works.

Is this reproducible?

Yes

Inputs Causing the Bug

URL: https://www.kernel.org/category/contact-us.html

```python
"crawler_config": {
    "type": "CrawlerRunConfig",
    "params": {
        "scraping_strategy": {"type": "LXMLWebScrapingStrategy", "params": {}},
        "table_extraction": {"type": "DefaultTableExtraction", "params": {}},
        "only_text": True,
        "delay_before_return_html": 1.0,
        "page_timeout": 420000,
        "exclude_social_media_links": True,
        "exclude_social_media_domains": [
            "facebook.com",
            "twitter.com",
            "x.com",
            "linkedin.com",
            "instagram.com",
            "pinterest.com",
            "tiktok.com",
            "snapchat.com",
            "reddit.com",
        ],
        "exclude_external_links": True,
        "exclude_external_images": True,
        "stream": False,
    },
}

"browser_config": {
    "type": "BrowserConfig",
    "text_mode": True,
    "user_agent_mode": "random",
    "extra_args": ["--disable-gpu"],
}
```
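In these inputs, crawler_config nests its options under "params", while browser_config places them directly next to "type". Assuming the server deserializes BrowserConfig with the same {"type", "params"} envelope it uses for CrawlerRunConfig (an assumption, not something this report confirms), a consistent form can be built as sketched below. This is unrelated to the Redis error and is only worth ruling out; `envelope` is a hypothetical helper.

```python
import json


def envelope(type_name: str, params: dict) -> dict:
    """Wrap constructor keyword arguments as {"type": ..., "params": ...}."""
    return {"type": type_name, "params": params}


# Same browser options as above, moved under "params" (assumed server format).
browser_config = envelope(
    "BrowserConfig",
    {
        "text_mode": True,
        "user_agent_mode": "random",
        "extra_args": ["--disable-gpu"],
    },
)

print(json.dumps({"browser_config": browser_config}, indent=4))
```

In the test script, this would replace the value of `payload["browser_config"]`.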

Steps to Reproduce

1. Start the container running crawl4ai 0.9.0 with these four environment variables set:
- CRAWL4AI_API_TOKEN=lorem-ipsum-dolor-sit-amet-consectetuer
- SECRET_KEY=lorem-ipsum-dolor-sit-amet-consectetuer
- REDIS_PASSWORD=lorem-ipsum-dolor-sit-amet-consectetuer
- TAG=latest
2. Run the test Python script to crawl the example website.
3. The Python script reports: `Request failed: 500 Server Error: Internal Server Error for url: http://192.168.0.111:11235/crawl`
4. The crawl4ai log shows:
`redis.exceptions.AuthenticationError: HELLO must be called with the client already authenticated, otherwise the HELLO <proto> AUTH <user> <pass> option can be used to authenticate the client and select the RESP protocol version at the same time`
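Step 1 relies on the variables actually reaching the server process. A minimal sketch that reports which ones are unset or blank is shown below. TAG is left out on the assumption that Docker Compose only uses it to pick the image tag (as the 0.8.9 note above suggests); `missing_or_empty` is a hypothetical helper, and a passing check only shows the variables are present, not that the server's Redis client uses them.

```python
import os
from collections.abc import Mapping

# Names taken from step 1 of this report (TAG excluded, see above).
REQUIRED = ("CRAWL4AI_API_TOKEN", "SECRET_KEY", "REDIS_PASSWORD")


def missing_or_empty(environ: Mapping[str, str], required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the required variable names that are unset or contain only whitespace."""
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_or_empty(os.environ)
    print("missing or empty:", ", ".join(missing) if missing else "none")
```

Saved as `check_env.py`, it could be run inside the running container with `docker exec -i <container> python - < check_env.py`, assuming `python` is on the image's PATH.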

See the 'Error logs & Screenshots' section for:
- Docker compose file
- Test python script to call /crawl endpoint
- Crawl4ai log
- Crawl4ai config.yml

Code snippets

```python
import requests

url = "http://192.168.0.111:11235/crawl"
token = "lorem-ipsum-dolor-sit-amet-consectetuer"  # Replace with your actual token

# 1. Define the Authorization header
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# 2. Define the payload
payload = {
    "urls": ["https://www.kernel.org/category/contact-us.html"],
    "crawler_config": {
        "type": "CrawlerRunConfig",
        "params": {
            "scraping_strategy": {"type": "LXMLWebScrapingStrategy", "params": {}},
            "table_extraction": {"type": "DefaultTableExtraction", "params": {}},
            "only_text": True,
            "delay_before_return_html": 1.0,
            "page_timeout": 420000,
            "exclude_social_media_links": True,
            "exclude_social_media_domains": [
                "facebook.com",
                "twitter.com",
                "x.com",
                "linkedin.com",
                "instagram.com",
                "pinterest.com",
                "tiktok.com",
                "snapchat.com",
                "reddit.com",
            ],
            "exclude_external_links": True,
            "exclude_external_images": True,
            "stream": False,
        },
    },
    "browser_config": {
        "type": "BrowserConfig",
        "text_mode": True,
        "user_agent_mode": "random",
        "extra_args": ["--disable-gpu"],
    },
}

try:
    # 3. Pass both headers and JSON payload to the post method.
    # page_timeout is 420000 ms (420 s), so give the client longer than that.
    response = requests.post(url, json=payload, headers=headers, timeout=600)
    response.raise_for_status()

    print(f"Status Code: {response.status_code}")
    print(response.json())

except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
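Because the script calls `raise_for_status()`, the printed exception carries only the status line; the body the server sends with the 500, which may contain more error detail, is never shown. A standard-library sketch that returns that body instead of raising is below. The function names are hypothetical, and the 600 s default is an assumed value chosen to exceed the 420000 ms page_timeout.

```python
import json
import urllib.error
import urllib.request


def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build the same authenticated JSON POST that the script sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def post_crawl(url: str, token: str, payload: dict, timeout: float = 600.0) -> tuple[int, str]:
    """POST to /crawl; on an HTTP error, return the server's body instead of raising."""
    try:
        with urllib.request.urlopen(build_request(url, token, payload), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is file-like: read() returns the error response body.
        return err.code, err.read().decode("utf-8", errors="replace")
```

With `url`, `token`, and `payload` from the script above: `status, body = post_crawl(url, token, payload)` followed by `print(status, body[:2000])`.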

OS

Linux

Python version

3.12.3

Browser

Chrome

Browser version

149.0.7827.200 (Official Build) (64-bit)

Error logs & Screenshots (if applicable)

Metadata


Assignees

No one assigned

Labels

🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
