Identify ourselves when fetching files from an outside site #414

@duckduckgrayduck

Description

A growing number of websites block generic python-requests user agents but do not block well-meaning, uniquely identified user agents. Just as we identify ourselves in cross-client calls, we should identify ourselves when a user provides a URL to fetch and import into DocumentCloud. We should also add a logging mechanism so we can audit these fetches and detect when sites block users from importing their documents into DocumentCloud.

This serves a dual purpose: it makes imports more reliable, and it could give us grounds to file a public records request asking why we are blocked, given that we host primary source materials for free to the public.
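
As a rough illustration of the idea, here is a minimal sketch that sends an identifying User-Agent and logs likely blocks per host. It uses only the standard library (`urllib`) so it runs without dependencies; with python-requests the same header would be passed through the `headers=` argument. The names `USER_AGENT`, `BLOCK_STATUSES`, `build_request`, and `fetch_user_url`, the logger name, the user agent string itself, and the choice of status codes treated as a block are all assumptions for illustration, not existing DocumentCloud code.

```python
import logging
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Hypothetical logger name; a real deployment would pick its own.
logger = logging.getLogger("url_import")

# Hypothetical identifying string (product token plus a contact URL).
USER_AGENT = "DocumentCloud/1.0 (+https://www.documentcloud.org)"

# Assumption: these statuses usually mean the site refused us, not that
# the document is missing.
BLOCK_STATUSES = frozenset({401, 403, 429})


def build_request(url):
    """Build a request that identifies us instead of using the default agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_user_url(url, opener=urllib.request.urlopen, timeout=30):
    """Fetch a user-supplied URL and log the outcome so blocks can be audited.

    `opener` is injectable so the function can be exercised without network
    access.
    """
    host = urlparse(url).netloc
    request = build_request(url)
    try:
        with opener(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as exc:
        if exc.code in BLOCK_STATUSES:
            # One greppable line per suspected block, keyed by host.
            logger.warning("possible block host=%s status=%s", host, exc.code)
        else:
            logger.info("fetch failed host=%s status=%s", host, exc.code)
        raise
    logger.info("fetched host=%s bytes=%d", host, len(body))
    return body
```

Logging the host and status on every fetch, rather than only on errors, would let blocked hosts be compared against hosts that succeed, which is the kind of audit trail described above.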

Metadata

Assignees

No one assigned

    Labels

    planning: Review at project planning meeting
