
Add the crawler-preference spec. #560

Open
wants to merge 1 commit into
base: master
@SadieCat SadieCat commented Nov 10, 2024

Rendered link.


This is a skeleton for an idea I've had recently. I fully expect it to need revisions and expansion before it's production-ready, so please feel free to propose changes.

An alternative solution I was considering was advertising a plain CRAWLER token. Bots could detect that token, execute a CRAWLER <name> command, and get back a response saying whether that specific crawler is allowed on the network. I'm not sure if that's overengineering things, though.
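
To make the detection step of that alternative concrete, here is a rough bot-side sketch. It assumes the CRAWLER token would be advertised in RPL_ISUPPORT (005), which this description doesn't actually pin down. The helper names `parse_isupport` and `network_advertises_crawler` are made up for illustration, and the reply to CRAWLER <name> is left out because its format isn't defined yet.

```python
def parse_isupport(line):
    """Parse the tokens of one RPL_ISUPPORT (005) line into a dict.

    Tokens without a value (like a plain CRAWLER token) map to None.
    """
    # Drop the trailing human-readable text (":are supported by this server").
    head = line.split(" :", 1)[0]
    parts = head.split(" ")
    # Skip the optional ":server" prefix.
    if parts and parts[0].startswith(":"):
        parts = parts[1:]
    # parts[0] is the "005" numeric and parts[1] is our nick; the rest are tokens.
    tokens = {}
    for tok in parts[2:]:
        if not tok or tok.startswith("-"):
            continue  # "-TOKEN" negates an earlier token; ignored in this sketch
        key, sep, value = tok.partition("=")
        tokens[key] = value if sep else None
    return tokens


def network_advertises_crawler(line):
    # Assumption: the token is literally named CRAWLER and carries no value.
    return "CRAWLER" in parse_isupport(line)


example = ":irc.example.net 005 bot CHANTYPES=# CRAWLER NETWORK=Example :are supported by this server"
print(network_advertises_crawler(example))
```

A bot that sees the token would then send CRAWLER <name> and act on whatever response the final spec defines. A bot that doesn't see it would disconnect without listing channels.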

Problem

It's very hard to find IRC channels because there's no useful, comprehensive database of channels. A few exist (e.g. netsplit), but they rely on admins manually adding them, which isn't great.

It's possible to crawl the entire address space for networks to collect data (IRCStats currently does this), but many IRC admins have historically resisted making that information public for privacy reasons.

Solution

This specification adds a way for networks to declare that they are okay with bots crawling them. It also lets them specify how often they'd like to be crawled. Networks with privacy concerns can opt out of scanning.

I've put a WIP module with support for this on the InspIRCd Testnet (testnet.inspircd.org).
