Filter config
NOTE: rss-funnel is under development. This document describes the filters as implemented on the latest master branch, so the features described here may not fully match the latest released version. If a filter documented here is missing from the version you use, try the nightly image.
The full_text filter fetches the full HTML of each article from its original source and fills it into the body of the article. You will probably want to set the simplify property or add a simplify_html filter after this one.
Configuration type: Object
Properties:
- parallelism (optional, number): The number of parallel requests to make. Defaults to 20.
- simplify (optional, boolean): Whether to simplify the HTML using readability. Defaults to false.
- append_mode (optional, boolean): Whether to append the full text to the existing content. If false, the existing content is replaced with the full text. Defaults to false.
- keep_element (optional, string): A CSS selector; only the matching element of the HTML is kept.
- client (optional, Client config): Special HTTP settings for the request, such as cookies or the User-Agent.
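As a rough illustration of how these properties combine, the sketch below configures the filter inside an endpoint definition. The path, the source URL, and the `article` selector are placeholders invented for this example; only the property names come from the list above.

```yaml
# Hypothetical endpoint: path, source, and selector are placeholders.
- path: /full-text.xml
  source: https://example.com/feed.xml
  filters:
    - full_text:
        parallelism: 5         # fewer concurrent requests than the default of 20
        simplify: true         # run readability on the fetched HTML
        append_mode: false     # replace the existing content (the default)
        keep_element: article  # keep only the element matching this selector
```

Lowering parallelism is one way to be gentler on a small site; whether that is needed depends on the source.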
This filter simplifies the HTML content of an article using readability and replaces the content of the article with the simplified version.
Configuration type: Object. Only an empty object with no properties is accepted.
This filter is typically used after the full_text filter, which fetches the full HTML of the article from its link. Alternatively, you can set the simplify option of the full_text filter to achieve the same effect.
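The two approaches described above might be written as follows. This is a sketch only: the paths and source URL are placeholders, and passing an empty object to full_text to select its defaults is an assumption based on all of its properties being optional.

```yaml
# Option 1: fetch the full text, then simplify it in a separate step.
- path: /simplified-a.xml              # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    - full_text: {}      # assumed: empty object means default settings
    - simplify_html: {}  # only an empty object is accepted

# Option 2: let full_text simplify the HTML itself.
- path: /simplified-b.xml              # placeholder path
  source: https://example.com/feed.xml
  filters:
    - full_text:
        simplify: true
```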
This filter removes HTML elements matching the CSS selectors.
Configuration type: Array of strings. Each string is a CSS selector.
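A minimal sketch of this filter, using the remove_element key that appears in the merge example later in this document. The path, source URL, and selectors are hypothetical.

```yaml
- path: /no-ads.xml                    # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    - remove_element:
        - .ads                     # every element with class "ads"
        - "div.newsletter-signup"  # hypothetical selector
        - "img[width='1']"         # hypothetical selector for tracking pixels
```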
This filter keeps HTML elements matching the given CSS selector.
Configuration type: string. The string is a CSS selector.
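A sketch of the string form described above. The filter key keep_element is an assumption here (it mirrors the full_text property of the same name; check it against the version you run), and the path, source URL, and selector are placeholders.

```yaml
- path: /article-body-only.xml         # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    # Assumed filter key; the value is a single CSS selector string.
    - keep_element: "div.article-body"
```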
This filter splits one article into multiple ones. It is useful for splitting aggregated RSS feeds into individual articles (like Hacker News Daily) and generating feed from an HTML source (in which case the HTML page is parsed as the singleton article of a feed).
Each article is split by the given CSS selectors. You specify the CSS selectors for the various fields, including title, link, content, and author; only the title selector is required.
Configuration type: Object
Properties:
- title_selector (required, string): The CSS selector for the title element. The textContent of each selected element is taken as the title.
- link_selector (optional, string): The CSS selector for the link to the article. The URL is extracted from the href attribute of the selected elements. If the link selector is not specified, it takes on the value of title_selector.
- description_selector (optional, string): The CSS selector for the description. The innerHTML of each selected element is taken as the description.
- author_selector (optional, string): The CSS selector for the author.
- date_selector (optional, string): The CSS selector for the publication date. The date is parsed from the textContent of the element or any of its attributes. Valid date formats are RFC3339 (e.g. 1996-12-19T16:39:57-08:00) and RFC2822 (e.g. Thu, 19 Dec 1996 16:39:57 -0800).
The selectors are evaluated against the article's description (or content) parsed as HTML.
You must ensure that all selectors match the same number of elements; otherwise rss-funnel has no way to pair up the matched elements.
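The sketch below shows how the selectors might be combined for an aggregated feed whose body is a list of links. The filter key split is an assumption, and the path, source URL, and selectors are invented for illustration; only the property names come from the list above.

```yaml
- path: /digest-split.xml                    # placeholder path
  source: https://example.com/daily-digest.xml # placeholder source
  filters:
    - split:                               # assumed filter key
        title_selector: "li > a"           # textContent becomes the title
        link_selector: "li > a"            # href becomes the link
        description_selector: "li > p"     # innerHTML becomes the description
```

In this hypothetical markup every `li` holds exactly one `a` and one `p`, so all three selectors match the same number of elements.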
This filter allows you to redact or replace text in the content of the articles. The operations are executed in the order specified.
Configuration type: Array of "operations".
Operations:
- remove (string): Remove the matched text.
- remove_regex (string): Remove the text matching the given regular expression.
- replace (object): Replace the matched text with the given string. Keys:
  - from (string): The text to replace.
  - to (string): The replacement.
  - case_sensitive (optional, boolean): Specifies whether the matching should be case-sensitive or not (default: false).
- replace_regex (object): Replace the text matching the given regular expression with the given string. Keys:
  - from (string): The regular expression to match. Use (?<name>...) for named capture groups.
  - to (string): The replacement. Use $name to refer to named capture groups, or $1, $2, etc. to refer to groups by index.
  - case_sensitive (optional, boolean): Specifies whether the matching should be case-sensitive or not (default: false).
Note that due to syntax limitations, there is no way to specify remove or remove_regex with case sensitivity. If you need to do so, you can use replace or replace_regex with an empty to field.
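A sketch of an operation list, run top to bottom. The filter key sanitize and the one-key-per-item layout of the operations are assumptions, and all of the strings and patterns are made up; the operation names and their keys are taken from the list above.

```yaml
filters:
  - sanitize:                      # assumed filter key
      - remove: "Sponsored content"
      - remove_regex: 'Read more at \S+'
      - replace:
          from: "colour"
          to: "color"
          case_sensitive: true
      - replace_regex:
          # Named groups are referenced as $name in the replacement.
          from: '(?<year>\d{4})-(?<month>\d{2})'
          to: "$month/$year"
      - replace:                   # case-sensitive removal via an empty "to"
          from: "ADVERT"
          to: ""
          case_sensitive: true
```

Single-quoted YAML strings keep backslashes literal, which avoids double-escaping the regular expressions.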
The keep_only/discard filter enables users to selectively retain or discard posts based on specified keywords or patterns.
Configuration Type: String, Array of Strings, or Object
- field (optional, enum): One of title, description, or any (default: any). If any is chosen, the filter applies to both the title and description of a post.
- matches (string, or a list of strings): Regular expressions to match in the selected field of a post.
- contains (string, or a list of strings): Plain strings to look for in the selected field of a post.
- case_sensitive (optional, boolean): Specifies whether the matching should be case-sensitive or not (default: false).
For simple matching of non-regex keywords across any fields, users can directly specify the string or a list of strings as a short-hand. Example:
- path: /show-or-ask-hn.xml
  source: https://news.ycombinator.com/rss
  filters:
    - discard:
        - crypto
        - blockchain
    - discard: openai
    - keep_only:
        field: title
        matches:
          - '^Show HN:'
          - '^Ask HN:'
        case_sensitive: true
This example discards posts containing the keywords "crypto", "blockchain", or "openai", and then keeps only those whose titles start with "Show HN:" or "Ask HN:".
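The object form can also use contains for plain-string matching on a single field. The following is a sketch with a placeholder path, source URL, and keywords; it assumes that a single string is accepted for matches, as the property list states.

```yaml
- path: /rust-posts.xml                # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    - keep_only:
        field: description
        contains:            # plain strings, not regular expressions
          - Rust
          - Cargo
        case_sensitive: true
    - discard:
        field: title
        matches: '^\[Sponsored\]'  # a single regular expression
```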
Limit the number of posts. Configuration type: an integer or a duration string.
This filter can operate in two modes:
- count mode (integer): only the first n posts are kept
- duration mode (duration string): only posts published within the given duration are kept
Duration strings follow the same format as the duration strings used elsewhere. Find more information about the format at duration_str.
Examples:
- path: /hackernews-fresh.xml
  source: https://news.ycombinator.com/rss
  filters:
    - limit: 8h
- path: /hackernews-first-10.xml
  source: https://news.ycombinator.com/rss
  filters:
    - limit: 10
Highlight matching keywords or any regular expression patterns in the posts' description.
Configuration Type: Object
- keywords (optional, list of strings): A list of literal strings to match on. Either keywords or patterns must be specified.
- patterns (optional, list of strings): A list of regular expressions. Either keywords or patterns must be specified.
- bg_color (optional, string): The background color of the highlighted text (default: "#ffff00").
- case_sensitive (optional, boolean): Specifies whether the matching should be case-sensitive or not (default: false).
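A sketch of a keyword-based configuration. The filter key highlight is an assumption, and the path, source URL, keywords, and color are placeholders; the property names come from the list above.

```yaml
- path: /highlighted.xml               # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    - highlight:             # assumed filter key
        keywords:            # literal strings; use "patterns" for regexes
          - rust
          - webassembly
        bg_color: "#ffcc00"  # overrides the default "#ffff00"
        case_sensitive: false
```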
Merge articles from other feeds into the current feed. This is useful for merging multiple feeds into one.
Configuration type: Object, or a single source string, or an array of sources
Properties:
- source (required, string or array of strings/objects): The URL or source of the feed(s) to merge. See the source syntax documentation for more information.
- parallelism (optional, number): The number of concurrent requests to make for fetching multiple sources (default: 20).
- client (optional, Client config): You can specify special HTTP settings for the request, like setting Cookies or User-Agent.
- filters (optional, list of filters): The filters to apply to the merged feed. The filters are applied in the order specified.
Example of merging multiple feeds:
- path: /merge.xml
  source: https://example.com/feed1.xml
  filters:
    - merge:
        source:
          - https://example.com/feed2.xml
          - https://example.com/feed3.xml
          - https://example.com/feed4.xml
        client:
          user_agent: My Custom User-Agent
        parallelism: 10
        filters:
          - remove_element:
              - .ads
# or, if you don't need extra configuration:
- path: /merge.xml
  source: https://example.com/feed1.xml
  filters:
    - merge:
        - https://example.com/feed2.xml
        - https://example.com/feed3.xml
        - https://example.com/feed4.xml
In this example, the feeds from https://example.com/feed2.xml, https://example.com/feed3.xml, and https://example.com/feed4.xml are merged into the current feed (https://example.com/feed1.xml). The merged feed then has the .ads elements removed using the remove_element filter. The parallelism option is set to 10, which means up to 10 feeds will be fetched concurrently.
Example of merging a feed created "from scratch":
- path: /from-scratch.xml
  source:
    format: rss
    title: My Custom Feed
    link: https://example.com
    description: This is a custom feed created from scratch
  filters:
    - merge:
        source:
          - https://example.com/feed1.xml
          - https://example.com/feed2.xml
In this example, a new feed is created "from scratch" with a custom title, link, and description. The articles from https://example.com/feed1.xml and https://example.com/feed2.xml are then merged into this custom feed.
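The configuration type also allows a single source string. A sketch of that shortest form, with a placeholder path and URLs, might look like this:

```yaml
- path: /merge-one.xml                  # placeholder path
  source: https://example.com/feed1.xml # placeholder source
  filters:
    # A single source string: merge one extra feed with default settings.
    - merge: https://example.com/feed2.xml
```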
The modify_post filter allows you to modify individual posts in the feed using JavaScript code.
Configuration Type: string
The string should be JavaScript code that modifies the post variable in place. You can also read from the feed variable in this filter. If you want to remove the article, set post = null or return null.
Example:
- path: /modify-title.xml
  source: https://example.com/feed.xml
  filters:
    - modify_post: post.title = `${post.title} (modified)`
You can use the console.log(string) function to print debugging information to stdout.
You can also return early from the filter by using an if statement and returning. Only the modifications made before the early return are applied.
- path: /early-return.xml
  source: https://example.com/feed.xml
  filters:
    - modify_post: |
        if (post.title.includes("skip")) {
          return;
        }
        post.title = `${post.title} (modified)`
The actual fields of post can be found at:
You can use the "Json" mode on the inspector UI to view the JSON representation of the posts you're manipulating.
You can also use await inside the code to perform asynchronous operations. See the JavaScript API documentation for more details.
For an example of using await with fetch, check out the DeArrow YouTube feed in the Cookbook.
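Removing a post, as described earlier (set post = null or return null), might look like the sketch below. The path, source URL, and the "Sponsored" keyword are placeholders; the use of post.title as a string follows the examples above.

```yaml
- path: /without-sponsored.xml         # placeholder path
  source: https://example.com/feed.xml # placeholder source
  filters:
    - modify_post: |
        // Returning null removes this post from the feed.
        if (post.title.includes("Sponsored")) {
          return null;
        }
```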
The modify_feed filter allows you to modify the entire feed using JavaScript code.
Configuration Type: string
The string should be JavaScript code that modifies the feed variable in place.
Example:
- path: /set-title.xml
  source: https://tokio.rs/_next/static/feed.xml
  filters:
    - modify_feed: feed.title.value = "My Modified Tokio Blog Feed"
You can use the console.log(string) function to print debugging information to stdout.
You can also return early from the filter by using an if statement and returning. Only the modifications made before the early return are applied.
The actual fields of feed can be found at:
You can use the "Json" mode on the inspector UI to view the JSON representation of the feed you're manipulating.
You can also use await inside the code to perform asynchronous operations. See the JavaScript API documentation for more details.
Note: This filter is deprecated. It is recommended to use the modify_post or modify_feed filters instead, as they provide a more streamlined interface for modifying posts and feeds, respectively.
Configuration type: string. The string is the JavaScript code to run.
You must define either or both of the two global functions: modify_feed and modify_post.
The convert_to filter allows you to convert the format of the feed from RSS to Atom, or vice versa. This can be helpful when you want to use the modify_post or modify_feed filters on feeds of different formats, as it allows you to write your JavaScript code in a uniform way, targeting a specific feed format.
Configuration type: string
The string should be either rss or atom, specifying the format you want to convert the feed to.
Example:
- path: /rss-feed.xml
  source: https://example.com/atom-feed.xml
  filters:
    - convert_to: rss
    - modify_post: |
        // JavaScript code targeting RSS format
        post.title = `${post.title} (modified)`;
In this example, the original Atom feed from https://example.com/atom-feed.xml is first converted to the RSS format using the convert_to: rss filter. The modify_post filter then modifies the post titles, and the JavaScript code is written for the RSS format.
Note: The conversion between feed formats is a best-effort process and may not be perfect, as there are many misaligned fields between the two formats. Some information or metadata may be lost or transformed during the conversion process.
It's generally recommended to use the convert_to filter before using the modify_post or modify_feed filters, as it allows you to write your JavaScript code in a consistent manner, targeting a specific feed format. This can make your code more readable and maintainable, especially when working with feeds from various sources and formats.
Rewrite image URLs to use a proxy, helping bypass image loading restrictions set by some websites.
Configuration type: Object, or an empty object ({}) for default settings
Properties:
- domains (optional, list of strings): Domains to apply the proxy to. Supports globbing.
- selector (optional, string): CSS selector for image tags to rewrite (default: "img").
- proxy (optional, string): Proxy to use for fetching images (e.g., "socks5://localhost:9150").
- referer (optional, string): Referer header for image requests. Options: "none", "image_url", "image_url_domain", or a custom string.
- user_agent (optional, string): User-Agent header for image requests. Options: "none", "transparent", or a custom string.
- external (optional, object): Use an external proxy service instead of the built-in one.
  - base (required, string): Base URL of the external proxy service.
  - urlencode (optional, boolean): Whether to URL-encode the image URLs.
For more detailed configuration and usage information, refer to the full Image proxy documentation.
Examples:
# Use default settings
- image_proxy: {}
# Internal proxy with custom settings
- image_proxy:
    domains:
      - "*.example.com"
    selector: "img.proxy-me"
    referer: image_url_domain
    user_agent: "Custom User Agent String"
# External proxy
- image_proxy:
    external:
      base: "https://external-proxy.example.com/proxy?url="
      urlencode: true
Find magnet links in the body of entries and save them in the enclosure (RSS) or link (Atom). The resulting feed can be used in a torrent client.
Configuration type: Object, or an empty object ({}) for default settings
Properties:
- info_hash (optional, boolean): Match any [a-fA-F0-9]{40} or [a-fA-F0-9]{68} as the info hash and construct a magnet link (default: false).
- override_existing (optional, boolean): Whether to override existing magnet links in the enclosure/link (default: false).
Example:
# Use default settings
- magnet: {}
# Custom configuration
- magnet:
    info_hash: true
    override_existing: true
The note filter is a special filter that has no effect on the feed or its articles. It serves only documentation purposes, allowing you to add notes or comments to your filter configuration.
Configuration type: string
The string should be the note or comment you want to add.
Example:
- path: /feed.xml
  source: https://example.com/feed.xml
  filters:
    - note: This feed is for demonstration purposes only
    - remove_element:
        - .ads