kepano
Defuddle is an open-source JS library I built to parse and extract the main content and metadata from web pages. It can also return the content as Markdown.
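To make the idea concrete, here is a toy sketch of the three steps described above: reading metadata, picking the main content, and converting it to Markdown. It is not Defuddle's implementation or API. The function names (extractMetadata, pickMainContent, toMarkdown) are hypothetical, and the regex-based approach is an assumption chosen only to keep the example dependency-free. A real extractor works on a parsed DOM and handles far more cases.

```javascript
// Toy illustration only: NOT Defuddle's implementation or API.
// All function names below are hypothetical.

// Pull the <title> text and the meta description out of raw HTML.
function extractMetadata(html) {
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const description =
    /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i.exec(html);
  return {
    title: title ? title[1].trim() : "",
    description: description ? description[1] : "",
  };
}

// Prefer <article>, then <main>, then <body>; fall back to the whole input.
function pickMainContent(html) {
  for (const tag of ["article", "main", "body"]) {
    const match = new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "i").exec(html);
    if (match) return match[1];
  }
  return html;
}

// Convert a very small subset of HTML (h1-h6 and p) to Markdown,
// then drop any remaining tags.
function toMarkdown(fragment) {
  return fragment
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_, level, text) => "#".repeat(Number(level)) + " " + text.trim() + "\n\n")
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, (_, text) => text.trim() + "\n\n")
    .replace(/<[^>]+>/g, "")
    .trim();
}

// Usage on a small made-up page.
const sampleHtml =
  '<html><head><title> Hello </title>' +
  '<meta name="description" content="A demo page"></head>' +
  '<body><nav><a href="/">Home</a></nav>' +
  '<article><h1>Hello</h1><p>First paragraph.</p></article>' +
  '<footer>Site footer</footer></body></html>';

const metadata = extractMetadata(sampleHtml);
const markdown = toMarkdown(pickMainContent(sampleHtml));
console.log(metadata); // { title: 'Hello', description: 'A demo page' }
console.log(markdown); // "# Hello", a blank line, then "First paragraph."
```

The navigation and footer are dropped because only the contents of the article element are kept. That is the basic behavior a read-it-later app wants from an extractor.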
I built Defuddle while working on Obsidian Web Clipper[1] (also MIT-licensed) because Mozilla's Readability[2] appears to be mostly abandoned and didn't work well for many sites.
It's still very much a work in progress, but I thought I'd share it today, in light of the announcement that Mozilla is shutting down Pocket. This library could be helpful to anyone building a read-it-later app.
Defuddle is also available as a CLI:
GitHub - kepano/defuddle-cli: Command line utility to extract clean html, markdown and metadata from web pages.
[1] GitHub - obsidianmd/obsidian-clipper: Highlight and capture the web in your favorite browser. The official Web Clipper extension for Obsidian.
[2] GitHub - mozilla/readability: A standalone version of the readability lib
Comments URL: Show HN: Defuddle, an HTML-to-Markdown alternative to Readability | Hacker News
Points: 158
# Comments: 37