Show HN: Defuddle, an HTML-to-Markdown alternative to Readability

kepano

Defuddle is an open-source JS library I built to parse and extract the main content and metadata from web pages. It can also return the content as Markdown.
I built Defuddle while working on Obsidian Web Clipper[1] (also MIT-licensed) because Mozilla's Readability[2] appears to be mostly abandoned and doesn't work well on many sites.
It's still very much a work in progress, but I thought I'd share it today, in light of the announcement that Mozilla is shutting down Pocket. This library could be helpful to anyone building a read-it-later app.
Defuddle is also available as a CLI:
GitHub - kepano/defuddle-cli: Command line utility to extract clean html, markdown and metadata from web pages.
[1] GitHub - obsidianmd/obsidian-clipper: Highlight and capture the web in your favorite browser. The official Web Clipper extension for Obsidian.
[2] GitHub - mozilla/readability: A standalone version of the readability lib



Comments URL: Show HN: Defuddle, an HTML-to-Markdown alternative to Readability | Hacker News

Points: 158

Comments: 37
