Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG


bhavnicksm

I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.
Core features:
  • 21MB default install vs 80-171MB alternatives
  • 33x faster token chunking than popular alternatives
  • Supports multiple chunking strategies: token, word, sentence, and semantic (usage sketch after this list)
  • Works with all major tokenizers (transformers, tokenizers, tiktoken)
  • Zero external dependencies for basic functionality
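To make the install-and-chunk flow concrete, here is a minimal usage sketch. The class and parameter names (TokenChunker, chunk_size, chunk_overlap, .chunk) are assumptions based on the description above, not a confirmed API; the repo's README is the source of truth.

```python
# Hypothetical usage sketch -- the names below are assumptions, not confirmed API.
from chonkie import TokenChunker  # token strategy; word, sentence, and semantic also exist

# Fixed-size token windows with a small overlap between neighbouring chunks.
chunker = TokenChunker(chunk_size=512, chunk_overlap=64)

chunks = chunker.chunk("A long document headed for a RAG pipeline ...")

for chunk in chunks:
    # Each chunk is assumed to carry its text plus token-count metadata.
    print(chunk.token_count, chunk.text[:60])
```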
Technical optimizations:
  • Uses tiktoken with multi-threading for faster tokenization (batch example after this list)
  • Implements aggressive caching and precomputation
  • Running mean pooling for efficient semantic chunking (sketched after this list)
  • Modular dependency system (install only what you need)
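The multi-threaded tokenization point maps onto tiktoken's batch API. A minimal sketch of that idea, illustrative only rather than Chonkie's internal code; the texts are placeholders:

```python
# Batch tokenization across worker threads with tiktoken.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
texts = ["first document ...", "second document ...", "third document ..."]

# Encodes the whole batch in parallel worker threads instead of a Python loop.
token_lists = enc.encode_ordinary_batch(texts, num_threads=8)
print([len(tokens) for tokens in token_lists])
```

Running mean pooling is what keeps the semantic strategy cheap: rather than re-pooling every embedding in a growing chunk, the chunk's mean vector is updated incrementally as sentences are appended. The sketch below illustrates that idea and is not Chonkie's actual implementation; embed_fn stands in for whatever sentence encoder you plug in, and the 0.7 threshold is an arbitrary example value.

```python
# Illustrative semantic chunking via a running (incremental) mean embedding.
from typing import Callable
import numpy as np

def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def semantic_chunks(
    sentences: list[str],
    embed_fn: Callable[[str], np.ndarray],  # any sentence encoder you supply
    threshold: float = 0.7,                 # example similarity cut-off
) -> list[list[str]]:
    """Group consecutive sentences while they stay close to the chunk's running mean embedding."""
    if not sentences:
        return []
    chunks: list[list[str]] = []
    current = [sentences[0]]
    mean = embed_fn(sentences[0]).astype(np.float64)  # running mean of the current chunk
    n = 1
    for sent in sentences[1:]:
        vec = embed_fn(sent).astype(np.float64)
        if _cosine(vec, mean) >= threshold:
            # Incremental update, O(d) per sentence -- no re-pooling of the
            # whole chunk: mean_new = mean + (vec - mean) / (n + 1)
            current.append(sent)
            n += 1
            mean = mean + (vec - mean) / n
        else:
            # Similarity dropped below the threshold: close this chunk, start a new one.
            chunks.append(current)
            current, mean, n = [sent], vec, 1
    chunks.append(current)
    return chunks
```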
Benchmarks and code: https://github.com/bhavnicksm/chonkie (🦛 CHONK your texts with Chonkie ✨ - the no-nonsense RAG chunking library)
Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?



Comments: Hacker News discussion thread

Points: 60

# Comments: 19
