bhavnicksm
I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.
CHONK your texts with Chonkie - the no-nonsense RAG chunking library
Core features:
- 21MB default install vs 80-171MB alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality
- Uses tiktoken with multi-threading for faster tokenization
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking
- Modular dependency system (install only what you need)
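To make the token-chunking idea above concrete, here is a minimal sketch of fixed-size token chunking with overlap. This is an illustration, not Chonkie's actual API; the toy whitespace tokenizer below stands in for a real encoder (with tiktoken you would pass `enc.encode` and `enc.decode` instead).

```python
from typing import Callable, List

def chunk_by_tokens(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    chunk_size: int = 512,
    overlap: int = 0,
) -> List[str]:
    """Split text into chunks of at most `chunk_size` tokens,
    with `overlap` tokens shared between consecutive chunks."""
    tokens = encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Toy whitespace "tokenizer" so the sketch runs without any
# dependencies; swap in a real tokenizer for actual use.
_vocab: dict = {}
_words: list = []

def encode(text: str) -> List[int]:
    ids = []
    for w in text.split():
        if w not in _vocab:
            _vocab[w] = len(_words)
            _words.append(w)
        ids.append(_vocab[w])
    return ids

def decode(ids: List[int]) -> str:
    return " ".join(_words[i] for i in ids)

text = " ".join(f"tok{i}" for i in range(10))
chunks = chunk_by_tokens(text, encode, decode, chunk_size=4, overlap=1)
```

Because chunk boundaries are computed once over the full token list, the decode step is the only per-chunk cost, which is what makes pure token chunking so much faster than strategies that re-tokenize per chunk.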


Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
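As one illustration of the semantic strategy, the running-mean-pooling idea can be sketched as follows. This is a simplified model, not Chonkie's implementation: the `semantic_chunks` helper and the toy 2-D embeddings are hypothetical, and in practice the embeddings would come from a sentence-embedding model.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Group consecutive sentences by similarity to a running mean
    of the current group's embeddings. The incremental mean update
    keeps each step O(dim) instead of re-pooling the whole group."""
    chunks: List[List[str]] = [[sentences[0]]]
    mean = list(embeddings[0])
    n = 1
    for sent, emb in zip(sentences[1:], embeddings[1:]):
        if cosine(mean, emb) >= threshold:
            chunks[-1].append(sent)
            n += 1
            # running mean update: mean += (emb - mean) / n
            mean = [m + (e - m) / n for m, e in zip(mean, emb)]
        else:
            chunks.append([sent])
            mean, n = list(emb), 1
    return chunks

# Toy embeddings: two topics on nearly orthogonal axes.
sents = ["Cats purr.", "Cats nap.", "GPUs are fast.", "GPUs run CUDA."]
embs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
groups = semantic_chunks(sents, embs, threshold=0.5)
```

The design choice worth noting is the incremental mean: comparing each new sentence to a running centroid avoids re-embedding or re-pooling the growing chunk, which is where the efficiency claim comes from.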
Comments URL: Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG | Hacker News
Points: 60
# Comments: 19