
Show HN: I made a website to semantically search ArXiv papers

  • Thread starter: Quizzical4230

As a grad student (and an ADHDer), I had trouble doing literature reviews systematically. To combat this, I made a website that finds similar papers based on the meaning of what I am looking for.
I used MixedBread's [1] embedding model to generate vectors from the abstracts. I store and search the vectors using Milvus [2], and use Gradio [3] to serve the frontend. I update the vector database weekly by pulling the metadata dataset from Kaggle [4].
To speed up search on my free Oracle instance, I binarise the embeddings and use Hamming distance as the metric.
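For readers curious what the binarise-and-Hamming step can look like, here is a minimal NumPy sketch. It is not the site's actual code: the sign threshold at zero, the 1024-dimensional vectors, the random stand-in data, and the function names `binarise` and `hamming_search` are all assumptions made for illustration, and in the real setup Milvus performs the search.

```python
import numpy as np

def binarise(embeddings: np.ndarray) -> np.ndarray:
    """Turn float vectors into packed bits: 1 where a component is > 0, else 0.

    Thresholding at zero is an assumption; other thresholds are possible.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)  # 8 dimensions per byte

def hamming_search(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k corpus rows with the fewest differing bits."""
    xor = np.bitwise_xor(corpus, query)               # set bits mark disagreements
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)   # Hamming distance per row
    return np.argsort(dist, kind="stable")[:k]

# Random stand-in data (hypothetical): 1000 "abstracts", 1024 dimensions each.
rng = np.random.default_rng(0)
corpus_f = rng.standard_normal((1000, 1024)).astype(np.float32)
# A query that is a slightly noisy copy of row 42.
query_f = corpus_f[42] + 0.1 * rng.standard_normal(1024).astype(np.float32)

corpus_b = binarise(corpus_f)   # shape (1000, 128), dtype uint8
query_b = binarise(query_f)     # shape (128,)
top = hamming_search(query_b, corpus_b, k=3)
print(top)
```

Packing the bits shrinks each vector from 4096 bytes of float32 to 128 bytes, and the distance reduces to XOR plus a bit count, which is why this kind of search is cheap on a small instance. The trade-off is that binarising discards magnitude information, so rankings can differ from those of the full-precision vectors.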
I would love your feedback on the site :) Happy Holidays!
[1]: https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-...
[2]: The High-Performance Vector Database Built for Scale | Milvus
[3]: Gradio
[4]: arXiv Dataset




Points: 203

Comments: 53
