gregpr07
Hey HN,
I made Browser-Use, an open-source tool that lets any LLM supported by LangChain execute tasks directly in the browser using only function calling.
It lets you build agents that interact with web elements through natural-language prompts. We built a layer that simplifies website interaction for LLMs by extracting XPaths and interactive elements such as buttons and input fields (and other fancy things). This lets you design custom web automation and scraping functions without manually inspecting pages in DevTools.
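To make the extraction step concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It is not Browser-Use's actual implementation. It walks the HTML, records each interactive element (the tag set in `INTERACTIVE_TAGS` is an assumption), computes an indexed XPath for it, and renders a numbered list an LLM could refer to by index. The names `InteractiveElementExtractor` and `to_llm_prompt` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical sketch, not Browser-Use's implementation.
# The set of tags treated as interactive is an assumption for illustration.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Element:
    tag: str
    xpath: str
    attrs: dict
    text: str = ""


@dataclass
class _Frame:
    tag: str
    position: int  # 1-based index among same-tag siblings, as in XPath
    child_counts: dict = field(default_factory=dict)
    element: Element | None = None


class InteractiveElementExtractor(HTMLParser):
    """Collects interactive elements and an indexed XPath for each one."""

    def __init__(self) -> None:
        super().__init__()
        self._stack = [_Frame(tag="", position=0)]  # sentinel for the document root
        self.elements: list[Element] = []

    def _xpath(self, tag: str, position: int) -> str:
        steps = [f"{f.tag}[{f.position}]" for f in self._stack[1:]]
        steps.append(f"{tag}[{position}]")
        return "/" + "/".join(steps)

    def handle_starttag(self, tag, attrs):
        parent = self._stack[-1]
        parent.child_counts[tag] = parent.child_counts.get(tag, 0) + 1
        position = parent.child_counts[tag]
        element = None
        if tag in INTERACTIVE_TAGS:
            element = Element(tag, self._xpath(tag, position), dict(attrs))
            self.elements.append(element)
        if tag not in VOID_TAGS:
            self._stack.append(_Frame(tag, position, element=element))

    def handle_endtag(self, tag):
        # Ignore void and stray end tags; otherwise pop back to the matching open tag.
        if tag in VOID_TAGS or all(f.tag != tag for f in self._stack[1:]):
            return
        while self._stack[-1].tag != tag:
            self._stack.pop()
        self._stack.pop()

    def handle_data(self, data):
        # Attach visible text to the innermost open interactive element, if any.
        for frame in reversed(self._stack):
            if frame.element is not None:
                frame.element.text = f"{frame.element.text} {data.strip()}".strip()
                break


def to_llm_prompt(html: str) -> str:
    """Render the interactive elements as a numbered list an LLM can cite by index."""
    parser = InteractiveElementExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for i, el in enumerate(parser.elements):
        label = el.text or el.attrs.get("placeholder") or el.attrs.get("name") or ""
        parts = [f"[{i}]", f"<{el.tag}>", label, el.xpath]
        lines.append(" ".join(p for p in parts if p))
    return "\n".join(lines)


if __name__ == "__main__":
    page = """<html><body><form>
      <input name="q" placeholder="Search">
      <button type="submit">Go</button>
    </form><a href="/jobs">Jobs</a></body></html>"""
    print(to_llm_prompt(page))
```

Numbering the elements lets the model answer with a short index instead of reproducing a full XPath. A real browser agent would also have to deal with JavaScript-rendered DOM, iframes and element visibility, which this sketch ignores.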
Hasn't this been done many times already? Good question. As a general SaaS tool, yes. But I think a lot of people are going to try to build their own web automation agents from scratch, so the idea is to provide a library for the hard parts so that not everyone has to repeat these steps:
- parse HTML in an LLM-friendly way (clickable items + screenshots)
- provide clean function calls for everything inside the browser
- create reusable agent classes
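The function-calling layer can be sketched the same way. The Python below is a hypothetical illustration, not the library's API: the tool names (`go_to_url`, `click_element`, `input_text`), the `RecordingBrowser` stand-in and the `dispatch` helper are all assumptions. It declares JSON-Schema-style tool definitions of the kind many function-calling APIs accept, then routes a model's tool call (a name plus a JSON arguments string) to a browser method.

```python
import json
from typing import Any, Callable, Dict, List


class RecordingBrowser:
    """Hypothetical stand-in for a real browser driver; it only records actions."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def go_to_url(self, url: str) -> str:
        self.log.append(f"goto {url}")
        return f"navigated to {url}"

    def click_element(self, index: int) -> str:
        self.log.append(f"click {index}")
        return f"clicked element [{index}]"

    def input_text(self, index: int, text: str) -> str:
        self.log.append(f"type {index} {text!r}")
        return f"typed into element [{index}]"


# Hypothetical tool schemas in the JSON-Schema style used by common function-calling APIs.
TOOLS: List[Dict[str, Any]] = [
    {"name": "go_to_url", "description": "Open a URL in the current tab.",
     "parameters": {"type": "object", "properties": {"url": {"type": "string"}},
                    "required": ["url"]}},
    {"name": "click_element", "description": "Click the element with this index.",
     "parameters": {"type": "object", "properties": {"index": {"type": "integer"}},
                    "required": ["index"]}},
    {"name": "input_text", "description": "Type text into the element with this index.",
     "parameters": {"type": "object",
                    "properties": {"index": {"type": "integer"}, "text": {"type": "string"}},
                    "required": ["index", "text"]}},
]


def dispatch(browser: RecordingBrowser, name: str, arguments_json: str) -> str:
    """Route one model tool call (name + JSON arguments) to a browser method."""
    schema = next((t for t in TOOLS if t["name"] == name), None)
    if schema is None:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    missing = [k for k in schema["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"{name} missing arguments: {missing}")
    # Only names that passed the schema lookup reach getattr.
    method: Callable[..., str] = getattr(browser, name)
    return method(**args)


if __name__ == "__main__":
    browser = RecordingBrowser()
    dispatch(browser, "go_to_url", '{"url": "https://example.com"}')
    dispatch(browser, "input_text", '{"index": 0, "text": "software engineer"}')
    dispatch(browser, "click_element", '{"index": 1}')
    print(browser.log)
```

Because the model only ever sees the schemas, swapping `RecordingBrowser` for a real driver would not change what the LLM is asked to produce; the element indices it passes would come from an extraction step like the one sketched earlier.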
The vision: create repeatable tasks on the web just by prompting your agent, without caring about how they get done.
To better showcase the power of text extraction, we made a few demos, such as:
- Applying for multiple software engineering jobs in San Francisco
- Opening new tabs to search for images of Albert Einstein, Oprah Winfrey, and Steve Jobs
- Finding the cheapest one-way flight from London to Kyrgyzstan for December 25th
We are Gregor & Magnus and we built this in 5 days.
Comments URL: Show HN: I wrote an open-source browser alternative for Computer Use for any LLM | Hacker News
Points: 56
# Comments: 33