mountainriver
Hey HN,
We are working to apply the ideas behind R1 to computer use. The main difficulty is building reliable neural reward models, since hard, verifiable rewards are not available at scale for GUI interactions.
Our team is currently deep in the weeds of collecting reasoning annotations for GUIs so that we can train a reliable reward model.
We would love any thoughts, feedback, or collaboration!
Comments URL: R1 Computer Use | Hacker News
Points: 103
# Comments: 57