Page Canary Learning Browser
Hypothesis
- AI tools can be used to learn how to use a website
- Learning works by using AI models to detect and click on elements
- If the workflow succeeds, we save that set of actions for reuse
Premise
- Web automation is usually a workflow of:
  - Get the query selectors
  - Click and type to accomplish the workflow
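
A minimal sketch of that workflow as data, assuming a hypothetical `Action` record and a hypothetical driver object with `click` and `type` methods. Neither comes from a real library; a real run would wrap a browser automation tool, and the selectors here are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """One recorded step: what to do and which element to do it to."""
    kind: str                   # "click" or "type"
    selector: str               # CSS query selector for the target element
    text: Optional[str] = None  # only used when kind == "type"


class FakeDriver:
    """Stand-in for a browser driver (hypothetical); it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))

    def type(self, selector: str, text: str) -> None:
        self.calls.append(("type", selector, text))


def replay(actions: List[Action], driver) -> None:
    """Run each action in order against the driver."""
    for action in actions:
        if action.kind == "click":
            driver.click(action.selector)
        elif action.kind == "type":
            driver.type(action.selector, action.text or "")
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


# Illustrative sign-in workflow; the selectors are assumptions.
sign_in = [
    Action("type", "#username", "alice"),
    Action("type", "#password", "hunter2"),
    Action("click", "button[type=submit]"),
]
driver = FakeDriver()
replay(sign_in, driver)
```

Keeping actions as plain records means the same list can be replayed later without asking a model to rediscover the selectors.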
Pivot n34t.com
"We're building an AI powered web browser"
- Write an integration test for the sign-in workflow on "example.com"
Example
Given the task: Message Reddit user "n34t-bot" with "hello"
Break it down into:
- Navigate to Reddit
- Log in to Reddit
- Message user
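
The breakdown above could be held in a small ordered structure like the one below. This is only a sketch: `Plan` and its methods are hypothetical names, and the steps are written by hand here rather than produced by a model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    """A directive plus its substeps, which must be completed in order."""
    directive: str
    steps: List[str]
    done: int = 0  # number of steps completed so far

    def next_step(self) -> Optional[str]:
        """Return the next pending step, or None when the plan is finished."""
        if self.done < len(self.steps):
            return self.steps[self.done]
        return None

    def mark_done(self) -> None:
        """Record that the current step finished successfully."""
        if self.done < len(self.steps):
            self.done += 1


plan = Plan(
    directive='Message Reddit user "n34t-bot" hello',
    steps=["Navigate to Reddit", "Log in to Reddit", "Message user"],
)
first = plan.next_step()
plan.mark_done()
second = plan.next_step()
```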
System design
UI => Query screening => Task breakdown => Knowledge lookup => Run task => Save results in knowledge db => return result
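
One way to read this pipeline is as a single function that wires the stages together. The sketch below assumes each stage is passed in as a plain callable and that the knowledge db is a dict; `handle_directive` and all parameter names are hypothetical, and the stages in the usage example are stubs.

```python
from typing import Callable, Dict, List, Optional

Actions = List[str]


def handle_directive(
    directive: str,
    screen: Callable[[str], Optional[str]],  # returns a rejection reason or None
    plan: Callable[[str], List[str]],        # task breakdown
    knowledge: Dict[str, Actions],           # stand-in for the knowledge db
    run: Callable[[str, Optional[Actions]], Actions],
) -> Dict[str, object]:
    # Query screening: fail early on an invalid query.
    reason = screen(directive)
    if reason is not None:
        return {"ok": False, "error": reason}

    results: Dict[str, Actions] = {}
    for step in plan(directive):
        known = knowledge.get(step)   # knowledge lookup
        actions = run(step, known)    # run task, reusing known actions if any
        knowledge[step] = actions     # save results in the knowledge db
        results[step] = actions
    return {"ok": True, "steps": results}


# Stub stages for illustration only.
def screen(directive: str) -> Optional[str]:
    return None if directive.strip() else "empty directive"

def plan(directive: str) -> List[str]:
    return ["Navigate to Reddit", "Log in to Reddit", "Message user"]

def run(step: str, known: Optional[Actions]) -> Actions:
    return known if known is not None else [f"learned: {step}"]

knowledge: Dict[str, Actions] = {"Navigate to Reddit": ["goto https://www.reddit.com"]}
result = handle_directive('Message Reddit user "n34t-bot" hello',
                          screen, plan, knowledge, run)
```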
UI
- Collect the directive from the user
- View previous runs
Query screening
Query screening tries to fail early if the user enters an invalid query:
- Does the bot have enough information to perform the task?
- Is the task achievable by a web browser?
- Is the task malicious, illegal, etc.?
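
The three questions above can be sketched as a list of checks that each either pass or contribute a rejection reason. The predicates below are deliberately trivial keyword placeholders, not a real classifier; in practice each would presumably be answered by an AI model. All names are hypothetical.

```python
from typing import Callable, List, Tuple

# A check pairs a rejection reason with a predicate that returns True on pass.
Check = Tuple[str, Callable[[str], bool]]


def screen_query(query: str, checks: List[Check]) -> List[str]:
    """Return the reason for every failed check; an empty list means accepted."""
    return [reason for reason, passes in checks if not passes(query)]


# Placeholder predicates (assumptions for illustration only).
def has_enough_information(query: str) -> bool:
    return len(query.split()) >= 2

def is_browser_task(query: str) -> bool:
    return "offline" not in query.lower()

def is_permitted(query: str) -> bool:
    return "steal" not in query.lower()


CHECKS: List[Check] = [
    ("not enough information to perform the task", has_enough_information),
    ("task is not achievable by a web browser", is_browser_task),
    ("task is malicious or illegal", is_permitted),
]

accepted = screen_query('Message reddit user "n34t-bot" hello', CHECKS)
rejected = screen_query("hello", CHECKS)
```

Returning every failed reason, rather than stopping at the first, lets the UI tell the user everything that needs fixing in one pass.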
Task breakdown / planning
- Break down the task into substeps to be completed in order
Knowledge lookup
- Check the internal knowledge system for any of these substeps we already know how to accomplish
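
A minimal sketch of the lookup, assuming the knowledge system is a dict keyed by a normalized step name. `normalize` and `lookup_steps` are hypothetical helpers; exact-match keys are a simplification, and a real system might need fuzzier matching.

```python
from typing import Dict, List, Tuple

Actions = List[str]


def normalize(step: str) -> str:
    """Lowercase and collapse whitespace so trivially different names match."""
    return " ".join(step.lower().split())


def lookup_steps(
    steps: List[str], knowledge: Dict[str, Actions]
) -> Tuple[Dict[str, Actions], List[str]]:
    """Split steps into those with saved actions and those still to be learned."""
    known: Dict[str, Actions] = {}
    unknown: List[str] = []
    for step in steps:
        key = normalize(step)
        if key in knowledge:
            known[step] = knowledge[key]
        else:
            unknown.append(step)
    return known, unknown


knowledge = {"navigate to reddit": ["goto https://www.reddit.com"]}
known, unknown = lookup_steps(
    ["Navigate to  Reddit", "Log in to Reddit", "Message user"], knowledge
)
```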
Run tasks
- Run the substeps in order to accomplish the workflow
Save results
- Save the results to the knowledge db
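
A sketch of the save step, assuming the knowledge db is a SQLite table and that only successful runs are stored (as stated in the hypothesis). The table name, column names, and function names are hypothetical; actions are stored as a JSON-encoded list.

```python
import json
import sqlite3
from typing import List, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the knowledge db and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows ("
        "task TEXT PRIMARY KEY, actions TEXT NOT NULL)"
    )
    return conn


def save_result(conn: sqlite3.Connection, task: str,
                actions: List[str], success: bool) -> bool:
    """Store the actions for a task, but only if the run succeeded."""
    if not success:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO workflows (task, actions) VALUES (?, ?)",
        (task, json.dumps(actions)),
    )
    conn.commit()
    return True


def load_actions(conn: sqlite3.Connection, task: str) -> Optional[List[str]]:
    """Return the saved actions for a task, or None if nothing is stored."""
    row = conn.execute(
        "SELECT actions FROM workflows WHERE task = ?", (task,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_db()
save_result(conn, "Log in to Reddit", ["type #user", "click #submit"], success=True)
save_result(conn, "Message user", ["click #compose"], success=False)
```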