Page Canary Design Doc
This document covers the engineering design of Page Canary.
- We will use the GPT-3.5 LLM for cost reasons
Bot algorithm
Loop
The core bot loop can be broken down into the following stages:
State detection
- Bot reads the state of the world
- Native APIs
- Bot
- OCR
- Computer vision
Planning
- Bot figures out what it should do
Execution
- Bot executes its plan
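The three stages above can be sketched as a simple loop. This is a minimal illustration, not the actual Page Canary implementation: the `WorldState` and `Action` shapes, the `"stop"` action, and the `max_steps` cap are all assumptions made for the example, and the three stage functions are passed in as callables so the loop itself does not depend on any particular detection or planning backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorldState:
    """Snapshot of what the bot observed (hypothetical shape)."""
    url: str
    visible_text: List[str] = field(default_factory=list)


@dataclass
class Action:
    """A single step the bot intends to take (hypothetical shape)."""
    kind: str          # e.g. "click", "type", or "stop"
    target: str = ""


def run_bot_loop(
    detect_state: Callable[[], WorldState],
    plan: Callable[[WorldState], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat detect -> plan -> execute until the planner returns "stop"
    or max_steps iterations have run. Returns the executed actions."""
    history: List[Action] = []
    for _ in range(max_steps):
        state = detect_state()   # State detection
        action = plan(state)     # Planning
        if action.kind == "stop":
            break
        execute(action)          # Execution
        history.append(action)
    return history
```

Capping the loop with `max_steps` is one way to keep a confused planner from running indefinitely; the real bot may use a different stopping rule.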
Page Canary Scaling
Free trial
Free trial hosts run on the main host
Overview
- Each paid instance is its own VM
- When a premium user signs up we provision a VM
- Users are given a dashboard to control the VM
- Users can spin down the VM
Pricing
The pricing from AWS is $6 / VM
- If the user's billing fails, we spin down their host VM and send them an email
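The billing-failure rule could look roughly like the sketch below. The `Subscription` record and the `stop_vm` and `send_email` callables are hypothetical stand-ins for whatever the VM host API and email provider actually expose; the email wording is also illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subscription:
    """Hypothetical record tying a paying user to their VM."""
    user_email: str
    vm_id: str
    vm_running: bool = True


def handle_billing_failure(
    sub: Subscription,
    stop_vm: Callable[[str], None],
    send_email: Callable[[str, str, str], None],
) -> None:
    """Spin down the user's VM (if it is running), then email the user."""
    if sub.vm_running:
        stop_vm(sub.vm_id)
        sub.vm_running = False
    send_email(
        sub.user_email,
        "Page Canary: payment failed",
        "We could not process your payment, so your VM has been spun down.",
    )
```

Checking `vm_running` first keeps the handler safe to call more than once for the same failed payment.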
Provisioning
- Call the VM host API
- Store the new VM in the database
- Link the VM to the user who owns it
To provision a new VM, we call the VM host API to create a new $6 VM for the user.
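A minimal sketch of the provisioning steps listed above, assuming the VM host API is wrapped in a `create_vm` callable that returns a VM id and that the database can be modeled as a dictionary for illustration. `VmRecord`, `create_vm`, and the dictionary store are hypothetical names, not part of the existing system.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VmRecord:
    """Hypothetical database row for a provisioned VM."""
    vm_id: str
    owner_user_id: str
    instance_type: str


def provision_vm(
    user_id: str,
    create_vm: Callable[[str], str],
    db: Dict[str, VmRecord],
    instance_type: str = "t4g.small",
) -> VmRecord:
    """Create a VM for a user and record who owns it."""
    # 1. Call the VM host API (wrapped by the injected create_vm callable).
    vm_id = create_vm(instance_type)
    # 2. and 3. Store the VM in the database, linked to its owner.
    record = VmRecord(vm_id=vm_id, owner_user_id=user_id, instance_type=instance_type)
    db[vm_id] = record
    return record
```

Injecting `create_vm` keeps the cloud-specific call in one place and lets the flow be exercised without touching a real account.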
Installation
- SSH to VM
- Download installation script
- Run installation script
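The installation steps could be driven from the main host along these lines. This is only a sketch: the default `ubuntu` login user, the `install.sh` file name, and the script URL are assumptions, and key management and host-key handling are left out.

```python
import shlex
from typing import List


def build_install_command(host: str, script_url: str, user: str = "ubuntu") -> List[str]:
    """Build an ssh invocation that downloads and runs the install script.

    The remote command downloads the script with curl and then runs it
    with bash; the URL is shell-quoted before being embedded.
    """
    remote = (
        f"curl -fsSL {shlex.quote(script_url)} -o install.sh"
        " && bash install.sh"
    )
    return ["ssh", f"{user}@{host}", remote]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a failed download or a failing script raises an error on the provisioning side.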
VM selection
The t4g.small instance type was selected:
- 2 GB RAM
AI
How do we use artificial intelligence at Page Canary?
Large language models (LLMs)
- Issue prioritization
- Page content issue detection
- Duplicate issue detection
- Dynamic QA test plans
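As one illustration of the LLM use cases above, duplicate issue detection could be framed as a yes/no question to the model. The prompt wording and the `complete` callable (any function that sends a prompt to the model and returns its text reply) are assumptions for this sketch; no specific LLM client library is implied.

```python
from typing import Callable


def build_duplicate_prompt(issue_a: str, issue_b: str) -> str:
    """Build a prompt asking whether two issues describe the same problem."""
    return (
        "You are triaging website QA issues.\n"
        "Do the two issues below describe the same underlying problem?\n"
        "Answer with exactly one word: YES or NO.\n\n"
        f"Issue A: {issue_a}\n"
        f"Issue B: {issue_b}\n"
    )


def is_duplicate(issue_a: str, issue_b: str, complete: Callable[[str], str]) -> bool:
    """Return True when the model's reply starts with YES (case-insensitive)."""
    answer = complete(build_duplicate_prompt(issue_a, issue_b))
    return answer.strip().upper().startswith("YES")
```

Constraining the reply to a single word keeps parsing trivial; anything other than a leading YES is treated as "not a duplicate".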
Computer vision
Object detection
- OpenCV object detection is used
Recommended reading resources
- Observations running more than 5 million headless sessions a week
- LLM Powered Autonomous Agents
Helpful shell scripts
Determine jpeg quality
identify -format '%Q' path-to-image.jpg
Page Canary LangChain
- https://www.langchain.com/
- https://js.langchain.com/docs/modules/agents/
- https://minimaxir.com/2023/07/langchain-problem/