LLM Element Descriptors

Problem

LLM vision models need to uniquely identify which element to interact with when multiple elements match the same condition.

The LLM describes the element that it wants to click on.

Instead of click("text"), the call also passes a descriptor:

click("text", {
  color: 'blue',
  text: 'large',
})
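
One way the descriptor could be used to pick a single element is sketched below. This is only an illustration under stated assumptions: `Candidate`, `Descriptor`, and `resolveElement` are hypothetical names, not part of any existing API, and the sketch assumes that each candidate element already carries its visual attributes as key/value pairs. A candidate matches when its visible text equals the requested text and every descriptor entry equals the corresponding attribute.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real library.
type Descriptor = Record<string, string>;

interface Candidate {
  id: string;
  label: string;      // visible text on the element
  attrs: Descriptor;  // visual attributes, e.g. { color: 'blue' }
}

// Return the single candidate whose label matches and whose attributes
// contain every key/value pair in the descriptor. Throw if zero or
// several candidates match, so ambiguity is never resolved silently.
function resolveElement(
  candidates: Candidate[],
  label: string,
  descriptor: Descriptor = {},
): Candidate {
  const matches = candidates.filter(
    (c) =>
      c.label === label &&
      Object.entries(descriptor).every(([key, value]) => c.attrs[key] === value),
  );
  if (matches.length !== 1) {
    throw new Error(`expected 1 match for "${label}", found ${matches.length}`);
  }
  return matches[0];
}

// Two elements share the same text; only the descriptor tells them apart.
const candidates: Candidate[] = [
  { id: 'a', label: 'text', attrs: { color: 'blue', text: 'large' } },
  { id: 'b', label: 'text', attrs: { color: 'gray', text: 'small' } },
];

const picked = resolveElement(candidates, 'text', { color: 'blue', text: 'large' });
console.log(picked.id);
```

Throwing on anything other than exactly one match is a design choice in this sketch: it gives the LLM a signal to add more detail to the descriptor instead of clicking an arbitrary element.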