I used 3.7 on a project yesterday (refactoring to use a different library). I provided the documentation and examples in the initial context and it re-factored the code correctly. It took the agent about 20 minutes to complete the re-write and it took me about 2 hours to review the changes. It would have taken me the entire day to do the changes manually. The cost was about $10.
It was less successful when I attempted to YOLO the rest of my API credits by giving it a large project (using langchain to create an input device that uses local AI to dictate as if it were a keyboard). Some parts of the codes are correct, the langchain stuff is setup as I would expect. Other parts are simply incorrect and unworkable. It’s assuming that it can bind global hotkeys in Wayland, configuration required editing python files instead of pulling from a configuration file, it created install scripts instead of PKGBUILDs, etcetc.
I liken it to having an eager newbie. It doesn’t know much, makes simple mistakes, but it can handle some busy work provided that it is supervised.
I’m less worried about AI taking my job then my job turning into being a middle-manager for AI teams.
I feel this pain.
I’ve been trying to get simple telemetry working over lora on a ESP32-C6, LLMs are largely worthless in this. We gotta fall back to old school RTFM models