Do You Test Your AI? | Sten Vesterli's Blog

How do you test the AI you use? The major providers are constantly leapfrogging each other, and if you don’t regularly check what their latest models can do, you risk missing out.

My personal test suite for LLM chatbots includes playing chess against them. Two years ago, they could manage 5-7 moves before they made an illegal or stupid move. The best LLMs are now up to 20-25 moves.

In a professional coding setting, you should create a test suite of relevant coding tasks on your codebase. For example, give each AI the same block of code and ask for a security and performance review. You’ll find a big difference between engines, and all of them are rapidly improving.
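A minimal sketch of what such a suite could look like in Python. Everything here is an assumption for illustration: the `Task` class, the keyword-based `score_reply`, and the stub functions `engine_a` and `engine_b` are hypothetical stand-ins. In practice you would replace the stubs with calls to the AI tools you are comparing, and you would probably still read the replies yourself rather than trust a keyword count.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: an "engine" is any function that takes a prompt and returns a reply.
# Swap the stubs below for real calls to the tools you want to compare.
Engine = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    expected_keywords: List[str]  # terms a useful review should mention


def score_reply(reply: str, task: Task) -> float:
    """Return the fraction of expected keywords found in the reply (case-insensitive)."""
    if not task.expected_keywords:
        return 0.0
    text = reply.lower()
    hits = sum(1 for kw in task.expected_keywords if kw.lower() in text)
    return hits / len(task.expected_keywords)


def run_suite(engines: Dict[str, Engine], tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Send every task to every engine and collect one score per (engine, task)."""
    return {
        engine_name: {task.name: score_reply(engine(task.prompt), task) for task in tasks}
        for engine_name, engine in engines.items()
    }


# Hypothetical stub engines with canned replies, so the sketch runs offline.
def engine_a(prompt: str) -> str:
    return "Possible SQL injection: use parameterized queries."


def engine_b(prompt: str) -> str:
    return "Looks fine to me."


# Illustrative snippet with a deliberate flaw for the engines to find.
SNIPPET = "cursor.execute(\"SELECT * FROM users WHERE name = '\" + name + \"'\")"

TASKS = [
    Task(
        name="sql_review",
        prompt="Review this code for security and performance issues:\n" + SNIPPET,
        expected_keywords=["sql injection", "parameterized"],
    ),
]

results = run_suite({"engine_a": engine_a, "engine_b": engine_b}, TASKS)
for engine_name, scores in results.items():
    print(engine_name, scores)
```

Keeping the tasks in a plain list makes it easy to rerun the same suite whenever a provider ships a new model and to compare the scores over time.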

Have someone on your team test the current crop of AI tools regularly. Rotate the task between team members – they will each test different aspects.