MetaGPT proposes a multi-agent framework for writing code and claims that it can develop programs no other agent framework is capable of. According to Table 2, reproduced below, existing frameworks are incapable of creating relatively simple games, which surprised me considering GPT-4's capabilities. I tested the official prompts with ChatGPT; the procedure and results are below.
The tasks are scored based on a grading system from ‘0’ to ‘3’, where ‘0’ denotes ‘complete failure’, ‘1’ denotes ‘runnable code’, ‘2’ denotes ‘largely expected workflow’, and ‘3’ denotes ‘perfect match to expectations’ (shown in Section 4.2).
Task | AutoGPT | LangChain w/ Python REPL tool | AgentVerse | MetaGPT |
---|---|---|---|---|
Flappy Bird | 0 | 0 | 0 | 1 |
Tank Battle Game | 0 | 0 | 0 | 2 |
2048 Game | 0 | 0 | 0 | 2 |
Snake Game | 0 | 0 | 0 | 3 |
Brick Breaker Game | 0 | 0 | 0 | 3 |
Excel Data Process | 0 | 0 | 0 | 3 |
CRUD Manage | 0 | 0 | 0 | 3 |
- Enter the official prompt from Table 6.
- If ChatGPT responds with general suggestions instead of code (e.g. tank game), slightly modify the prompt to make it more explicit (e.g. using pygame, change some verbs).
- Since ChatGPT responses are typically short, if the model suggested that the current code is incomplete, simply respond `continue` until the code is complete (e.g. 2048-web).
- I simply copy-pasted (and stitched in the case of 3) the generated code without modification. I did not write or modify any line not mentioned by GPT. I prompted `stitch together the final code without omissions` for the brick game and made 0 manual modifications.
- I didn't rigorously test this multiple times, but I also didn't retry any failed attempts.
- I tried my best not to do any sort of p-hacking or prompt engineering apart from the rules mentioned above, unless otherwise noted.
- I filled in missing resource files (e.g. sprites and music) that the model clearly said to include separately.
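
The stitching step above was done by hand (or by asking the model to do it). Purely as an illustration, a minimal sketch of how the same copy-paste step could be automated is shown below. The function names `extract_code_blocks` and `stitch` are hypothetical, and the sketch assumes the responses are available as plain markdown strings with fenced code blocks; it was not used to produce any of the results here.

```python
import re

# Build the fence marker programmatically so this snippet can itself
# live inside a fenced block without confusing markdown renderers.
FENCE = "`" * 3

# Opening fence with an optional language tag, the body, then the closing fence.
FENCE_RE = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(response: str) -> list[str]:
    """Return the bodies of all fenced code blocks in one model response."""
    return [m.group(1).rstrip("\n") for m in FENCE_RE.finditer(response)]


def stitch(responses: list[str]) -> str:
    """Concatenate code blocks from successive responses, in order and unmodified."""
    blocks: list[str] = []
    for response in responses:
        blocks.extend(extract_code_blocks(response))
    return "\n".join(blocks) + "\n"
```

Because the blocks are joined verbatim, this mirrors the "no manual modifications" rule: nothing is added, reordered, or rewritten.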
These are the tasks that existing frameworks are claimed to fail, according to Table 2:
Task | Result | Conversation url | Description |
---|---|---|---|
2048-web | ✅ | link | Not so pretty, but works. |
2048-py | ✅ | link | pygame. The up and down keys are inverted, but it works otherwise. |
snake | ✅ | link | pygame |
tank-game | ✅ | link | I stitched code from multiple blocks and did not manually write any line. Nevertheless, the result included sprites, sound, shooting & collision, and death checks, which weren't pretty but function mostly well. I manually added the png and wav files, but made no modifications to the code. |
brick | ✅ | link | pygame |
flappy | ❌ | link | p5js. The game has some features but is incomplete. |
excel | ✅ | link | |
crud | ✅ | link | Works surprisingly well! One mistake is that it doesn't check for existence on delete unlike on update. |
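
To make the CRUD remark concrete: the generated code is not reproduced here, so the following is only a hypothetical sketch of what "checking for existence on delete, like on update" might look like. The `ItemStore` class, `NotFoundError`, and the in-memory dict storage are all assumptions for illustration and say nothing about the language or structure of the code ChatGPT actually produced.

```python
class NotFoundError(KeyError):
    """Raised when an operation targets an id that is not stored."""


class ItemStore:
    """Hypothetical in-memory CRUD store, used only for illustration."""

    def __init__(self) -> None:
        self._items: dict[int, dict] = {}
        self._next_id = 1

    def create(self, data: dict) -> int:
        item_id = self._next_id
        self._items[item_id] = dict(data)
        self._next_id += 1
        return item_id

    def read(self, item_id: int) -> dict:
        if item_id not in self._items:
            raise NotFoundError(item_id)
        return dict(self._items[item_id])

    def update(self, item_id: int, data: dict) -> None:
        # Update verifies that the target exists before touching it.
        if item_id not in self._items:
            raise NotFoundError(item_id)
        self._items[item_id].update(data)

    def delete(self, item_id: int) -> None:
        # The same existence check applied to delete, so a missing id is
        # reported in the same way as it is for update.
        if item_id not in self._items:
            raise NotFoundError(item_id)
        del self._items[item_id]
```

With the check in place, deleting the same id twice raises `NotFoundError` on the second call instead of silently succeeding.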