Comparing code generation capabilities of various models.
Model | Response Size | Code Size | Completion Tokens | Prompt Tokens | TTFT | Total Time |
---|---|---|---|---|---|---|
claude-sonnet-4 | 23.41 KB | 21.99 KB | 6491 | 3390 | 8.18s | 87.26s |
deepseek-r1-0528 | 20.59 KB | 19.48 KB | 4888 | 2798 | 16.75s | 132.74s |
gemini-2.5-pro | 20.08 KB | 18.84 KB | 8165 | 3155 | 35.53s | 81.19s |
o4-mini-high | 9.86 KB | 9.85 KB | 9322 | 2709 | 88.00s | 111.80s |
devstral-medium | 13.66 KB | 13.12 KB | 3087 | 2881 | 5.42s | 36.25s |
kimi-k2 | 14.95 KB | 14.94 KB | 3338 | 2717 | 7.98s | 69.82s |
gpt-4.1 | 17.4 KB | 16.74 KB | 5160 | 2710 | 5.17s | 83.00s |
qwen3-235b-a22b-07-25 | 24.31 KB | 24.3 KB | 5265 | 2737 | 4.98s | 187.57s |
qwen3-coder | 20.41 KB | 20.4 KB | 4294 | 2737 | 5.46s | 217.69s |
grok-4 | 15.89 KB | 15.88 KB | 4083 | 2678 | 13.64s | 53.60s |
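
TTFT and Total Time are easier to compare across models once they are combined with the token counts. The sketch below derives a rough generation rate from the table's own figures, assuming it can be approximated as completion tokens divided by the time elapsed after the first token (Total Time minus TTFT). That formula and the helper names `decode_throughput` and `rank_by_throughput` are illustrative assumptions, not part of the benchmark itself. For reasoning models, completion tokens may include hidden reasoning tokens produced before the first visible token, so their rate is likely overstated.

```python
# Rows copied from the table above:
# (model, completion_tokens, ttft_seconds, total_seconds)
ROWS = [
    ("claude-sonnet-4", 6491, 8.18, 87.26),
    ("deepseek-r1-0528", 4888, 16.75, 132.74),
    ("gemini-2.5-pro", 8165, 35.53, 81.19),
    ("o4-mini-high", 9322, 88.00, 111.80),
    ("devstral-medium", 3087, 5.42, 36.25),
    ("kimi-k2", 3338, 7.98, 69.82),
    ("gpt-4.1", 5160, 5.17, 83.00),
    ("qwen3-235b-a22b-07-25", 5265, 4.98, 187.57),
    ("qwen3-coder", 4294, 5.46, 217.69),
    ("grok-4", 4083, 13.64, 53.60),
]


def decode_throughput(completion_tokens, ttft_s, total_s):
    """Approximate tokens per second after the first token arrived.

    Assumption: all completion tokens are attributed to the interval
    between the first token and the end of the response.
    """
    decode_s = total_s - ttft_s
    if decode_s <= 0:
        raise ValueError("total time must exceed TTFT")
    return completion_tokens / decode_s


def rank_by_throughput(rows):
    """Return (model, tokens_per_second) pairs, fastest first."""
    rates = [
        (model, decode_throughput(tokens, ttft, total))
        for model, tokens, ttft, total in rows
    ]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, rate in rank_by_throughput(ROWS):
        print(f"{model:<24}{rate:8.1f} tok/s")
```

Read the ranking together with the raw columns: a model with a long TTFT and a short remaining interval, such as o4-mini-high, scores high on this measure even though its end-to-end time is not the lowest.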