When I encounter non-trivial programming problems, I try to compare a few LLMs, to keep a pulse on which are most effective in practical, day-to-day use. The Twittersphere leaned Anthropic during Q4 2024 and Q1 2025, and my first few examples lean that direction as well.


But in an environment where small differences can be leveraged into 100x results, it seems worthwhile to keep experimenting to confirm that we understand where to get a 15% edge. Thus, we are pleased to present a daily comparison of LLMs on real programming problems.


Sunday March 9: Find an ambiguously described bug buried in a 30-line method

Given this method, why does Ruby report

[2025-03-09 06:53:39.983]
[2025-03-09 06:53:39.983] SystemStackError (stack level too deep):
[2025-03-09 06:53:39.983]
[2025-03-09 06:53:39.983] /Users/bill/.rvm/rubies/ruby-3.2.3/lib/ruby/3.2.0/set.rb:511:in `each_key'
[2025-03-09 06:53:39.983] /Users/bill/.rvm/rubies/ruby-3.2.3/lib/ruby/3.2.0/set.rb:511:in `each'
[2025-03-09 06:53:39.983] /Users/bill/.rvm/rubies/ruby-3.2.3/lib/ruby/3.2.0/benchmark.rb:311:in `realtime'
[2025-03-09 06:53:39.983] app/controllers/api/api_controller.rb:131:in `bubble_activity'
[2025-03-09 06:53:39.983] config/initializers/warden_jwt_token_dispatcher_patch.rb:40:in `call'
[2025-03-09 06:53:39.983] lib/middleware/timeout_middleware.rb:16:in `call'
[2025-03-09 06:53:39.983] lib/middleware/okay_check_middleware.rb:23:in `call'
[2025-03-09 06:53:39.983] vendor/gems/rails_modifications/initializers/action_dispatch_no_missing_map_exception.rb:8:in `call'
[2025-03-09 06:53:39.983] lib/middleware/block_ip_middleware.rb:32:in `call'

Anthropic 3.7 Deep Think: Gradually evaluates possibilities over a couple minutes, eventually pinpoints two lines, one of which was the actual culprit. ✅

ChatGPT 4o: Wrong, not helpful.

Anthropic 3.7 Standard: Gets the same general answer as the Deep Think version, which is correct. ✅ On a second try, though, it got it very wrong.

DeepSeek: Thorough, but wrong and not very helpful.

Winner: Anthropic by a good bit over ChatGPT 4o and DeepSeek


Friday March 7: Interpret a variety of time zone strings

How to parse a time zone string in Ruby on Rails into the number of hours deviation from UTC? It should parse "US/Pacific" or "Pacific/Fiji" or any other time zone name a user may enter

OpenAI o1 Pro: Disparate answer. Mildly helpful in that it offers multiple avenues, but misses the point of wanting a single method that combines all approaches.

DeepSeek DeepThink R1: Partially correct: doesn't have the fallback heuristics of Anthropic, but better captures DST differences by checking the current time in the zone vs. current time in UTC ✅

Winner: Anthropic & DeepSeek, both considerably better than OpenAI. Anthropic gave the better "spirit of the question" answer; DeepSeek gave the more thorough one.


Thursday March 6: String diff method (deduce that the query is asking for LCS)

Query

Write a Ruby method that can take two strings "bananas o'reilly really for reals" and "banana really(for reals)" and produce the shortest possible difference in characters between the two:

1. "s o'reilly"

2. "()"

No other characters should be present in the diff, since all the rest of the words are common between the two strings. Ensure that the methods produced do not abbreviate variable names.


Commentary

The examples that were given in the query weren't quite right - there was an extra space that should have been present in the first example. This led DeepSeek down a multi-minute rabbit hole where it kept trying good ideas, but finding they didn't match the expected output because the human hadn't paid close enough attention to give a completely accurate example.


It was interesting that Anthropic Deep Think was also confused that its generated output wasn't exactly right, but it deemed the answer sufficiently correct that it stopped generating in less than a minute, similar to o1.


Non-deep thinks

I first submitted the query to Anthropic & OpenAI without deep thinking. Both gave very deficient answers.


OpenAI deep think

After 90 seconds, produces this code, which isn't very close to what was asked:

DiffUtility.single_chunk_diff("bananas o'reilly really for reals", "banana really(for reals)")
[
[0] "s o'reilly really for reals",
[1] " really(for reals)"
]

Weaksauce.


Anthropic deep think

After 2 minutes, produces this code, which successfully produces:

DiffUtility.string_diff("bananas o'reilly really for reals", "banana really(for reals)")
[
[0] "so'illy re ",
[1] "()"
]

Arguably slightly better than desired.
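For reference, the character-level LCS (longest common subsequence) approach the query is fishing for can be sketched roughly as below. This is my own illustration, not the code either model produced; the method name `string_difference` and all variable names are assumptions. It builds the standard LCS length table, then walks it backwards and collects every character that is not part of the common subsequence.

```ruby
# Returns [characters_only_in_first, characters_only_in_second]: whatever is
# left of each string after removing one longest common subsequence.
def string_difference(first_string, second_string)
  first_characters = first_string.chars
  second_characters = second_string.chars

  # lengths[row][column] holds the LCS length of the first `row` characters
  # of first_string and the first `column` characters of second_string.
  lengths = Array.new(first_characters.length + 1) do
    Array.new(second_characters.length + 1, 0)
  end

  first_characters.each_with_index do |first_character, first_index|
    second_characters.each_with_index do |second_character, second_index|
      lengths[first_index + 1][second_index + 1] =
        if first_character == second_character
          lengths[first_index][second_index] + 1
        else
          [lengths[first_index][second_index + 1],
           lengths[first_index + 1][second_index]].max
        end
    end
  end

  # Walk back from the bottom-right corner of the table. Matching characters
  # belong to the common subsequence; everything else goes into the diff.
  only_in_first = []
  only_in_second = []
  row = first_characters.length
  column = second_characters.length

  while row > 0 || column > 0
    if row > 0 && column > 0 && first_characters[row - 1] == second_characters[column - 1]
      row -= 1
      column -= 1
    elsif column > 0 && (row.zero? || lengths[row][column - 1] >= lengths[row - 1][column])
      only_in_second.unshift(second_characters[column - 1])
      column -= 1
    else
      only_in_first.unshift(first_characters[row - 1])
      row -= 1
    end
  end

  [only_in_first.join, only_in_second.join]
end
```

For the two strings in the query, the second element is always "()" and the first element is always 11 characters long. Which 11 characters it contains depends on how ties are broken during the walk back, which is why a character-level answer need not match the word-level "s o'reilly" the query expected.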


Date: March 2025

Winner: Anthropic over OpenAI by a lot


Wednesday March 5: Give functional component a ref

How can the functional component CabViewer keep a ref for its functional component child, CabCommittersFrame?


Copilot

Suggests using forwardRef with useImperativeHandle in CabCommittersFrame, then includes the whole of CabViewer with a one-line description of what changed in the file.

Cursor

Suggests using React.forwardRef on CabCommittersFrame, with no change needed to CabViewer. Certainly preferable if it works (tbd).