Local AI vs. GPT-4o: Can Private Open-Source LLMs Compete on a Coding Task?
In a previous article, *Protect Your AI Privacy by Using Private LLMs — A Simple Step-by-Step Guide*, I showed how to run local LLMs using Ollama, Node.js, and Next.js. Some readers asked why anyone would bother with local LLMs when large, closed-source models like GPT-4o are readily available online.
The answer comes down to privacy. Closed-source LLMs, while powerful, present privacy concerns when sensitive corporate or personal information is processed. By running local, open-source models, you maintain full control over your data. But does it make sense to use an open-source LLM for complex tasks like coding? In this article, I will compare responses from OpenAI’s GPT-4o with variously sized Deepseek models and critically assess their performance on a coding challenge.
The Challenge: Palindrome Checker
Prompt
Task: Write a function is_palindrome(s: str) -> bool that checks whether a given string is a palindrome, ignoring spaces, punctuation, and case.
Requirements:
- Return True if the string is a palindrome, and False otherwise.
- Ignore non-alphanumeric characters and case.
Example:
```python
is_palindrome("A man, a plan, a canal, Panama")  # True
is_palindrome("Hello")  # False
```
GPT-4o Response
GPT-4o provided the following solution:
```python
import re

def is_palindrome(s: str) -> bool:
    cleaned_str = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
    return cleaned_str == cleaned_str[::-1]
```
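Any of the function-based solutions can be sanity-checked against the prompt's own examples plus a couple of edge cases; using GPT-4o's version as the reference:

```python
import re

def is_palindrome(s: str) -> bool:
    # GPT-4o's solution, reproduced from above.
    cleaned_str = re.sub(r'[^a-zA-Z0-9]', '', s).lower()
    return cleaned_str == cleaned_str[::-1]

assert is_palindrome("A man, a plan, a canal, Panama")  # True
assert not is_palindrome("Hello")                       # False
assert is_palindrome("")                                # empty string is trivially a palindrome
assert is_palindrome("No 'x' in Nixon")                 # mixed case and punctuation
```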
Deepseek Responses
Now, let’s compare this “gold-standard” response with responses obtained from open-source Deepseek models of varying sizes.
Deepseek-r1 14-billion-parameter model response
```python
def is_palindrome(s: str) -> bool:
    cleaned = [c.lower() for c in s if c.isalnum()]
    cleaned_str = ''.join(cleaned)
    return cleaned_str == cleaned_str[::-1]
```
Interestingly, this approach mirrors GPT-4o's method but avoids regular expressions, using a list comprehension to clean the string. Preprocessing converts each character to lowercase and filters out non-alphanumeric characters, while the palindrome check compares the cleaned string with its reverse. The logic is sound, and performance should be effectively the same for typical inputs.
Deepseek-r1 8-billion-parameter model response
```python
def is_palindrome(s: str) -> bool:
    processed = ''.join([c.lower() for c in s if c.isalnum()])
    return processed == processed[::-1]
```
Like the 14b model, this version uses list comprehension to clean the string but omits extra variables, streamlining the function. While more concise, it achieves the same result.
Deepseek-r1 1.5-billion-parameter model response
```python
s = input().strip()
n = len(s)
if n == 0:
    print("True")
else:
    is_palindrome = True
    for i in range(n // 2):
        if s[i] != s[n - 1 - i]:
            is_palindrome = False
            break
    print(is_palindrome)
```
This approach manually compares characters from both ends of the string, avoiding slicing. However, it falls short of the brief in several ways: it is a standalone script that reads from input() and prints a result rather than the requested is_palindrome function, it is case-sensitive, and it does not strip spaces or punctuation. As written, it would misclassify the prompt's own example, "A man, a plan, a canal, Panama". This makes it a less reliable, more error-prone solution that fails to fully meet the task's requirements.
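For reference, the two-pointer idea itself is perfectly workable once the required preprocessing is added; a minimal sketch of a compliant version:

```python
def is_palindrome(s: str) -> bool:
    # Keep only alphanumeric characters, lowercased, as the task requires.
    cleaned = [c.lower() for c in s if c.isalnum()]
    # Two-pointer comparison from both ends; no slicing needed.
    i, j = 0, len(cleaned) - 1
    while i < j:
        if cleaned[i] != cleaned[j]:
            return False
        i += 1
        j -= 1
    return True
```

This shows the gap was in meeting the requirements, not in the underlying algorithm.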
Evaluation: Is Open-Source Reliable for Coding Challenges?
While GPT-4o’s solution is both clear and concise, the 14b and 8b Deepseek models offer competitive alternatives. The main difference lies in implementation style — GPT-4o uses regular expressions while Deepseek models rely on list comprehensions. Both approaches are equally valid, though GPT-4o’s use of re.sub is more compact for the specific problem at hand.
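On the performance point, the two cleaning strategies can be compared with a rough `timeit` sketch (figures will vary by machine and Python version, so treat this as illustrative rather than definitive):

```python
import re
import timeit

s = "A man, a plan, a canal, Panama" * 100

def regex_clean(s: str) -> str:
    # GPT-4o's strategy: strip non-alphanumerics with a regex, then lowercase.
    return re.sub(r'[^a-zA-Z0-9]', '', s).lower()

def comprehension_clean(s: str) -> str:
    # Deepseek's strategy: filter and lowercase character by character.
    return ''.join(c.lower() for c in s if c.isalnum())

t_regex = timeit.timeit(lambda: regex_clean(s), number=1000)
t_comp = timeit.timeit(lambda: comprehension_clean(s), number=1000)
print(f"re.sub:        {t_regex:.4f}s")
print(f"comprehension: {t_comp:.4f}s")
```

Both produce identical cleaned strings for ASCII input; the choice is largely a matter of style.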
However, the smaller 1.5b Deepseek model falls short. Its manual approach ignores key requirements such as ignoring case and punctuation, demonstrating that smaller models may struggle with complex tasks or miss important details.
For coding tasks like palindrome checking, larger open-source models such as Deepseek-r1 14b and 8b provide reliable solutions comparable to closed-source models like GPT-4o. However, the performance of smaller models (e.g., 1.5b) can vary significantly, and they may not be suitable for more nuanced or detailed problems. Moreover, if speed of execution is of the essence, the closed-source model may respond much faster than an open-source alternative, although this largely depends on the availability and specification of local GPU hardware.
In the next article, I will explore how open-source and closed-source models handle creative challenges and assess their capabilities in more abstract tasks. Stay tuned!