Sunday, August 10, 2025

AI dev tools August 2025 update

This is an update to my previous post. Things are moving fast, so I decided to document the state of things after five months.   

This week, OpenAI released their open-weight (Apache 2 licensed) models gpt-oss-20b and gpt-oss-120b. IMO, this is bigger news than their other model release this week, GPT-5. The expert analyses are showing that the 120B and 20B models are very capable while being significantly smaller in parameters compared to the competition. Running an O4-mini or O3-mini level model in a consumer laptop would have been unimaginable a few weeks ago, but here we are.  


The above video shows my experience running gpt-oss-20b on a 2024 M3 MacBook Air with LM Studio. Although the tokens generated per second are quite low, it is still usable. (Also, I've realized that LM-studio with its MLX support is much faster and far superior to Ollama in usability.) Here I'm getting the model to retrieve a live web page (BBC home page) and extract the news headings, all happening locally on a laptop with the help of a code sandbox and an MCP tool invoking playwright. 😅

Microsandbox and coderunner open up lots of possibilities here to make life easier using local models.

Talking about possibilities, Google also released Genie 3, which I believe will have many implications across video game development, filmmaking, robotics, and many other industries. Sadly, it didn't get the attention it deserved in my opinion.

Coding agents are the talk of the town these days. But interestingly, everyone (except Claude Code) is open-sourcing their product, it seems!
In other news, research has challenged the notion that AI tools are significantly improving the productivity of senior OSS developers. However, as explained in this excellent article, the situation is not as straightforward as it seems.

All in all, we may not be any closer to ASI or AGI. But the progress the field has achieved so far is big enough to change the trajectory of lots of careers (and economies too), I think. I wonder what new advancements we'll see in the coming months.

Sunday, March 30, 2025

AI dev tools that I use in March 2025

It's 2025, and LLM-based developer tools have become mainstream, not a novelty. I've been gradually using these tools for coding tasks outside of work. I enjoy using them in two popular ways: for vibe coding, which is all about those fun, low-stakes projects, and in responsible AI-assisted programming. In that latter setting, I take the time to review every line of code it generates, as I have higher security and quality requirements for certain tasks.

Here are the tools I commonly use these days:

  • Co-pilot chat in VS Code: I use it mostly to ask questions about the code base I have open in the editor and for general technical questions regarding the task at hand. Inline chat by highlighting code snippets or adding @workspace as context to the chat works well with my flow. The 50 chat requests/month on the free plan are enough for me. Although it comes with access to both GPT-4o and Claude 3.5 Sonnet models, I'm mostly sticking to GPT-4o here.
  • Cline: This is one of the first VS code extensions I tried for AI-assistance, and I liked it enough to stick with it. There are a few features I enjoy most in Cline:
    • The separation of plan/act modes helps me to mentally track the progress of the task.
    • The pane showing token costs helps to track the API costs in dollars as well as tokens.
    • The diff view clearly shows the proposed changes, so I can approve/disapprove as needed. After the task is completed, I will have a sense of which files of the code base were modified.
  • Claude code: It was somewhat surprising to see a big model provider like Anthropic getting into the application layer by providing an agentic CLI tool for development tasks. I'm sure they have some good reasons to build a CLI tool without going on the IDE extension route like the others. The installation, setting up, and working with it were very smooth. I like that they keep adding new features to it.
  • repomix and llm tool: I use this combination to understand code bases I'm delving into for the first time. I use repomix to first generate an LLM-friendly representation of the code base and then ask LLM to generate an architecture summary of the repo like this: cat repomix-output.xml | llm -m gemini-2.5-pro-exp-03-25 -s 'this file contains all the files in the repository combined into one. provide architectural overview as markdown' | tee architecture.md

Based on the number of input tokens sent to the model, I can clearly see that Cline is a bit more verbose in its prompts than Claude Code. But I guess the additional context helps because I noticed Cline was slightly better when handling complex tasks. So my usage pattern is to first try the task with Claude Code and, if it fails, throw away the generated code to start from scratch with Cline.

In terms of the models, I've moved from Claude 3.5 to Claude 3.7 Sonnet, and I'm paying for the API tokens as needed. Based on some quick experiments via their web interface, I'm also considering moving to Gemini 2.5 Pro (gemini-2.5-pro-exp-03-25 to be exact) because Google's offering a generous API quota in the free tier for this very capable model.

Tools that I want to start using more:

  • OpenHands: I've heard a lot of good things about it since OpenDevin, but I didn't get the time to try it out.
  • Zed: With me trying out Rust development, I wanted to use Zed for my development with all the hype surrounding it. But I could not get into the flow with it.
  • Warp: Same as with Zed, Warp sounded amazing on paper. But I kept going back to the plain old terminal. I should give this another chance.
  • files-to-prompt: I want to use it instead of repomix, but could not get it to work with UV.
  • Cursor, Windsurf, and others: Cline is working well for me for now, but I should try these out to see what's new with them.
  • Ollama: I'm eager to integrate all my tools with a locally hosted model! Unfortunately, the models I've found so far for my hardware (MacBook Air M3 16GB) haven't quite met my needs for coding tasks. I'm really looking forward to that amazing future. If you have any suggestions for models that would work well for me, I'd love to hear them!

PS: I also use Grammarly Premium. I thought I would include it in dev tools because I write a lot in English while coding (in code comments and documentation, and these days, mostly as prompts to the LLMs). So, grammar and spell checking all of this prose is important.