

Artificial Analysis’s “Coding Index” Doesn’t Measure Coding

If you’re trying to figure out which LLM is best at writing software, you’ll eventually land on ArtificialAnalysis.ai. It looks authoritative. Clean design, lots of models, comparison charts — exactly what you want when you’re evaluating options. They publish a “Coding Index” that ranks models on programming ability.

There’s just one problem: Artificial Analysis’s Coding Index doesn’t measure coding ability. Not even close. And by publishing it under that name, they are actively misleading every developer, team lead and decision-maker who relies on it.

What’s Actually Being Measured

The Coding Index is a composite of two benchmarks: Terminal-Bench Hard and SciCode. Let’s look at what each one actually tests.

Terminal-Bench Hard evaluates AI capabilities in terminal environments — system administration, data processing and software engineering tasks. In practice this is primarily sysadmin work. Can the model navigate a filesystem, run commands, compile things? These are useful skills in the same way that knowing how to use a screwdriver is useful if you’re a building architect. It’s table stakes, not a measure of the thing that matters.

SciCode is a scientist-curated benchmark with 288 subproblems across 16 scientific disciplines. The benchmark’s own description says it “requires integrating scientific knowledge with programming skills to solve real research problems.” Read that again — it’s explicitly testing the intersection of domain-specific scientific knowledge and coding. If a model happens to know less about computational fluid dynamics but writes better production software, it scores lower on the “Coding Index.” A model that’s brilliant at architecting maintainable systems but doesn’t know the Navier-Stokes equations gets penalized on what’s supposed to be a coding benchmark.

That’s it. Two benchmarks. Sysadmin tasks and science homework. That’s what Artificial Analysis chose as the entire basis for ranking models on coding ability.

What’s Missing

Here’s what the Coding Index doesn’t include — benchmarks that actually measure software development:

SWE-bench Verified, where models patch real bugs in real open-source repositories with real test suites. It’s the closest thing the industry has to measuring actual software engineering: read an existing codebase, understand the problem in context, produce a working fix.

Aider’s polyglot benchmarks, LiveCodeBench, BigCodeBench and EvalPlus — all of which test code generation directly against functional correctness.

Beyond specific benchmarks, the Coding Index doesn’t test for any agentic capabilities, complex problem solving or long-horizon task completion — all of which are directly relevant to how models are actually used for software engineering today. The ability to plan an approach, execute across multiple files, recover from errors and sustain coherent work over extended sessions is increasingly what separates useful coding models from toys. The Coding Index measures none of it.

None of these are perfect. But they at least involve writing code that has to work.

Why This Matters

When I looked at Artificial Analysis’s Coding Index chart, it ranked Claude Sonnet 4.6 above Claude Opus 4.6 for coding. I use both models extensively every day for real software development work. That ranking is simply wrong. Sonnet is a capable model, but Opus is substantially better on complex software — multi-file refactors, subtle architectural issues, large-context reasoning across interconnected systems. The kind of work that actually defines whether a model is useful for serious development.

Artificial Analysis’s benchmark can’t see this because their benchmark isn’t measuring it.

The real danger isn’t that experienced developers will be misled — most of us will smell the problem quickly, just as I did. The danger is that decision-makers, team leads evaluating tools and developers earlier in their careers will see “Coding Index” on a professional-looking site and reasonably assume it reflects which model will best help them build software. Artificial Analysis is failing those people. The Coding Index is a grab bag of tangentially related capabilities wearing a label that implies something it absolutely does not deliver.

The Takeaway

If you’re evaluating models for software development, don’t use Artificial Analysis’s Coding Index. Look at SWE-bench Verified and Aider’s benchmarks. Look at what people doing real work with these models are reporting. Better yet — try them yourself on your actual tasks.

Composite indexes with authoritative-sounding names are convenient. But convenience is worthless when the underlying data doesn’t measure what the label claims. Artificial Analysis should either fix their Coding Index to include benchmarks that actually test software development or stop calling it a Coding Index. What they’re publishing right now is misinformation with a clean UI.


Back From the Wilderness

It has been over six years since my last post. Life happened. Covid happened. Work happened. I got heads-down on a challenging role and let the blog go dark. That’s on me and I’m here to course correct.

What I’ve Been Doing

When I last posted in 2019, I had just started a new job doing Clojure development on Mac — a pretty significant departure for someone with roughly 14 years of Delphi followed by 16 years of C# and .NET on Windows. That role has lasted considerably longer than I originally expected. I’ve spent the intervening years building complex systems in Clojure on the JVM, working with cloud infrastructure and generally living deep in a technology stack I would not have predicted for myself.

Working on Mac full-time has been its own education. Under the hood it’s built on UNIX and that part is genuinely great — I routinely have several terminal windows open with multiple tabs in each and the low-level, non-GUI development experience is solid. The GUI side is another story. Finder is an exercise in frustration, window drag handles are elusive and Apple’s philosophical commitment to “there is one correct way to do everything and you will conform” runs directly counter to how I think software should work. I believe software should adapt to the user and present multiple ways of accomplishing the same task — menus, toolbars, keyboard shortcuts — and be customizable. Apple believes users should adapt to the software. We disagree. But that’s maybe a topic for its own post.

Where My Head Is At

AI is, by any reasonable measure, the most important shift in software development in decades. Possibly ever. My interest in it isn’t new — I’ve been thinking seriously about the trajectory of artificial intelligence since around 2000 and I’ve been anticipating something like what we’re seeing now for a long time. What’s changed in the last couple of years is that the tools have matured to the point where they’re practically useful and the rate of progress has become impossible to ignore even for skeptics.

My employer has made it very clear that understanding and leveraging AI isn’t optional — it’s expected. And I agree with them on that. So I’ve been investing significant time into going deep: agents, local LLM inference, security and hardening, infrastructure and the practical realities of building real systems around these technologies. This is where the industry is going and I plan to be ahead of it rather than chasing it.

What I’ll Be Writing About

This blog is going to be a place where I share what I’m learning, what I’m building and what I think about all of it. Honestly. The tone will be direct. I’m not going to sugar coat things, I’m not going to toe any corporate lines and I’m not going to pretend to agree with “best practices” that are performative rather than practical. Where the facts don’t line up with the popular consensus, I’m going to go with the facts.

Expect posts on AI — agents, tooling, local inference, security and what it’s actually like to stand up the infrastructure to support all of this. The technology stack will lean heavily on C# with .NET and TypeScript where the choice is mine to make, because they’re mature, productive, broadly capable platforms. But I also have years of Clojure experience and some rather pointed opinions that have been building up. Those will make an appearance.

I’ll also be writing about infrastructure, networking, self-hosting and hardware — because the local AI space is evolving fast and the decisions around it matter more than most people realize.

The Short Version

I’m back. I have things to say. Some of it will be useful, some of it will be opinionated and I’ll do my best to make sure all of it is honest.

More to come.


DI is not IoC

Dependency Injection (DI) helps to enable Inversion of Control (IoC), but DI itself is not IoC, because truly moving the locus of control outward requires an architecture that wraps and invokes the DI. An example of this is ASP.NET MVC, which uses DI to instantiate the controllers and other objects needed to process web requests. That is IoC, but the libraries that implement DI (“containers”) are not themselves IoC containers; they are DI containers. Calling them IoC containers is inaccurate.
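To make the distinction concrete, here is a minimal TypeScript sketch (all names hypothetical, and simplified far beyond what any real framework does). The container only knows how to wire objects together — that's DI. Inversion of control only appears once a "framework" function owns the call flow and uses the container to build and invoke your code.

```typescript
interface Logger { log(msg: string): void; }

class ConsoleLogger implements Logger {
  log(msg: string) { console.log(msg); }
}

class HomeController {
  constructor(private logger: Logger) {}
  handle(path: string): string {
    this.logger.log(`handling ${path}`);
    return `response for ${path}`;
  }
}

// A toy DI container: it knows how to construct things, nothing more.
class Container {
  private factories = new Map<string, () => unknown>();
  register<T>(name: string, factory: () => T) { this.factories.set(name, factory); }
  resolve<T>(name: string): T {
    const f = this.factories.get(name);
    if (!f) throw new Error(`not registered: ${name}`);
    return f() as T;
  }
}

// The "framework" is what inverts control: application code never calls
// HomeController directly; the framework decides when to build and invoke it,
// using the container to do the building.
function frameworkDispatch(container: Container, path: string): string {
  const controller = container.resolve<HomeController>("home");
  return controller.handle(path);
}

const container = new Container();
container.register("logger", () => new ConsoleLogger());
container.register("home", () => new HomeController(container.resolve<Logger>("logger")));
```

Delete `frameworkDispatch` and you still have a working DI container — but no inversion of control.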


The Service Locator Pattern

Developers keep referring to Service Locator as an anti-pattern. If that is the case then ASP.NET MVC and every IoC container I’ve ever seen must be wrong because they use it.

The interface for accessing an IoC container is an implementation of the Service Locator pattern. You’re asking for some particular interface (aka a service) and it’s giving you back an instance (if it can).

Under the hood ASP.NET MVC uses a service locator (which almost always happens to be an IoC container) to new-up Controllers for handling incoming HTTP requests.

Service Locators can certainly be used incorrectly or where they should not, but they are not an anti-pattern. They are a specific tool in what should be an immense toolbox for solving certain types of problems. Sometimes they are the best choice. Sometimes they are a terrible choice. But the pattern itself is not at fault.
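A minimal sketch of the pattern in TypeScript (hypothetical names, not any real container's API): a registry you ask for a service by key, which hands back an instance if it can.

```typescript
interface Clock { now(): Date; }

class SystemClock implements Clock {
  now() { return new Date(); }
}

// The locator is just a registry: register instances up front,
// then any code with a reference to the locator can look them up.
class ServiceLocator {
  private services = new Map<string, unknown>();
  register(name: string, instance: unknown) { this.services.set(name, instance); }
  get<T>(name: string): T {
    const s = this.services.get(name);
    if (s === undefined) throw new Error(`no service registered for ${name}`);
    return s as T;
  }
}

const locator = new ServiceLocator();
locator.register("clock", new SystemClock());
const clock = locator.get<Clock>("clock");
```

The usual criticism — that the dependency on the locator is hidden — is a real trade-off, but as argued above, that makes it a tool to use judiciously, not an anti-pattern.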

For more, read Service Locator vs Dependency-Injection, which goes into more detail and is also a very fun read. I’d mention the author’s name, but I can’t seem to find a name associated with the blog.


Original Coder Libraries w/Layers Architecture

I’ve just pushed a new version of the Original Coder Libraries up to GitHub that includes the first draft of the Layers library and architecture.

The libraries are hosted on GitHub: The Original Coder Libraries

This push includes the first version of the Layers architectural library I’ve been working on. It is based on similar architectures I’ve used on a few different projects in the past which proved to be very helpful. From a features and maturity standpoint this could probably be considered the 3rd incarnation (once they are completed, still in alpha).

The library makes it incredibly easy and efficient to build software systems using layers, especially systems that deal with data needing CRUD (Create, Read, Update and Delete) operations. Using the library it will be possible to implement a full set of CRUD endpoints for a resource (an entity, database table or the like) in about 100 lines of code.

I’ve included a project named LayerApiMockup that provides an example of what setting up and implementing will be like with the library. It still needs a bit of work and I need to add the add-on libraries for implementing specific technologies (Entity Framework, ASP.NET MVC, etc) but this is a good start.


Senior Developers & S.O.L.I.D. Principles

The S.O.L.I.D. principles didn’t exist when I was learning to program, and by the time I had heard of them I had been working at a senior level for years. Once I heard about them I had a quick look, thought they were all pretty obvious and paid no more attention.

Let me clarify that. The principles that make up S.O.L.I.D. are pretty basic stuff. Junior developers will get them wrong all the time. Mid-level developers will mostly get them right but will still make mistakes. By the time a developer starts to transition into a senior role they should always be applying these sorts of basic principles correctly and mostly automatically. A developer who has been working at the senior level for a few years should never even need to think about such things consciously.

Just as people don’t consciously think about how to walk or run, how to use punctuation when writing or how to park their car once they have been doing any of those for a few years. Experienced authors and drivers don’t think about the basics of those tasks, which is why people can commute from home to work and not even remember the drive. For the same reason, the S.O.L.I.D. principles should be applied at a subconscious level by senior developers.

Which just recently has become a pain for me. Now that S.O.L.I.D. is all the rage, every interviewer seems compelled to drill developers on all of the details. Try to remember all of the low-level spelling and grammar rules you use when writing. If you’ve been out of school for at least a few years, I bet you can’t remember most, if any, of them. You don’t need to anymore, just like I haven’t needed to think on such a basic level for years when programming. Those basic principles are automatic, and forcing them back into the conscious level isn’t a benefit and doesn’t improve anything.

Senior-level interviews just a few years ago weren’t asking such basic questions. These types of basic interview questions are not going to land good senior developers. Someone who is book smart or spent time cramming before the interview could answer them even if they weren’t a developer. Meanwhile, the more in-depth questions that can’t be crammed for and require real (senior-level) experience rather than book learning aren’t being asked as often (or at all) in senior-level interviews now.


Initial release of Original Coder Libraries!

I’ve created the Original-Coder-Libraries repository on GitHub and uploaded some source code to get things started! They are licensed under the GNU LGPL v3.

The libraries currently contain approximately 3,500 lines of C# according to code metrics. This is a tiny fraction of what I have in my personal libraries and I’ll be adding more in the future.

The OriginalCoder.Common library includes:

  • Abstract base class for implementing IDisposable
  • Abstract base class for implementing IDisposable that also automatically cleans up registered children.
  • Exception classes for use in Original Coder libraries
  • Comprehensive set of extension methods for reading and writing XML using Linq to XML
  • Interfaces and classes for returning messages and operation results (mostly intended for use with Web APIs)
  • Extension methods for working with enumerations
  • Centralized application configuration for working with DateTimes (such as which formats to use for user display vs data storage).
  • Many useful DateTime extension methods
  • Extension methods for working with Type
  • Extension methods for calculating a cryptographic hash of a disk file
  • Standard interfaces for defining common properties on classes (Name, Description, Summary, WhenCreated, WhenDeleted, etc).
  • Extension methods for working with standard object property interfaces.

The OriginalCoder.Data library includes:

  • Standard interfaces for defining common data properties on classes (WhenCreated, WhenUpdated, WhenDeleted, IsActive).
  • Extension methods for working with standard object property interfaces.
  • Standard interfaces for defining unique key properties on classes (Id, Uid, Key)
  • Extension methods for working with standard key interfaces.

Repository: https://github.com/TheOriginalCoder/Original-Coder-Libraries


My Definitions for the S.O.L.I.D. Principles

The five principles that make this up were bundled together as the S.O.L.I.D. principles in the early 2000s. They describe basic concepts that are required to write good code, so it is very important for novice, junior and mid-level programmers to put real thought into them and get good at applying the concepts correctly and consistently. The better a programmer gets at these, the less time they should need to spend thinking about them.

As I mentioned these are very important basic programming concepts that need to be learned, mastered and (eventually) internalized by software engineers. But I personally think the language often used to define and explain the underlying concepts is a bit cryptic or overly complex. Below is my attempt at conveying these concepts.

(S) Single Responsibility Principle

Don’t mix multiple (especially unrelated) capabilities / functionalities together in one class. Make separate, more narrowly defined classes for each distinct capability. Classes that have a narrower, more clearly defined purpose are easier to learn, apply, maintain and reuse. Note that this applies to any division of code (methods, classes, interfaces, modules, libraries, etc).
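A small TypeScript sketch of the idea (all names illustrative): rather than one class that both computes an invoice total and formats it for display, each concern gets its own narrowly scoped class, so either can change or be reused without touching the other.

```typescript
interface LineItem { description: string; price: number; quantity: number; }

// One responsibility: the arithmetic.
class InvoiceCalculator {
  total(items: LineItem[]): number {
    return items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  }
}

// A separate responsibility: presentation of the result.
class InvoiceFormatter {
  format(items: LineItem[], total: number): string {
    const lines = items.map(i => `${i.description} x${i.quantity}: ${i.price * i.quantity}`);
    return [...lines, `TOTAL: ${total}`].join("\n");
  }
}
```

A reporting change now touches only `InvoiceFormatter`; a pricing-rule change touches only `InvoiceCalculator`.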

(O) Open/Closed Principle

Great concept, not so great name.

Once functionality has been created and is in use, don’t modify it in a way that would break existing code! At the same time there should also be a way to extend or alter the behavior of the previously written code (without rewriting or copy/paste) that won’t break code that uses it.

A good way to do this is through the use of abstract base classes and inheritance. The core functionality for performing a task should be written into a base class. A concrete descendant class will inherit that functionality and allow it to be used. That concrete class should not be changed in a way that breaks existing code once it is in use. But new concrete classes, and possibly a new level of abstract class, can be added that extend or change the functionality without impacting existing concrete classes.
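The abstract-base-class approach above can be sketched in a few lines of TypeScript (names are illustrative): the base class fixes the stable algorithm, and new behavior arrives as new subclasses rather than edits to existing ones.

```typescript
abstract class ReportExporter {
  // Stable core: iterate the rows, delegate formatting to descendants.
  export(rows: string[][]): string {
    return rows.map(r => this.formatRow(r)).join("\n");
  }
  protected abstract formatRow(row: string[]): string;
}

class CsvExporter extends ReportExporter {
  protected formatRow(row: string[]): string { return row.join(","); }
}

// Extension without modification: a new format is a new class.
// CsvExporter and everything that depends on it are untouched.
class TsvExporter extends ReportExporter {
  protected formatRow(row: string[]): string { return row.join("\t"); }
}
```

The system is open to extension (add `TsvExporter`) while the shipped classes stay closed to breaking modification.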

(L) Liskov Substitution Principle

A rather complicated way of defining a good concept.

Descendant classes should break neither the explicit nor the implicit contract of the parent class. Descendant classes should be implemented in a manner that allows them to be dropped in as replacements for the parent class without requiring any code that expects the parent class to be changed. The key here is that this is more than just signature-level compatibility (the list of methods, parameters and types). The expected behavior must also remain the same!

Note that this isn’t limited to classes. This concept should be applied anyplace where substitution is allowed by the programming language or system. Which means this also applies to interfaces. This would also apply to multiple DLLs that expose the same method signatures (which is sometimes used to implement plug-ins or extensions). Anyplace substitution is possible care should be taken to ensure the behaviors are consistent.
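The classic rectangle/square example shows how a behavioral contract can break even though every signature matches. This TypeScript sketch is purely illustrative: `Square` compiles as a drop-in replacement for `Rectangle`, but it violates the implicit contract that setting the width leaves the height alone.

```typescript
class Rectangle {
  constructor(protected w: number, protected h: number) {}
  setWidth(w: number) { this.w = w; }
  setHeight(h: number) { this.h = h; }
  area(): number { return this.w * this.h; }
}

class Square extends Rectangle {
  // Keeps the sides equal -- and thereby changes behavior callers rely on.
  setWidth(w: number) { this.w = w; this.h = w; }
  setHeight(h: number) { this.w = h; this.h = h; }
}

// Written against Rectangle's implicit contract: the caller expects 5 * 4 = 20.
function stretch(r: Rectangle): number {
  r.setWidth(5);
  r.setHeight(4);
  return r.area();
}
```

`stretch(new Rectangle(1, 1))` returns 20, while `stretch(new Square(1, 1))` returns 16 — the signatures substitute, the behavior doesn't, and that's exactly what Liskov forbids.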

(I) Interface Segregation Principle

This is the same general concept that underlies the Single Responsibility Principle, but applied specifically to interfaces. It’s the same underlying concept; it doesn’t need two separate principles.

(D) Dependency Inversion Principle

This is the least obvious of the bunch, possibly because this way of thinking and the frameworks / technologies needed to support it are more recent. Or maybe because this principle is emergent rather than standalone; it becomes possible because of the other principles.

The idea here is that classes which require instances of other classes (or interfaces) to perform their work should not instantiate specific concrete implementations within their code. This has the effect of embedding the decision as to which concrete implementation to use in a place where it is difficult to change. This also typically results in these choices being embedded in many different places throughout a software system.

The concept behind the Open/Closed Principle encourages inheritance and abstractions. The concept behind the Liskov principle states that all implementations of something that allows substitution (such as classes and interfaces) must be interchangeable. If we’re applying both of those concepts consistently, it’s a shame to make substitutions difficult by hard-coding and embedding those decisions all over the place in the code.

Bingo! And that’s why Dependency Inversion sprang up: to standardize and centralize the ability to use substitution in a software system.

Currently the most common preference for implementing this capability is the Dependency Injection pattern where concrete instances are passed into a class via its constructor. But that is not the only option, any pattern that extracts and centralizes these decisions could be used. One such alternative is the Service Locator pattern which, in some cases, can be preferable to Dependency Injection.
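Constructor injection, as described above, looks like this in a TypeScript sketch (names are illustrative): the class declares what it needs via an interface and never news up a concrete type itself, so the choice of implementation lives in one place.

```typescript
interface MessageSender { send(to: string, body: string): void; }

// One concrete implementation; a test fake or a different transport
// would implement the same interface.
class RecordingSender implements MessageSender {
  sent: string[] = [];
  send(to: string, body: string) { this.sent.push(`${to}: ${body}`); }
}

class WelcomeService {
  // The dependency arrives from outside. Swapping implementations
  // requires no change to this class -- the inversion in action.
  constructor(private sender: MessageSender) {}
  welcome(user: string) { this.sender.send(user, "Welcome aboard!"); }
}

// Composition root: the single, centralized place where concretes are chosen.
const sender = new RecordingSender();
const service = new WelcomeService(sender);
service.welcome("alice@example.com");
```

In a full application a DI container typically plays the role of that composition root, but as the post notes, a Service Locator could centralize the same decision.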

It is worth noting that the Dependency Inversion principle is the only one of the bunch that isn’t universally applicable. Not all applications need DI. I would argue that all large or complex systems probably should use it, but small systems or one-offs that aren’t expected to be maintained probably don’t.


Software Architecture is Layers of Goodness

The idea of software systems having layers has been around for quite a while and the terminology is very helpful when used properly.

This article does not intend to cover the myriad reasons why a software architect would choose to use, or not to use, layers in a software system. My off-the-cuff thought is that any system with more than 3 developers or more than 50,000 lines of code would probably benefit from layers to some degree. Layers are certainly not needed for all software systems, but they are certainly helpful in some. Even in smaller systems they can be a useful conceptual idea, a communication tool or handy for breaking up work by skill set.


A layer is all of the classes, types and related artifacts that are used to perform a particular type of function in a software system. The code that makes up layers is mostly application specific (written by the application developers specifically for use in that one system). Layer code does not mingle: all application-specific code that exists in a layer exists in one and only one layer. Generic code (such as List, cryptography functions, string functions, etc) does not fall into a layer because it isn’t application specific. Ideally, code that isn’t application specific and doesn’t fall into a layer should be written as reusable code and put in a library. Failing that, the code that makes up layers in a system should at least exist in separate and obviously named namespaces, with non-layered code in different namespaces.

The most common layers of functionality used in software systems are the presentation layer, the service layer and the repository layer. If a system is intended for use by other systems (not an end user) then an API layer takes the place of the presentation layer. Each of these common layers can be referred to by different names depending on who you talk to. Just like a rose, the name isn’t important because the purpose of the layer remains the same.

  • Presentation Layer, User Interface Layer, GUI Layer and Web Client Layer all refer to the same functionality of interacting with a person.
  • API Layer, Web API Layer and RESTful API Layer all refer to the same functionality of interacting with other external systems via a defined API.
  • Service Layer, Business Layer, Logic Layer and Business Logic Layer all refer to the same functionality of implementing the business rules, logic and complex processing within the software system.
  • Repository Layer and Data Access Layer both refer to the same functionality of reading and writing data to/from persistent storage.

The presentation / user interface layer is responsible for all user interaction. Business logic/rules and code that reads and writes data from persistent storage should not exist in this layer. The purpose of this layer is limited to displaying information and interacting with the user.

An API / Web API layer takes the place of the user interface layer in software systems that expose their functionality for use by other systems. Functionally this layer plays the same role the user interface layer would: it is responsible for all interfacing with the client.

The repository / data access layer is responsible for reading and writing data to/from persistent storage. Most often this is a relational database, but it can be any type of structured storage (files, XML, NoSQL databases, etc).

The service / business logic layer is most of the code that exists between the presentation and repository layers. It is where the business logic and rules are implemented and where complex processing occurs. Business rules and logic should not be coded into other layers.

There can also be adapter layers positioned between the other layers in a system. For example an adapter layer between a Web API layer and a service layer that converts data from Data Transfer Objects (used in API) to the structures used by the service layer and then back into DTOs when data is returned from the service layer.

Systems can also include a proxy layer that replaces the service and repository layers for capabilities that are implemented by an external system. For example, a complex web application where the front end is implemented by one set of servers which use proxies to call into the Web API exposed by another set of servers containing the business rules, logic, processing and data access code.

Now that we have the fundamentals out of the way we can discuss why the concept of layers in software systems is important and still very much relevant today.


Without the concept of layers, software systems would be big collections of objects that interact with each other. This would be troublesome because not all objects in a software system should be allowed to talk to each other. Layers are a top-down design: code executing in a layer can call other code in the same layer or in the next layer down. But code cannot call into layers above it and cannot skip over layers when calling downward. This is why the concept of layers is important in software systems!

Classes in the presentation/API layer should never directly talk to classes in the repository layer. Likewise, classes in the service layer should never call classes in the presentation or API layer. Classes in the data access layer should never call presentation/API or service layer classes. If we were to throw out the concept of layers and view software systems as a collection of classes that call each other, these very important concepts of separation would be lost.
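The top-down call rule can be sketched in TypeScript (all names illustrative): each layer holds a reference only to the layer directly beneath it, so an upward or layer-skipping call has nothing to be invoked on.

```typescript
// Repository layer: persistence only, no business rules.
class UserRepository {
  private users = new Map<number, string>([[1, "Ada"]]);
  findName(id: number): string | undefined { return this.users.get(id); }
}

// Service layer: business logic, calls one layer down.
class UserService {
  constructor(private repo: UserRepository) {}
  displayName(id: number): string {
    const name = this.repo.findName(id);
    return name ? name.toUpperCase() : "<unknown>";
  }
}

// Presentation/API layer: client interaction, calls one layer down.
// It has no reference to UserRepository, so skipping the service
// layer is impossible to express.
class UserController {
  constructor(private service: UserService) {}
  get(id: number): string { return this.service.displayName(id); }
}

const controller = new UserController(new UserService(new UserRepository()));
```

The wiring at the bottom is the only place all three layers meet; everywhere else, each class can see exactly one layer down and nothing else.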

A good system architecture codifies these layer concepts and makes them easier and more accessible for the application programmer while also reducing the likelihood of bad code that violates these principles. How this can be done in architecture is a difficult concept to express concisely and would require, at the very least, a sizable article of its own. I am working on transforming some of my existing personal library code into open source libraries that I’ll publish on NuGet in the foreseeable future. Those libraries will contain a very nice structured architecture I’ve used numerous times to implement these layer concepts in systems. If you’re interested in that, keep an eye on my blog.


My Natural Environment

This is where I do my software development (unless I’m on-site in a cubicle somewhere).

This is my natural environment (my home office in development mode)

I love working in my home office and find it to be an extremely effective and efficient environment for writing software!

  • Fast Intel i7 CPU at 4 GHz
  • 32 GB of RAM
  • 1 TB SSD for boot & temporary storage
  • Dual 4TB hard drives in RAID-1 for data storage
  • Nvidia GeForce GTX 1080 for driving my main 4K monitor
  • Nvidia GeForce GTX 1050 Ti for driving my 2 additional monitors
  • Main monitor is 55″ 4K which allows me to see A LOT OF CODE at once
  • Two secondary monitors for references or other information
  • Indirect, subdued lighting so the focus is the monitors
  • Music with a strong beat (electronic, industrial or metal) for motivation

Though normally the computer sits on the right side of the desk, not on top. My motherboard failed last week and I had to replace it. Historically I’ve preferred AMD-based systems but decided to give Intel a try for the last upgrade. That has gone very poorly, resulting in yet another hardware failure (see my previous post AMD vs Intel).

I’m waiting for the 3rd generation of AMD Ryzen Threadripper processors to become available (hopefully later this year) for my next upgrade. Those are a game changer with all of their extra I/O bandwidth. I’ll also upgrade to a M.2 PCIe main drive and 64 GB of RAM at the same time. Should be nice.

Site and all contents Copyright © 2019 James B. Higgins. All Rights Reserved.