Carbon Copy Testing

December 20, 2021

Another jab against test coverage cargo culting, I suppose.

Teams that cargo cult TDD or code coverage metrics tend to end up writing tests that provide little to no regression protection or refactorability. There’s a case to be made for writing more tests; not writing enough of them is a problem that many teams have. And then there’s the problem where teams that strive to do this try to quantify it (after all, how can you improve something you cannot measure?) and fall into the trap of doing things like enforcing a minimum code coverage threshold in the CI pipeline¹.
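In Ruby terms, that gate usually looks something like a SimpleCov threshold wired into the test suite (a hypothetical sketch; the 90% figure is arbitrary and not from the original post):

# spec/spec_helper.rb
require 'simplecov'

SimpleCov.start do
  # Fail the suite (and therefore the CI job) if total coverage drops below 90%.
  minimum_coverage 90
end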

Let’s say we have a User model defined like this:

class User
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end
end

How would we test a piece of code like this? Maybe let’s try initializing it and see if we can read back the data we provided:

require 'user'

describe User do
  context 'Given User model' do
    it 'can initialize correctly' do
      name = 'John'
      age = 30
      user = User.new(name, age)
      expect(user).to be_a User
      expect(user.name).to eq name
      expect(user.age).to eq age
    end
  end
end

At first glance, this doesn’t seem like too bad an example. But it sits at a point along a spectrum, and the extreme end of that spectrum looks something like this:

describe User do
  context 'Given User model' do
    it 'can initialize correctly' do
      source = File.read('lib/user.rb')
      expect(source).to eq <<~EOS
        class User
          attr_reader :name, :age

          def initialize(name, age)
            @name = name
            @age = age
          end
        end
      EOS
    end
  end
end

I’d say the first example isn’t very far on the spectrum from the second example. What additional benefit does writing the test the first way give us, as opposed to just writing it the second way? Maybe it gives us some representational independence and enforces the duck typing contract, i.e. if it has accessible name and age fields, it’s a User. If we ever wanted to refactor the User class and implement it in some other way that maintains this contract, we could.
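For what it’s worth, the contract is loose enough that a purely cosmetic swap would keep the spec green. A hypothetical sketch (the Struct-backed version is mine, not from the original post):

# A Struct-backed User still exposes readable name and age fields,
# and its instances are still instances of User, so the original
# spec passes without modification.
User = Struct.new(:name, :age)

That’s about as far as the representational independence goes, though.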

Can you think of a way to improve or refactor the code without changing the test? Unless you made an actual mistake like misnaming the field to begin with, probably not.

Personally, I’d say the ROI on writing this test is at best slightly higher than zero, and at worst negative. Someone has to actually write the test², and now anyone who’s trying to add a new field through the application’s call path has to do double the work. If your goal is to punish your engineers for testing non-business-logic-related data piping, the second method achieves the same outcome.
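To make the “double the work” part concrete, here’s a hypothetical sketch of what adding an email field would look like (the field is my own example, not from the original post). The model and the spec end up restating exactly the same facts:

class User
  attr_reader :name, :age, :email

  def initialize(name, age, email)
    @name = name
    @age = age
    @email = email
  end
end

describe User do
  it 'can initialize correctly' do
    user = User.new('John', 30, 'john@example.com')
    expect(user.name).to eq 'John'
    expect(user.age).to eq 30
    # One more line of production code, one more line of spec.
    expect(user.email).to eq 'john@example.com'
  end
end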

After all, if you’re making the same mistake twice, I mean, you must have a good reason for it, right?

I’ve come up with a term for these kinds of tests: carbon copy testing³.


Of course, if there is business logic in your models, then it’s worth testing. Let’s say we need to implement a method on the User class to generate a stable string representation of a user. We could do something like this:

class User
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def to_s
    "User:#{name}:#{age}"
  end
end

And test it like this:

describe User do
  context 'Given User model' do
    it 'should have the same stable string representation for two similar users' do
      user = User.new('John', 30)
      user2 = User.new('John', 30)
      expect(user.to_s).to eq(user2.to_s)
    end
  end
end

If we ever wanted to refactor this to use MD5, we could do so:

require 'digest'

class User
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end

  def to_s
    Digest::MD5.hexdigest("User:#{name}:#{age}")
  end
end

And the test should still continue to pass, as we expect, without any modifications.


Using a language with static types makes it significantly less likely that carbon copy tests will appear, but if engineers are still forced to comply with code coverage metrics, or to work within a code review culture that expects tests for every file changed, they may still have no choice but to resort to writing them.
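The closest Ruby equivalent would be something along the lines of a Sorbet-annotated model (a hypothetical sketch, not from the original post); once srb tc has checked the signatures, the “can initialize correctly” spec has nothing left to verify:

# typed: true
require 'sorbet-runtime'

class User
  extend T::Sig

  sig { returns(String) }
  attr_reader :name

  sig { returns(Integer) }
  attr_reader :age

  sig { params(name: String, age: Integer).void }
  def initialize(name, age)
    @name = T.let(name, String)
    @age = T.let(age, Integer)
  end
end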


  1. See Goodhart’s Law.
  2. GitHub Copilot dramatically lowers the amount of time required for this, especially (and ironically so) for carbon copy tests.
  3. Perhaps similar in spirit to snapshot testing in the context of frontend testing, but I think the use case for that is more legitimate.