It’s the units, stupid

Whenever I see scientific code without units I scream inside my heart.

Everytime you write simulations of physical phenomena (often using numpy or matlab), make sure to always have variables where the units are clear, e.g. :

lambda_m = 633e-9 #wavelentgh in meters
c_mps = 3e8 #speed of light in meters per seconds
freq_Hz = c_mps/lambda_m

Failure to keep good track of the units has led to disasters. Yet complete lack of clarity  happens more often than not – just look at the code of a random scientist on github to witness the extent of the damage.

The reason why I am adamant about this is because a lot of time is wasted trying to debug code where it there’s a silly unit mismatch, and because we are doing physics.

Math versus physics

Why is coding without units such a terrible practice? It all boils down to the fact that computing is mostly about math and logic, and therefore not geared towards physical quantities. There is beauty in mathematical abstraction, but sometimes it doesn’t mean anything.

Take a mathematical statement that should be true:


Now ask yourself: what does it mean? If I add one orange to one apple:

1 orange + 1 apple = ?

It might sound silly but it’s actually pretty deep. You cannot add quantities which are not congruent. Yes, you may say that by adding one fruit with another fruit you get two fruits, but you’re cheating then.

This is somehow why object-oriented programming was invented: with the notion of “classes”, you can add entities which are compatible, through the game of function overloading and other niceties. In an ideal world, physical quantities in simulations should all have their own class, where the units would be defined.

Coding without units is like a boat without a rudder

I have rescued the code of many undergrads and grads by just adding units and let them realized that their calculations did not make sense.

“if you don’t know where you’re going you may end up somewhere else” – Yogi Berra

Some icebergs

Of course, sometimes it’s unclear what the unit of a variable is, especially when they are intermediary variables.
The biggest offender is the Fourier Transform, where conversion between direct and reciprocal space if often botched (by laziness or misunderstanding.)
If you process data, sometimes you miss a conversion factor (distances, angles)
But these situation should not deter you: it’s a good occasion to ask yourself what are the dimensions in your system that you need to record in your metadata.
I also found that trying to get “absolute” measurements when possible (e.g. converting counts on a camera to photons) does bring a lot of insights regarding noise and potential limitations of simulations.
One case where things are quite though is when you’re dealing with a mix of physical quantities and black box machine learning, where the units of hyper-parameters is not clear, and the weights in the network are basically “coupling” constants, trying to create relations between unrelated physical quantities (reinventing the laws of physics through chained linear relations.) But hopefully this is only a fraction of your work.

Plots without axes are meaningless

The corollary of quantities without units is that you end up with graphs without units (if you made it until there knowing what you plot in the end… congratulations!) And a graph without units is pretty much meaningless, unless you make all the measurement relative.
Please use units, wear a mask, stay safe.