Introducing Numbat: A programming language with physical units as types

bugsmith@programming.dev · 1 year ago

kibiz0r@lemmy.world · 1 year ago

F# has a feature kinda like this: https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/units-of-measure

aluminium@lemmy.world · 1 year ago

Thought the same, plus you have the massive .NET Ecosystem.

OmnipotentEntity@beehaw.org · 1 year ago

This is a cool idea. Other programming languages have libraries that expose similar behavior. For instance, Rust has the uom crate, Haskell has the units package, and C++ has the header-only library SI.

But there is something to be said about it being built in.

kureta@lemmy.ml · 1 year ago

The web page says other libraries implement units whereas they implement dimensions. 1 cm and 1 inch have the same dimension, namely length, and you are able to add them together and get a correct result. Seems nice. I don’t know if it’ll have any practical benefit but I like it.

Starfighter@discuss.tchncs.de · edit-2 1 year ago

I can’t talk about the other libraries but the uom crate does the same thing.

The dimensions are encoded as a vector of generics, allowing you to get the correct unit even when dividing a distance by time for example.

It’s quite the clever use of Rust’s type system.
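
To make the "vector of dimensions" idea concrete, here is a minimal Python sketch. It is not the uom API: the `Quantity` class and the `LENGTH`/`TIME` tuples are made up for illustration, only three base dimensions are tracked, and the check happens at runtime, whereas uom does it at compile time through generics.

```python
from dataclasses import dataclass

# Hypothetical exponent vectors over (length, mass, time).
# A real library would track all seven SI base dimensions.
LENGTH = (1, 0, 0)
TIME = (0, 0, 1)


@dataclass(frozen=True)
class Quantity:
    value: float  # magnitude expressed in SI base units
    dims: tuple   # dimension exponents, e.g. (1, 0, -1) for length / time

    def __add__(self, other):
        # Addition only makes sense for identical dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __truediv__(self, other):
        # Dividing two quantities subtracts their exponent vectors.
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.value / other.value, dims)


distance = Quantity(100.0, LENGTH)  # 100 m
duration = Quantity(20.0, TIME)     # 20 s
speed = distance / duration         # dims become (1, 0, -1), i.e. m/s

# 1 cm + 1 inch: same dimension, both stored in meters, so they simply add.
total = Quantity(0.01, LENGTH) + Quantity(0.0254, LENGTH)
```

Adding `distance + duration` raises a `TypeError` here, which is the runtime analogue of the compile error a type-level encoding would give.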

Morphit@feddit.uk · 1 year ago

For sure. It’d be nice to have the units in a separate namespace but at least Numbat won’t let you override identifiers already defined in the system of measure. I use Pint on Python - I usually keep the units in an identifier named u so they can’t get accidentally overridden. That means either using u.km for single units or u('g/cm^3') for composite units. It’d be great if the language could separate units e.g. as [km] or `` but getting a compact syntax to distinguish the units namespace without colliding with other language features would be tricky. I remember F# having a good syntax but didn’t dive that deep since it’s not used widely in my field.

Irdial@lemmy.sdf.org · 1 year ago

Wish I had this in engineering undergrad! Very cool.

Nawor3565@lemmy.blahaj.zone · 1 year ago

I’m currently in engineering undergrad and this looks like it’ll be a lifesaver. Wolfram Alpha can do some pretty good work with units sometimes. But a lot of the time it’ll do weird stuff like refuse to interpret “V” as “volt”, so you have to type out the full name of every single unit. This language should handle that a million times better.

solrize@lemmy.world · 1 year ago

I’ve always thought Frink (frinklang.org) looked pretty cool. It’s been around forever. I’ve never used it though.

Lupec@lemm.ee · 1 year ago

Fascinating idea, that was an interesting read! Don’t think I’d ever seen something like that done before.

aluminium@lemmy.world · 1 year ago

There are a bunch of libraries for Kotlin that use its extension methods and properties to achieve the same thing, like this one: https://github.com/vsirotin/si-units

ScreaminOctopus@sh.itjust.works · 1 year ago

This would be so nice in a mainstream language, I wonder if it would be possible with rust’s macro system?

SittingWave@programming.dev · edit-2 1 year ago

I disagree.

I worked with software for quantum physics and electronic transport from microscale to mesoscale. It had a “Python-based” DSL that supported units through a units module. It seemed the perfect scenario for such a feature, so we wrote it by integrating another similar package (it’s not the units package, and I can’t find it anymore. In any case, it let you say things like speed = 3 * meters / second).

The results were… interesting.

There are many major problems:

- Managing scales. What if you add 1 meter and 1 nanometer? It’s technically possible, but do you accept the loss of precision? Or should it convert everything to nanometers? Or increase the precision and set it to meters? Now multiply this by all the various rules of conversion and potential scale differences between units and you get into a mess very fast.
- Constants. Researchers (the target of that language) often use fixed constants. Of course these constants have units. Of course they are important for dimensional analysis, but if all your work is in one measure domain (e.g. you are always using atomic units) then you just don’t care about the unit of measure of that constant. It’s known, but who cares? However, to perform math with dimensional correctness, you now force the researcher to define the constant in the script as the number followed by the unit, which adds nothing but the chore of looking up the unit somewhere and writing it in your meta language.
- Logarithms and decompositions. Sometimes you are handling logarithms of metricated units, e.g. what’s the unit of measure of log(3 meters)? Or what is the unit of measure of a Cholesky decomposition of a matrix of metricated stuff? I honestly still don’t know as of today, but… does it matter? Do you care? Especially if they are in-transit numbers?
- Massive performance impact and trouble when specifying arrays or mixing them. When specifying geometry information of large molecules, what do you do? Specify an array followed by the unit (meaning that all the numbers in the array are in the same unit), or do you allow one element to be specified in e.g. nanometers and another in micrometers? Now you eventually have to reconcile and normalise. What if you have to multiply two matrices, one in nanometers and one in micrometers? Again, reconciliation. It’s a nightmare. Additionally, these values are no longer contiguous in memory, which trashes your cache and makes it close to impossible to transfer data to C, for example, for performance gain.
- Units tend to be short names. This pollutes the namespace with short names such as m, s, etc. The result is that the likelihood of users overriding unit names is very high. So you write them in extended form, but then it becomes a chore because now instead of saying 3 * m / s they have to write 3 * meters / second. Or worse: 3 * units.meter / units.second.
- Dimensional analysis implies that you might have to simplify all your units to a normalised form, otherwise you end up with really complex behavior when trying to perform operations. E.g. fuel efficiency is measured in meters squared, which is a very weird measure because it’s basically cubic meters (of fuel) divided by length traveled (in meters). The Reynolds number is actually a pure (no unit) number. What should you do if you use the equation? Simplify the pile of units until you eventually reduce it to a pure number, or leave it as it is?

So, it looks cool, but in practice it’s pointless. The only practice to follow is:

- Accept and output whatever unit makes sense for the business domain. Store the value in a variable named explicitly with the unit (e.g. length_angstrom) until converted for internal use. Then you can implicitly assume the units to be standardized at one unit realm and omit the explicit unit.
- Convert at the interfaces from the user’s units into metric or business-specific units (e.g. atomic units) and only use this form internally, with strict consistency.

In other words:

user gives stuff in micrometers -> store it in length_um -> convert it in nanometers -> store it in length -> use length from now on (implicitly in nanometers)

The reverse for output
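
As a rough illustration of the convert-at-the-interface workflow described above, here is a minimal Python sketch. The `NM_PER_UNIT` table and the `to_internal` / `to_output` helpers are hypothetical names chosen for this example, and nanometers as the internal unit is an assumption taken from the pipeline above.

```python
# Hypothetical conversion table: how many nanometers one of each unit is.
NM_PER_UNIT = {"nm": 1.0, "um": 1000.0, "angstrom": 0.1}


def to_internal(value, unit):
    """Convert a user-supplied length into the internal unit (nanometers)."""
    return value * NM_PER_UNIT[unit]


def to_output(length, unit):
    """Convert an internal length (nanometers) back to the requested unit."""
    return length / NM_PER_UNIT[unit]


# Input boundary: the variable name carries the unit until conversion.
length_um = 2.5
length = to_internal(length_um, "um")  # implicitly nanometers from here on

# Internal code works on bare numbers, all assumed to be in nanometers.
doubled = length * 2

# Output boundary: the reverse conversion.
doubled_um = to_output(doubled, "um")
```

Nothing checks that `length` really is in nanometers once it is inside the codebase; the approach relies entirely on the naming convention and on every boundary doing its conversion.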

Turun@feddit.de · edit-2 1 year ago

But all the downsides you mention are inherent to the problem, not to adding dimensions.

> managing scales

How do you add a nanometer and a meter without units? You need to make the choice between loss of precision and converting to nanometers anyway. Types just give you reassurance that you did the right thing (as opposed to, say, converting m to gigameters instead of nanometers because you forgot a minus sign in the conversion).
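
The precision question exists with or without a unit system, as a small Python sketch can show. It assumes IEEE 754 floats and uses a made-up `as_float32` helper to emulate single precision with the standard `struct` module.

```python
import struct


def as_float32(x):
    # Round-trip through a 4-byte IEEE 754 float to emulate single precision.
    return struct.unpack("f", struct.pack("f", x))[0]


one_meter = 1.0       # both values expressed in meters
one_nanometer = 1e-9

sum64 = one_meter + one_nanometer              # Python floats are double precision
sum32 = as_float32(one_meter + one_nanometer)  # same sum stored as single precision

kept_in_double = sum64 != one_meter  # the nanometer is still visible
lost_in_single = sum32 == one_meter  # the nanometer has been rounded away
```

With the values in meters, double precision still resolves the nanometer while single precision drops it, so the storage choice matters regardless of whether the numbers carry units.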

> Constants

A good unit system has the constants saved. I don’t need to look up the dimensions of hbar in eV, I just do units.hbar and get the thing I mean, not the number that implicitly has a J or eV next to it. And if you have a constant that is not in the units library, you only need to define it once. This really isn’t the problem you make it seem.

> Logarithms

I am very curious where you take logarithms of measures with dimensions and why you cannot normalize to e.g. l/1m // length in meters before taking the log. A unit system doesn’t prevent you from doing this, it just makes it explicit what you implicitly did anyway (but didn’t tell anyone else, because it’s implicit)
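
A minimal sketch of that normalization in plain Python, with made-up variable names and all lengths assumed to be stored in meters: divide by a reference length first, then take the log of the resulting pure number.

```python
import math

length_m = 3.0  # the measured length, in meters

# Normalize against an explicit reference before taking the log.
reference_m = 1.0                              # reference length: 1 m
log_in_m = math.log(length_m / reference_m)    # log of a dimensionless ratio

# Choosing a different reference (1 cm) only shifts the result by a constant.
reference_cm = 0.01                            # 1 cm expressed in meters
log_in_cm = math.log(length_m / reference_cm)
shift = log_in_cm - log_in_m                   # equals log(100) up to rounding
```

The choice of reference is exactly the information that stays hidden when the log is taken of a bare number.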

> Performance

This is a fair point and I will grant you that. If you do large scale simulations you need performance more than anything. But most math you do in science is a short script that I don’t want to pour an hour into to make sure I didn’t fuck up the units. I want to write the script, have the computer check the units and then take a break while the computer takes an hour to compute the result.

> Units and namespaces

Not a big problem in my experience. The vast, vast majority of variables are derived. You don’t need to write v = 3 * meters / second, you have distance = 100 * meters and time = 33 * seconds somewhere in your code anyway, so you only need v = distance/time. The assignment to v is identical, if you have units or not. You only need to define the input, and only do it once.

> normalized form

The units library simply allows you to choose. print(fuel_efficiency.in(1 * liter / (100 * kilometer))) // 5 l/100km or print(fuel_efficiency.to_si_base_units()) // 5e-8 m**2

I have written code of the form xxxx_in_meV, yyyy_in_per_cm_cubed, etc before. It’s much worse than a proper unit system library.
Because if you don’t use a library you may be able to use a number for your constants - but you have to find out the value of your constants in some weird jumble of dimensions. It’s the difference between target_efficiency = 5 * liter / (100 * km) and target_in_metersquared = 5e-8 // 5 l/100km, converted to base units
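
For reference, the 5 l/100km figure can be checked by hand in plain Python without any units library. The constants `LITER` and `KILOMETER` below are illustrative names holding the SI base-unit values.

```python
LITER = 1e-3      # one liter in cubic meters
KILOMETER = 1e3   # one kilometer in meters

# 5 liters per 100 km, expressed in SI base units (m^3 / m = m^2).
target_in_metersquared = 5 * LITER / (100 * KILOMETER)  # about 5e-8 m^2

# Converting back: divide by the size of "1 l/100km" in base units.
one_l_per_100km = LITER / (100 * KILOMETER)
target_in_l_per_100km = target_in_metersquared / one_l_per_100km  # about 5
```

This is the bookkeeping a units library does on your behalf each time a value is defined or printed.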

Taking user input is much easier too. Just do units.parse(user_input), and the user is free to give um or nm or Å. No need for a prominent tip in the ui “input must be given in um!”.

All that being said, a new language is not what I am looking for. I use python sympy (though it’s not very ergonomic to use) for proper script programs and insect.sh if I need to convert something quickly.

EDIT: insect.sh tells its users to use numbat now, hahaha! numbat.dev has exactly the same UX though, so I’ll just recommend that now for those quick physics calculations. It really is an invaluable tool to have.

SittingWave@programming.dev · 1 year ago

My point is that it’s mostly useless to use a language that supports this kind of thing, because the proper programming practice is to normalise and treat the edge cases at the interface. Once you are inside your own codebase, you use SI at the scale that makes sense and that’s it. No more ambiguity, no more need to carry the unit around. The unit is implicit and standardised throughout your code, and you don’t have to carry around dead weight (in memory and computation) for nothing.

lad@programming.dev · 1 year ago

When something is enforced on type level it doesn’t require your memory and usually doesn’t require computation.

Lately I’ve come to think that being explicit is mostly better than being expressive. So in this case stating all the units might work better than having a concise program.

Starfighter@discuss.tchncs.de · 1 year ago

The uom crate implements this for Rust.

The core functionality is based on generics but there are some macros for defining custom measurement systems.

Turun@feddit.de · 1 year ago

> I wonder if it would be possible with rust’s macro system?

I don’t know, but maybe check out numbat, it’s a new scientific calculator that is written in rust.

tiny_electron@beehaw.org · 1 year ago

I would have loved to use that when I was studying physics

silas@programming.dev · edit-2 1 year ago

Really cool! Reminds me a bit of the Numi calculator too

monotremata@kbin.social · 1 year ago

I do a lot of this stuff with the HP48 Units menu (albeit at this point via an emulator on my phone).

stilgar [he/him] @infosec.pub · 1 year ago

This looks like a lot of fun to use, I loved the example from What If, so many units!

Bluetreefrog@lemmy.world · 1 year ago

Reminds me of Mathcad and Calca