Octal Zero Prefix

The archaic convention of regarding a number with a leading zero as octal

The Problem

A number of computing environments take any number beginning with a zero as base 8 (octal) so that, for example,

020

does not mean twenty but is taken as sixteen, i.e. the 2 is taken to mean two eights.

This convention is followed not just in programming languages but also in other contexts which are more user-facing such as the Windows command prompt. This is a trap to catch the unwary. The comments below show some steps in how the convention came about and how to avoid being caught.

History

BCPL

An early manual for the programming language BCPL shows it used an underlined 8 as a prefix for octal literals. (Unlike today it was relatively common for there to be character entry conventions that allowed such things as underlining or overstriking. These issues were dealt with in the compiler's lexical analysis phase.)

Later, as shown under the heading Constants at http://www.davros.org/c/bcpl.html, BCPL used #<number> for octal and #x<number> for hexadecimal. Thus BCPL did not have the problem and there was no ambiguity.

B

When B was designed it used the rule, "an octal constant is the same as a decimal constant except that it begins with a zero." Thus the problem was introduced. Check out the heading 4.1 Primary Expressions in the Users' Reference to B on the PDP-11.

Note in the same paragraphs that 09 was allowed in B and meant 9 even though it began with a leading zero. It was the same value as 011. This may be bizarre but, in fairness, much worse choices were made by later language designers (who had more history to guide them).

C

Unfortunately when C was derived from B the leading-zero-means-octal convention remained. C accepts 020 as meaning 16, 011 as meaning 9 etc. For hexadecimal, C uses 0x as a prefix so 0x20 means 32.

It would have been better if B and C had required a clear prefix for octal such as 0t. Then the numbers above would be 0t20 and 0t11. (The letter "t" is used in the examples simply as it is the third letter of "octal" just as "x" is the third character of "hexadecimal" but another letter or character could have been used. The letter O is probably best avoided due to its resemblance to the digit 0. Some devices or fonts make them look almost identical.)

Unix

Unix was mainly written in C by the people who developed B and C and it was originally used by technical people. Perhaps for those reasons some Unix commands also accept a leading zero as indicating octal. Thus in Unix you can see

$ ping 192.168.1.020
PING 192.168.1.020 (192.168.1.16) 56(84) bytes of data.

Note that the 020 has become 16. This was taken from Linux (which derives from Unix) and shows the octal notation lives on.

Windows

Microsoft Windows adopts the same behaviour.

C:\>ping 192.168.1.020
Pinging 192.168.1.16 with 32 bytes of data:

Again, the 020 has become 16.

Python

A number of other programming languages follow suit. Python, which is written in C, reflects an inconsistent treatment of numbers with leading zeroes as shown here:

>>> print 020
16
>>> print 020.5
20.5

Adding the ".5" in the above makes Python interpret the number as decimal. Although a much more modern language this is a similar bizarre inconsistency to B's allowing 09, mentioned above. Note that what makes them especially bad is that both are silent inconsistencies. In other words the languages do not detect or report an error. Neither do they report the value as in the above examples unless explicitly told to do so. So a piece of code which reads
t1 = (x * 004 * (72 / y) - 101)
t2 = (x * 043 * (12 / y) - 020)
t3 = (x * 107 * (07 / y) - 015)

would be a silent mix of decimal and octal numbers.

The programmer may have used leading zeroes to space out the numbers and allow them to line up. Some languages would not complain but would simply assign unintended results. If the arithmetic results are not checked (or are checked against another computer context that also takes a leading zero to mean octal) the errors may persist and be part of a piece of code for many years.

Chmod

One final note is that some Unix commands such as chmod regard a numeric argument as octal whether it has a leading zero or not such as

$chmod 774 file1

In this case the "number" is not taken as a number. It is simply a group of digit characters.

What can be done?

As shown, a number of programming languages and commands treat numbers with a leading zero (but not a leading 0x) as octal - at least in some contexts. In almost identical contexts the number may be treated as decimal. This is a potential trap for those not aware of this behaviour.

Unfortunately it is not possible or practical to change the languages so the issue must be dealt with by one of these options.

  1. Avoid it by not using leading zeroes to align numbers - perhaps using spaces instead.
  2. Be aware of how the context - i.e. the command or the programming language you are using - treats numbers with leading zeroes.

In a program or a script it may be wise to add a comment to any use of leading zeroes specifying the intention.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License