In most contexts the SI prefixes kilo-, mega- and giga- mean 1 thousand, 1 million and 1 (short scale) billion, respectively, as in one kilogram = one thousand grams, one megajoule = one million joules and one gigawatt = one billion watts. In symbols: 1 kg = 1,000 g; 1 MJ = 1,000,000 J; 1 GW = 1,000,000,000 W.
In computer science the units kilobyte, megabyte and gigabyte (symbols kB, MB and GB) were originally used in this standard decimal sense to mean 1,000, 1,000,000 and 1,000,000,000 bytes, respectively. In symbols: 1 kB = 1000 B; 1 MB = 1000² B; 1 GB = 1000³ B.
However, in modern use (and depending on the context), the same three symbols sometimes have a binary meaning. The binary definitions of these three symbols are 1 KB = 1024 B; 1 MB = 1024² B;[1] 1 GB = 1024³ B. In this context it is customary to use an upper case "K" instead of the SI prefix "k", for kilo.
The computer itself does not count bytes in binary prefixes; the decision to report memory, file and hard disk drive (HDD) sizes in this manner was made by people in the 1980s. The use of binary prefixes is therefore only a convention. This convention could have been changed at any time to agree with the SI prefixes, as was done in Apple's 2009 "Snow Leopard" release and in Ubuntu; however, it has persisted in much of the computer industry.[2]
For many applications (primarily the storage capacity of hard disk drives and data rates for telecommunications), the decimal convention is retained, whereby one kilobit is exactly one thousand bits and one megabyte is exactly one million bytes.[3]
There are many WP articles in which the same symbol (e.g. MB) is used with two different meanings, often hopping between them in the same paragraph or section, sometimes even in the same sentence. This dual use creates confusion and a corresponding need to disambiguate.
These ambiguous usages are common beyond Wikipedia and have led to litigation.
The problem gets successively worse with the higher-value prefixes tera- (1000⁴ vs 1024⁴), peta- (1000⁵ vs 1024⁵), etc. The highest-value SI prefix for which a binary counterpart has been defined is yotta-, meaning 1000⁸. The corresponding binary prefix yobi- means 1024⁸ (≈1.21×10²⁴), which differs by 21 % from the conventional decimal interpretation of yotta-.
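The growth of this discrepancy can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the names PREFIX_PAIRS and relative_excess are illustrative choices made here, not part of any standard.

```python
# Decimal (SI) and binary (IEC) prefix pairs, ordered by exponent n,
# where the decimal value is 1000**n and the binary value is 1024**n.
# (PREFIX_PAIRS and relative_excess are hypothetical names for this sketch.)
PREFIX_PAIRS = [
    ("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi"),
    ("peta", "pebi"), ("exa", "exbi"), ("zetta", "zebi"), ("yotta", "yobi"),
]


def relative_excess(n: int) -> float:
    """Return how much 1024**n exceeds 1000**n, as a fraction of 1000**n."""
    return 1024**n / 1000**n - 1


if __name__ == "__main__":
    for n, (si, iec) in enumerate(PREFIX_PAIRS, start=1):
        # Print the percentage by which the binary prefix exceeds the decimal one.
        print(f"{si:>6} vs {iec:<5} {relative_excess(n):6.1%}")
```

The excess rises from about 2.4 % for kilo-/kibi- to about 20.9 % for yotta-/yobi-, consistent with the figure of 21 % quoted above.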
In December 1998, in an attempt to resolve the ambiguity, the International Electrotechnical Commission (IEC) introduced a new set of prefixes kibi-, mebi- and gibi- for the binary meanings, with symbols Ki-, Mi- and Gi-, so that 1 KiB (one kibibyte) = 1024 B, 1 MiB (one mebibyte) = 1024² B and 1 GiB (one gibibyte) = 1024³ B. In the IEC standard, the prefixes kilo-, mega- etc. are reserved for their original decimal meanings.
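To illustrate how the two conventions label the same quantity, here is a minimal Python sketch that formats a byte count with either IEC binary symbols or SI decimal symbols. The function name format_bytes, its binary parameter and the two-decimal output format are assumptions made for this example only; they do not describe how any particular operating system reports sizes.

```python
def format_bytes(count: int, binary: bool) -> str:
    """Format a byte count using IEC (binary) or SI (decimal) symbols.

    Hypothetical helper for illustration: binary=True divides by 1024 and
    uses KiB, MiB, ...; binary=False divides by 1000 and uses kB, MB, ...
    """
    base = 1024 if binary else 1000
    if binary:
        symbols = ["B", "KiB", "MiB", "GiB", "TiB"]
    else:
        symbols = ["B", "kB", "MB", "GB", "TB"]
    value = float(count)
    index = 0
    # Step up one prefix at a time until the value fits under the base
    # or the largest symbol in the list is reached.
    while value >= base and index < len(symbols) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {symbols[index]}"


if __name__ == "__main__":
    size = 1024**2  # 1,048,576 bytes
    print(format_bytes(size, binary=True))   # 1.00 MiB
    print(format_bytes(size, binary=False))  # 1.05 MB
```

The same 1,048,576 bytes is exactly 1 MiB but roughly 1.05 MB, which is why an unqualified "1 MB" cannot be interpreted without knowing which convention is in use.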
Why Wikipedia should not deprecate the use of IEC prefixes
IEC prefixes are unambiguous, succinct, simple to use and simple to understand.
The use of IEC prefixes is endorsed by national and international standards bodies.
The use of one symbol (e.g. GB) to mean two different things in the same article creates confusion and ambiguity. Despite this ambiguity, there are many WP articles in which kilobyte, megabyte and/or gigabyte are used in this way. In this situation, the IEC prefixes provide an ideal disambiguation tool because they are unambiguous and succinct.
Deprecation (of IEC prefixes) increases the difficulty threshold for disambiguation, reducing the rate at which articles can be disambiguated by expert editors.
In turn this reduces the total number of articles that can be further improved by less expert editors with footnotes etc (assuming that there is consensus to do so).
Deprecation is interpreted by some editors as a justification for changing unambiguous units into ambiguous ones.
Removing IEC prefixes from articles, even when disambiguated with footnotes, destroys a part of the information that was there before, because it requires an expert to work out which footnote corresponds to which use in the article.
In the long term, the use of IEC prefixes would ultimately avoid the need to use the same symbol (e.g., MB) with two different meanings. This may sound like a pipe dream, but it could be implemented as a user preference, so that readers could choose between familiar (ambiguous) units and unfamiliar (unambiguous) ones.
The main argument for not using IEC prefixes is the unfamiliarity of, for example, the mebibyte (MiB) compared with the megabyte (MB). The unfamiliarity is not disputed, but is not relevant to disambiguation. The point is that disambiguation is rare and therefore all disambiguation methods are unfamiliar.
Alternative disambiguation methods are either cumbersome (i.e., exact numbers of bytes), difficult and time-consuming to implement in a manner that is clear to the reader (i.e., footnotes)[4] or unlikely to be understood (i.e. exponentiation).
In conclusion, disambiguation is not easy, so it would be unwise to discard the simplest disambiguation tool at our disposal just because it is unfamiliar to some readers. The best disambiguation method has yet to be established, so it is premature to deprecate this one.
^According to the LBA Count for IDE Hard Disk Drives Standard from the website of the International Disk Drive Equipment and Materials Association (IDEMA), there are 1,000,194,048 bytes (1,953,504 logical blocks × 512 bytes/logical block) per nominal gigabyte of hard drive storage.
^This problem is illustrated by Address space layout randomization, which includes the confusing disambiguation footnote "Transistorized memory, such as RAM and cache sizes (other than solid state disk devices such as USB drives, CompactFlash cards, and so on) as well as CD-based storage size are specified using binary meanings for K (1024¹), M (1024²), G (1024³), ..."