
Why Computers Think in Binary: The Magic of Bits, Overflow, and Floating-Point Reality


Have you ever stopped to wonder how your computer handles numbers under the hood? What seems like a simple task—adding, subtracting, or dividing—is actually constrained by strict rules and limitations. Unlike humans, computers operate in a world of finite precision and binary logic. In this post, we explore the foundational ideas of binary numbers and floating-point representation, and how they shape the digital universe.


Precision Has Limits: Welcome to Finite Math

In everyday life, we write huge numbers without thinking twice. But inside a computer, memory is limited. Every number is stored using a fixed number of bits, meaning only a finite number of values can be represented.

Imagine being allowed to write only three decimal digits. You could store anything from 000 to 999—but what about 1001, or -5, or 3.14159? These are outside the allowable set. This limitation leads to two types of errors:

  • Overflow: The result is too large (e.g., 600 + 600 = 1200, but 1200 can’t be stored).
  • Underflow: The result is too small in magnitude to represent and is often approximated as zero.

Even fundamental properties like the associative law or distributive law can break down in finite-precision arithmetic. This is why numerical computing is both an art and a science.
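
For example, a quick check in Python (the values here are purely illustrative) shows associativity failing for ordinary double-precision numbers:

a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)   # 1.0 -- the two large values cancel first
print(a + (b + c))   # 0.0 -- adding 1.0 to -1e16 is lost to rounding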


Binary, Octal, and Hex—The Languages of Machines

Humans use base-10 numbers, but computers prefer base-2: binary. In binary, only two digits exist: 0 and 1.

Here’s how the number 2001 looks in different systems:

System         Representation
Decimal (10)   2001
Binary (2)     11111010001
Octal (8)      3721
Hex (16)       7D1

To keep things unambiguous, notations like 0x7D1 (hex) or 11111010001₂ (binary) are often used.
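
Python's built-in conversions reproduce the table above directly (a quick sketch):

n = 2001

print(bin(n))   # 0b11111010001
print(oct(n))   # 0o3721
print(hex(n))   # 0x7d1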


How to Convert Between Systems

Binary ↔ Octal: Group bits in sets of three.
Binary ↔ Hex: Group bits in sets of four.
Decimal ↔ Binary: Use subtraction of powers of 2, or repeated division by 2.

For instance, converting 1492 to binary:

1492 ÷ 2 = 746 remainder 0  
746  ÷ 2 = 373 remainder 0  
...  
Continue until quotient is 0, then read remainders bottom-up
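
The full procedure is easy to express in code. A minimal Python sketch (the function name to_binary is just illustrative):

def to_binary(n):
    # Convert a non-negative integer to a binary string by repeated division by 2.
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))   # record the remainder (0 or 1)
        n //= 2                         # continue with the quotient
    return "".join(reversed(remainders))  # read the remainders bottom-up

print(to_binary(1492))   # 10111010100
print(bin(1492))         # 0b10111010100 -- built-in cross-check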

Representing Negative Numbers in Binary

There are several methods to store negative numbers:

  1. Signed Magnitude: First bit is the sign (0 = +, 1 = –).
  2. One’s Complement: Flip all bits for negative.
  3. Two’s Complement: Flip all bits, then add 1.
  4. Excess-N: Add a bias (e.g., +128 for 8-bit numbers).

Two’s complement is most common today. It avoids having both “+0” and “–0” and simplifies arithmetic. But because the number of bit patterns is even, even this method cannot represent positive and negative values symmetrically: an 8-bit two’s-complement number covers −128 through +127, so −128 has no positive counterpart.
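
A short Python sketch of two's complement in 8 bits (the helper name twos_complement is made up for illustration):

def twos_complement(n, bits=8):
    # Mask to the given width; Python's bitwise operators behave as if
    # integers were stored in two's complement.
    return format(n & ((1 << bits) - 1), "0{}b".format(bits))

print(twos_complement(5))      # 00000101
print(twos_complement(-5))     # 11111011 -- flip every bit of 5, then add 1
print(twos_complement(127))    # 01111111 -- largest positive 8-bit value
print(twos_complement(-128))   # 10000000 -- -128 has no positive counterpart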


Binary Arithmetic and Overflow

In binary math:

0 + 0 = 0  
0 + 1 = 1  
1 + 0 = 1  
1 + 1 = 0 (with carry)

Overflow detection is critical. In two’s-complement addition, if the carry into the sign bit doesn’t match the carry out of the sign bit, the result has overflowed.
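
A sketch of that rule for 8-bit two's-complement addition (the function name add8 is made up):

def add8(a, b):
    # Add two 8-bit two's-complement values and report whether overflow occurred.
    ua, ub = a & 0xFF, b & 0xFF
    carry_in = ((ua & 0x7F) + (ub & 0x7F)) >> 7   # carry into the sign bit
    carry_out = (ua + ub) >> 8                    # carry out of the sign bit
    result = (ua + ub) & 0xFF
    signed = result - 256 if result & 0x80 else result   # reinterpret as signed
    return signed, carry_in != carry_out

print(add8(100, 50))    # (-106, True) -- 150 does not fit in 8 bits
print(add8(-100, -50))  # (106, True)  -- neither does -150
print(add8(100, -50))   # (50, False)  -- no overflow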


Floating-Point Numbers: Riding the Precision Wave

When dealing with massive or tiny numbers (like 9 × 10⁻²⁸ or 2 × 10³³), we turn to floating-point representation—essentially scientific notation for machines:

n = f × 10^e

Here, f is the fraction (also called the mantissa or significand) and e is the exponent. Computers use the same scheme, but typically with base 2: n = f × 2^e.

Normalized vs. Denormalized

  • Normalized: The first bit of the fraction is always 1, so it can be left implicit rather than stored (saving space).
  • Denormalized: Used to handle underflows gracefully by sacrificing precision.
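
Python's math.frexp exposes exactly this fraction/exponent split, with the fraction normalized to the range [0.5, 1) so its leading bit is 1 (a quick sketch):

import math

f, e = math.frexp(1492.0)   # 1492.0 == f * 2**e with 0.5 <= f < 1
print(f, e)                 # 0.728515625 11
print(f * 2 ** e)           # 1492.0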

IEEE 754: The Floating-Point Bible

In the 1980s, IEEE standardized how floating-point numbers should work. Today, most CPUs follow this specification. Key features include:

Format              Size     Exponent Bias   Fraction Bits   Range
Single Precision    32-bit   127             23 bits         ~±10³⁸
Double Precision    64-bit   1023            52 bits         ~±10³⁰⁸
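
You can see these fields by reinterpreting a float's bits (a sketch using Python's struct module; the field widths follow the single-precision layout above):

import struct

def fields32(x):
    # Return (sign, biased exponent, fraction) of x as an IEEE 754 single.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # 8 exponent bits, bias 127
    fraction = bits & ((1 << 23) - 1)     # 23 fraction bits
    return sign, exponent, fraction

print(fields32(1.0))    # (0, 127, 0)       -> +1.0  * 2**(127 - 127)
print(fields32(-2.5))   # (1, 128, 2097152) -> -1.25 * 2**(128 - 127)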

Special values include:

  • Infinity (exp = all 1s, frac = 0)
  • NaN (“Not a Number”: exp = all 1s, frac ≠ 0; the result of undefined operations like ∞/∞)
  • Zero (exp = 0, frac = 0, with sign bit determining +0 or –0)
  • Denormalized numbers: Smooth transition toward 0 when precision can’t be maintained.

IEEE 754 also ensures consistent rounding and error handling across platforms.
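
All of these special values can be produced and inspected directly (a short sketch):

import math

inf = float("inf")
nan = float("nan")

print(inf / inf)               # nan -- an undefined operation yields NaN
print(nan == nan)              # False -- NaN compares unequal even to itself
print(0.0 == -0.0)             # True, although the bit patterns differ
print(math.copysign(1, -0.0))  # -1.0 -- the sign of negative zero survives
print(5e-324)                  # 5e-324 -- smallest positive denormalized double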


Real vs. Floating-Point Numbers

Real numbers form a continuum. Floating-point numbers do not. Only a finite number can be represented, and between each pair, there may be a vast ocean of unrepresentable values.
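
The gaps are easy to measure; a short sketch using math.nextafter (available since Python 3.9):

import math

# The next representable double after 1.0 differs from it by 2**-52.
print(math.nextafter(1.0, 2.0) - 1.0)          # 2.220446049250313e-16

# Near 10**16 the spacing between neighbouring doubles is already 2.0.
print(math.nextafter(1e16, math.inf) - 1e16)   # 2.0

# Even a simple decimal value like 0.1 falls between representable numbers.
print(format(0.1, ".20f"))                     # 0.10000000000000000555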

Still, floating-point arithmetic gives us:

  • Huge dynamic range
  • Predictable rounding behavior
  • Special handling for edge cases (0, ∞, NaN)

It’s not perfect, but it’s an elegant compromise between speed, precision, and practicality.


Final Thoughts

Understanding how computers handle binary and floating-point numbers reveals the engineering trade-offs behind even the simplest calculations. The next time you see a rounding error or a floating-point glitch in your code, remember: it’s not a bug—it’s a fundamental part of how machines see the world.

