Thursday, January 15, 2026

Nibble to Hexadecimal in 3 Cycles for Java, C, etc.

This is an overdue publication of a nibble-to-hex conversion I created early in 2025 (and, perhaps, one or more times in preceding years, without thinking twice about it). There is no telling how many programmers invented this before me, so I claim only its independent invention. Anyway…

Store a nibble in an “int” named “nibble” (its high 28 bits must be clear), then use one of the following expressions to obtain the corresponding ASCII/ISO-8859-1/Unicode code-point.

Nibble to Uppercase Hex Notation (3 Cycles)

(char) ((nibble | 0x30) + (9 - nibble >>> 29))

Nibble to Lowercase Hex Notation (4 Cycles)

(char) ((nibble | 0x30) + (9 - nibble >> 31 & 39))

(Java methods with detailed documentation, and C ports, are below the fold.)
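
Before going below the fold, a quick sanity check may be welcome. The throwaway class below is my own (its name is arbitrary); it prints each nibble as the JDK formats it, followed by the results of the two expressions, and the letters in each row should match.

public final class NibbleHexCheck
{
    public static void main( final String[] args )
    {
        for( int nibble = 0x0; nibble <= 0xF; ++nibble )
        {
            final char uc = (char) ((nibble | 0x30) + (9 - nibble >>> 29));
            final char lc = (char) ((nibble | 0x30) + (9 - nibble >> 31 & 39));
            //  JDK reference digits first (uppercase, lowercase), then the two expressions' results.
            System.out.printf( "%X %x : %c %c%n", nibble, nibble, uc, lc );
        }
    }
}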

The cycle counts in the headings above depend on instruction-level parallelism and on immediate operands. They are accurate for x86. The ARM64 architecture can add a cycle due to a limitation on the SUB instruction's use of immediate operands, but, as I understand it, that extra cycle should not appear in practice, because the dispatch width is great enough to allow the load of operand 9 while the bitwise OR executes. (I spent many years programming in assembly on a daily basis, but that time has passed. I still read microarchitecture documentation, but that does not produce an understanding of instruction sets and architectures comparable to that of someone who actually writes assembly code for them. So, now, I double-check my conclusions with Gemini Pro.)
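
For concreteness, here is the uppercase expression unrolled into single-operator steps. This sketch is mine (the method name is arbitrary), and its cycle notes assume, as above, that each operator maps to one single-cycle instruction and that the first two steps issue in parallel.

private static char nibbleToHexDigitUcStepwise
(
    final int               nibble
){
    final int c1 = nibble | 0x30;   // cycle 1: independent of "c2", so the two can issue together
    final int c2 = 9 - nibble;      // cycle 1: independent of "c1"
    final int c3 = c2 >>> 29;       // cycle 2: 0 for nibble <= 9, or 7 for nibble >= 0xA
    return (char) (c1 + c3);        // cycle 3: '0'-'9' or 'A'-'F'
}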

I’m unsatisfied with Java’s typical bits-to-hex methods (*.toHexString and printf). That’s why—not for the first time—I was examining an issue generally considered settled or unimportant. I hope other malcontents find value in this work.


Java Method Implementations

private static char nibbleToHexDigitUc
(
    final int               nibble
){
//  Repeating the following expression at every point of use would be tedious and would invite mistakes; hence this method.
//  The expression is short enough for inlining to be a safe assumption.

/*  Clause 1, "nibble | 0x30" (equivalent to "nibble + '0'" in US-ASCII), produces "int" value "c1", ranging
    from 0x30 ('0') for "nibble" 0x0, to 0x3F ('?') for "nibble" 0xF.

    Clause 2, "9 - nibble", differentiates nibbles represented by US-ASCII characters '0'-'9' from those
    represented by 'A'-'F':
        (A) non-negative when "nibble" ≤ 9, or
        (B) negative when "nibble" ≥ 0xA.

    Clause 3, "c2 >>> 29", uses the high 3 bits of "c2" to produce "int" value "c3", the "c1" addend converting
    US-ASCII characters ':'-'?' (produced from nibbles ≥ 0xA) to characters 'A'-'F', while leaving characters
    '0'-'9' unchanged:
        (A) 0 from "c2A", or
        (B) 7 from "c2B".
    Because "c3" is either 0, or a sequence of consecutive 1 bits, this clause is not adaptable to lowercase
    hex production, which would require a "c3B" value of 39 (0b100111).

    Clause 4, "c1 + c3" produces "int" value "c4": the US-ASCII code-point of the hexadecimal digit representing
    "nibble".

    Note: The whole expression allows instruction-level parallelism between clauses 1 and 2. That will
          not provide a great performance gain, because every operator is single-cycle, and eliminating one
          cycle is good, but not "great". However, some parallelism is better than none, and executing 50%
          of its operators in parallel is better than most expressions manage.
*/
    //noinspection MagicNumber
    return (char) ((nibble | 0x30) + (9 - nibble >>> 29));
}
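
/*  Worked example: nibble 0xC, traced through the clauses above.
        c1 = 0xC | 0x30         = 0x3C
        c2 = 9 - 0xC            = -3   (0xFFFFFFFD)
        c3 = 0xFFFFFFFD >>> 29  = 7
        c4 = 0x3C + 7           = 0x43 = 'C'
*/
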
private static char nibbleToHexDigitLc
(
    final int               nibble
){
//  Repeating the following expression at every point of use would be tedious and would invite mistakes; hence this method.
//  The expression is short enough for inlining to be a safe assumption.

/*  Clause 1, "nibble | 0x30" (equivalent to "nibble + '0'" in US-ASCII), produces "int" value "c1", ranging
    from 0x30 ('0') for "nibble" 0x0, to 0x3F ('?') for "nibble" 0xF.

    Clause 2, "9 - nibble", differentiates nibbles represented by US-ASCII characters '0'-'9' from those
    represented by 'A'-'F':
        (A) non-negative when "nibble" ≤ 9, or
        (B) negative when "nibble" ≥ 0xA.

    Clause 3, "c2 >> 31", produces "int" value "c3":
        (A)  0 from "c2A", or
        (B) -1 from "c2B".
    Thus, "c3" is an all-or-nothing mask determining whether clause 4 produces an addend to alter "c1", or
    leave it unchanged.

    Clause 4, "c3 & 39", produces "int" value "c4", the "c1" addend converting US-ASCII characters ':'-'?'
    (produced from nibbles ≥ 0xA) to characters 'a'-'f', while leaving characters '0'-'9' unchanged:
        (A)  0 when "nibble" ≤ 9, or
        (B) 39 when "nibble" ≥ 0xA.
    Value c4B makes this clause responsible for the letter case of hexadecimal digits > 9. As is, hexadecimal
    digits > 9 are lowercase. Uppercase digits would result from replacing 39 with 7 ... *except* there is no
    point to doing so, because a simpler, faster expression produces uppercase hex.

    Clause 5, "c1 + c4" produces "int" value "c5": the US-ASCII code-point of the hexadecimal digit representing
    "nibble".

    Note: The whole expression allows instruction-level parallelism between clauses 1 and 2. That will
          not provide a significant performance gain (every operator is single-cycle, so only one cycle is
          saved), but some parallelism is better than none, and executing 40% of its operators in parallel
          is better than most expressions manage.
*/
    //noinspection MagicNumber
    return (char) ((nibble | 0x30) + (9 - nibble >> 31 & 39));
}
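
As a usage sketch of my own (the helper name “byteToHexUc” is arbitrary), the uppercase method composes with the usual shift-and-mask nibble extraction to render one octet; the “& 0xF” masks also satisfy the requirement that the high 28 bits be clear.

private static String byteToHexUc
(
    final int               octet
){
//  High nibble first, then low nibble.
    return new String( new char[]
        { nibbleToHexDigitUc( octet >>> 4 & 0xF ), nibbleToHexDigitUc( octet & 0xF ) });
}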

Those two methods, nibbleToHexDigitUc and nibbleToHexDigitLc, are the implementations for Java, my language of choice since the late 1990s. If porting them, keep in mind that they: (1) implicitly operate on signed, 32-bit integers throughout, and (2) rely on an unsigned right shift (JLS terminology), otherwise known as a “logical right shift” (typical assembly-language terminology).
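
To illustrate point (2), the throwaway method below (its name is mine) contrasts the two shifts on the same negative value; invoking it prints “-1 7”.

private static void shiftDemo( )
{
    final int c2 = 9 - 0xF;              // -6, i.e. 0xFFFFFFFA
    final int arithmetic = c2 >> 29;     // -1: the sign bit fills the vacated high bits
    final int logical    = c2 >>> 29;    //  7: zeros fill the vacated high bits
    System.out.println( arithmetic + " " + logical );
}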

C, and, I think, most C-inspired languages, lack an equivalent operator. (Originally, C didn’t specify the sign-extension behavior of its right-shift operator, >>, so one might encounter either behavior depending on the CPU and the compiler, although arithmetic [sign-extending] right shifts seem to have been the most common.) Although I spent 15 years programming in C (now designated “K&R C” or “C78”), I never encountered any of C’s standardized versions, and my only C documentation remains the 1978 edition of K&R. However, I gather the ANSI/ISO standards guarantee a logical right shift for unsigned integers (and C99 adds the fixed-width integer types used below), so the following ports are deterministic, unless one can find a CPU that doesn’t use two’s-complement arithmetic on signed integers (good luck with that).

Nibble to Uppercase Hex Notation, C99, 3 Cycles

Where “nibble” is of type “int32_t”, its high 28 bits are clear, and two’s-complement arithmetic remains the norm:

(nibble | 0x30) + ((uint32_t) (9 - nibble) >> 29)

Nibble to Lowercase Hex Notation, C99, 5 Cycles

Where “nibble” is of type “int32_t”, its high 28 bits are clear, and two’s-complement arithmetic remains the norm:

(nibble | 0x30) + (-(int32_t) ((uint32_t) (9 - nibble) >> 31) & 39)

(The extra cycle, relative to the Java and C23 expressions, comes from the negation that turns the logically shifted sign bit, 0 or 1, into the 0 or -1 mask that “& 39” needs.)

Nibble to Lowercase Hex Notation, C23, 4 Cycles

Where “nibble” is of type “int32_t” and its high 28 bits are clear:

(nibble | 0x30) + (9 - nibble >> 31 & 39)
