The BLAKE hash function was one of the five entries for SHA-3 standards in the NIST (National Institute of Standards and Technology) hash function competition that made it to the final round. Other finalists include Grøstl, JH, Keccak and Skein. The competition was held between 2007 and 2012 and the Keccak hash function was announced as the winner and the new standard for the SHA-3 hash algorithm.

The BLAKE hash functions follow the HAIFA iteration mode: the compression function depends on the salt and the number of bits hashed so far (counter), to compress each message block with a distinct function. BLAKE repeatedly combines an 8-word hash value with 16 message words, truncating the ChaCha result to obtain the next hash value. BLAKE-256 and BLAKE-224 use 32-bit words and produce digest sizes of 256 bits and 224 bits, respectively, while BLAKE-512 and BLAKE-384 use 64-bit words and produce digest sizes of 512 bits and 384 bits, respectively.

An implementation of BLAKE is claimed to require low resources and is fast in both software and hardware environments. The successor of the BLAKE hash function is the BLAKE2 hash function, which was announced in 2012 and the BLAKE3 hash function, based on BLAKE2, was announced in 2020.

## The algorithm for the BLAKE-256 hash function

BLAKE hash function has mainly two variants, one which uses 32-bit words and another one that uses 64-bit words, they produce a hash digest of length 256-bit and 512-bit respectively. BLAKE-256 works on a file or the message of a maximum length of (264 - 1) and produces a hexadecimal output of length 64 i.e. 256-bit hash. Major steps involved in the BLAKE hash function are

### Padding bits

The data to be hashed (at most (2^{64} −1) bits) is first padded such that its length becomes congruent to 447 modulo 512. It is then split into 512-bit blocks. It involves two steps:

- Append to the data a bit “1” followed by the minimal (possibly zero) number of bits “0” so that the total length is congruent to 447 modulo 512. Thus, at least one bit and at most 512 are appended.
- Append a bit “1” to the data, followed by a 64-bit unsigned big-endian representation of the original length of data. Then, the 512-bit blocks are divided into 16 32-bit sub-blocks.

This process makes sure that the bit length of the padded data is a multiple of 512.

### Initialization

A 512-bit (16 words) initial state is initialized and represented as a 4 x 4 array. It is initialized with *h, s, t* and word constants as shown below

Here, *h* represents the chain value, *s* is the salt and *c* refers to 16 constant values (the leading 512 or 1024 bits of the fractional part of π), and *t* is the counter. It also uses a table of 10 16-element permutations. The salt *s* is chosen by the user, and set to the null value when no salt is required (i.e., *s _{0} = s_{1} = s_{2} = s_{3} = 0*).

*BLAKE-256(m, s) = h*, where

^{N}*m*is the (non-padded) message, and

*s*is the salt. The notation

*BLAKE-256(m)*denotes the hash of

*m*when no salt is used (i.e.,

*s = 0*).

### Compression using the round function

A round function iterates 14 times on the state* v* and uses a single core function *G* eight times in a round, i.e. 112 times. The core operation G, equivalent to ChaCha’s quarter round, operates on a 4-word column or diagonal *a, b, c, d*, combined with two words of message *m[ ]* and two constant words *c[ ]*. It is performed eight times per full round. The operation is depicted visually as-

The operation can also be depicted in form of equations as-

a ← a + b + (m_{σr(2i)} ⊕ c_{σr(2i+1)} )

d ← (d ⊕ a) ≫ 16

c ← c + d

b ← (b ⊕ c) ≫ 12

a ← a + b + (m_{σr(2i+1)} ⊕ c_{σr(2i)} )

d ← (d ⊕ a) ≫ 8

c ← c + d

b ← (b ⊕ c) ≫ 7

Above, *r* is the round number (0–13), at round *r ≥ 10*, the permutation used is *σ _{r mod 10}*, for example, at the last round

*(r = 15)*, the permutation

*σ*is used and

_{15 mod 10}= σ_{5 }*i*varies from 0 to 7. Also, the addition here is addition modulo

*2*and ⊕ refers to the XOR operation. (

^{32}*a >>> s*) means right shift operation by

*s-bit*on

*a*.

Here, *G* is applied to the word matrix in two ways: **column-wise **and **diagonally**. Firstly, G is applied to each column with four state variables (say *v _{0} , v_{4}, v_{8}* and

*v*) and the values of those state variables are updated, next, the core function

_{12}*G*is applied diagonally, as depicted with different colours in figure 4. The order of the round functions is depicted by the number given in the figure below. Similarly, each of the 14 rounds has 4 column steps and 4 diagonal steps respectively.

In each round function, the process goes on as depicted below:

### Finalisation

The new chain values are extracted from the modified state variables. XOR operations are performed on them along with salts and the initial chain values to get the final hash value. It is depicted as

The resultant values h’_{0} … h’_{7} make the hash value for the message and are represented in big-endian representation.

## Implementation in C

The code is uploaded on GitHub and is open for contribution, it can be made compatible with file input and can be optimized further.

```
#include <string.h>
#include <stdio.h>
#include <stdint.h>
typedef struct
{
uint32_t h[8], s[4], t[2];
int buflen, nullt;
uint8_t buf[64];
} state256;
#define U8TO32_BIG(p) \
(((uint32_t)((p)[0]) << 24) | ((uint32_t)((p)[1]) << 16) | \
((uint32_t)((p)[2]) << 8) | ((uint32_t)((p)[3])))
#define U32TO8_BIG(p, v) \
(p)[0] = (uint8_t)((v) >> 24); \
(p)[1] = (uint8_t)((v) >> 16); \
(p)[2] = (uint8_t)((v) >> 8); \
(p)[3] = (uint8_t)((v));
const uint8_t sigma[][16] =
{
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
{14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
{11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
{7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
{9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
{2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9},
{12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11},
{13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10},
{6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5},
{10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0},
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
{14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
{11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
{7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
{9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
{2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9}};
const uint32_t constant[16] =
{
0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917};
static const uint8_t padding[129] =
{
0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
// initialization of states
void initialize(state256 *S)
{
S->h[0] = 0x6a09e667;
S->h[1] = 0xbb67ae85;
S->h[2] = 0x3c6ef372;
S->h[3] = 0xa54ff53a;
S->h[4] = 0x510e527f;
S->h[5] = 0x9b05688c;
S->h[6] = 0x1f83d9ab;
S->h[7] = 0x5be0cd19;
S->t[0] = S->t[1] = S->buflen = S->nullt = 0;
S->s[0] = S->s[1] = S->s[2] = S->s[3] = 0;
}
// core function
void core_function(state256 *S, const uint8_t *block)
{
// states and message block - 32-bit each
uint32_t v[16], m[16], i;
// Shift
#define ROT(x, n) (((x) << (32 - n)) | ((x) >> (n)))
// core function
#define G(a, b, c, d, e) \
v[a] += (m[sigma[i][e]] ^ constant[sigma[i][e + 1]]) + v[b]; \
v[d] = ROT(v[d] ^ v[a], 16); \
v[c] += v[d]; \
v[b] = ROT(v[b] ^ v[c], 12); \
v[a] += (m[sigma[i][e + 1]] ^ constant[sigma[i][e]]) + v[b]; \
v[d] = ROT(v[d] ^ v[a], 8); \
v[c] += v[d]; \
v[b] = ROT(v[b] ^ v[c], 7);
// convert take 8-bit blocks into 32-bit and big-endian format
for (i = 0; i < 16; ++i)
m[i] = U8TO32_BIG(block + i * 4);
// initial states
for (i = 0; i < 8; ++i)
v[i] = S->h[i];
// rest states
v[8] = S->s[0] ^ constant[0];
v[9] = S->s[1] ^ constant[1];
v[10] = S->s[2] ^ constant[2];
v[11] = S->s[3] ^ constant[3];
v[12] = constant[4];
v[13] = constant[5];
v[14] = constant[6];
v[15] = constant[7];
// XOR with t is not required when the block has padding-bits
if (!S->nullt)
{
v[12] ^= S->t[0];
v[13] ^= S->t[0];
v[14] ^= S->t[1];
v[15] ^= S->t[1];
}
// run the core function 14 times for blake 256-hash
for (i = 0; i < 14; ++i)
{
// column step
G(0, 4, 8, 12, 0);
G(1, 5, 9, 13, 2);
G(2, 6, 10, 14, 4);
G(3, 7, 11, 15, 6);
// diagonal step
G(0, 5, 10, 15, 8);
G(1, 6, 11, 12, 10);
G(2, 7, 8, 13, 12);
G(3, 4, 9, 14, 14);
}
// generating the hash with all updated states
for (i = 0; i < 16; ++i)
S->h[i % 8] ^= v[i];
for (i = 0; i < 8; ++i)
S->h[i] ^= S->s[i % 4];
}
// update the length of the block left and to fill
void add_padding(state256 *S, const uint8_t *in, uint64_t inlen)
{
// space left
int left = S->buflen;
// printf("\nleft = %d\n", left);
// space left
int fill = 64 - left;
// printf("fill = %d\n", fill);
// data left is not null and length to be left is greater than available
if (left && (inlen >= fill))
{
// printf("1st condition\n");
memcpy((void *)(S->buf + left), (void *)in, fill);
S->t[0] += 512;
// printf("S->t[0] = %d\n", S->t[0]);
if (S->t[0] == 0)
S->t[1]++;
// for (int j = 0; j < 64; j++)
// {
// printf("S->buf[%d] = %d\n", j, S->buf[j]);
// }
// printf("\n");
core_function(S, S->buf);
in += fill;
inlen -= fill;
// printf("inlen = %d\n", inlen);
left = 0;
}
// if meesage is greater than length of 64
while (inlen >= 64)
{
// printf("Condition 2\n");
S->t[0] += 512;
if (S->t[0] == 0)
S->t[1]++;
core_function(S, in);
in += 64;
inlen -= 64;
}
// if the message when block is empty
if (inlen > 0)
{
// printf("Condition 3\n");
memcpy((void *)(S->buf + left), (void *)in, (size_t)inlen);
S->buflen = left + (int)inlen;
// printf("inlen = %d\n", inlen);
// printf("buflen = %d\n", S->buflen);
// for (int j = 0; j < 64; j++)
// {
// printf("S->buf[%d] = %d\n", j, S->buf[j]);
// }
}
else
S->buflen = 0;
}
// finalize blake 256
void final(state256 *S, uint8_t *out)
{
uint8_t msglen[8], zo = 0x01, oo = 0x81;
uint32_t lo = S->t[0] + (S->buflen << 3), hi = S->t[1];
// printf("lo = %d\n", lo);
// printf("hi = %d\n", hi);
// space fill is less than greater than 2^32 bits
if (lo < (S->buflen << 3))
hi++;
// get the message in 8-bit big-endian format
U32TO8_BIG(msglen + 0, hi);
U32TO8_BIG(msglen + 4, lo);
// print the message
// for (int i = 0; i < 8; i++)
// {
// printf("msglen[%d] = %d\n", i, msglen[i]);
// }
// only one byte for padding is fill
if (S->buflen == 55)
{
S->t[0] -= 8;
add_padding(S, &oo, 1);
}
else
{
// atleast 2 bytes are available for padding
if (S->buflen < 55)
{
// if buflen is 0
if (!S->buflen)
S->nullt = 1;
S->t[0] -= 440 - (S->buflen << 3);
// printf("S[t[0]] = %d\n", S->t[0]);
// printf("buflen = %d\n", S->buflen);
add_padding(S, padding, 55 - S->buflen);
}
else
{
S->t[0] -= 512 - (S->buflen << 3);
add_padding(S, padding, 64 - S->buflen);
S->t[0] -= 440;
add_padding(S, padding + 1, 55);
S->nullt = 1;
}
// add one after padding 0 bits
add_padding(S, &zo, 1);
S->t[0] -= 8;
// printf("S->t[0] = %d\n", S->t[0]);
}
S->t[0] -= 64;
// printf("S->t[0] = %d\n", S->t[0]);
// for (int j = 0; j < 64; j++)
// {
// printf("S->buf[%d] = %d\n", j, S->buf[j]);
// }
add_padding(S, msglen, 8);
// converting the 32-bit blocks into 8-bit hash output in big-endian
U32TO8_BIG(out + 0, S->h[0]);
U32TO8_BIG(out + 4, S->h[1]);
U32TO8_BIG(out + 8, S->h[2]);
U32TO8_BIG(out + 12, S->h[3]);
U32TO8_BIG(out + 16, S->h[4]);
U32TO8_BIG(out + 20, S->h[5]);
U32TO8_BIG(out + 24, S->h[6]);
U32TO8_BIG(out + 28, S->h[7]);
}
void blake_256(uint8_t *out, const uint8_t *in, uint64_t inlen)
{
state256 S;
initialize(&S);
add_padding(&S, in, inlen);
final(&S, out);
}
int main(int argc, char **argv)
{
// take the message
char *in = "";
// length of message
size_t msg_len = strlen(in);
// output array - this will contain the final hash
uint8_t out[32];
// invoke the hashing function
blake_256(out, in, msg_len);
// print the hash in hexadecimal form
printf("BLAKE256 HASH for \"\": ");
for (int i = 0; i < 32; ++i)
{
printf("%x", out[i]);
}
return 0;
}
```

**Output:**

`BLAKE256 HASH for "": 716f6e863f744b9ac22c97ec7b76ea5f598bc5b2f67c61510bfc4751384ea7a`

BLAKE is similar to the MD5 hash function in its initial steps, you can read more about the working and uses of the MD5 hash function here.

## References

- SHA-3 proposal BLAKE, version 1.3. [Source]- SHA-3 proposal BLAKE (aumasson.jp).
- The Hash Function BLAKE- Jean-Philippe Aumasson, Willi Meier Raphael C.-W. Phan, Luca Henzen

## 0 Comments