Working and Use of BLAKE Hash Function with C code

The BLAKE hash function was one of the five entries for SHA-3 standards in the NIST (National Institute of Standards and Technology) hash function competition that made it to the final round. Other …

11 mins
Working and Use of BLAKE Hash Function with C code

The BLAKE hash function was one of the five entries for SHA-3 standards in the NIST (National Institute of Standards and Technology) hash function competition that made it to the final round. Other finalists include Grøstl, JH, Keccak and Skein. The competition was held between 2007 and 2012 and the Keccak hash function was announced as the winner and the new standard for the SHA-3 hash algorithm.

The BLAKE hash functions follow the HAIFA iteration mode: the compression function depends on the salt and the number of bits hashed so far (counter), to compress each message block with a distinct function. BLAKE repeatedly combines an 8-word hash value with 16 message words, truncating the ChaCha result to obtain the next hash value. BLAKE-256 and BLAKE-224 use 32-bit words and produce digest sizes of 256 bits and 224 bits, respectively, while BLAKE-512 and BLAKE-384 use 64-bit words and produce digest sizes of 512 bits and 384 bits, respectively.

An implementation of BLAKE is claimed to require low resources and is fast in both software and hardware environments. The successor of the BLAKE hash function is the BLAKE2 hash function, which was announced in 2012 and the BLAKE3 hash function, based on BLAKE2, was announced in 2020.

The algorithm for the BLAKE-256 hash function

BLAKE hash function has mainly two variants, one which uses 32-bit words and another one that uses 64-bit words, they produce a hash digest of length 256-bit and 512-bit respectively. BLAKE-256 works on a file or the message of a maximum length of (264 - 1) and produces a hexadecimal output of length 64 i.e. 256-bit hash. Major steps involved in the BLAKE hash function are

Padding bits

The data to be hashed (at most (264 −1) bits) is first padded such that its length becomes congruent to 447 modulo 512. It is then split into 512-bit blocks. It involves two steps:

  1. Append to the data a bit “1” followed by the minimal (possibly zero) number of bits “0” so that the total length is congruent to 447 modulo 512. Thus, at least one bit and at most 512 are appended.
  2. Append a bit “1” to the data, followed by a 64-bit unsigned big-endian representation of the original length of data. Then, the 512-bit blocks are divided into 16 32-bit sub-blocks.

This process makes sure that the bit length of the padded data is a multiple of 512.

Padding in BLAKE-256

Initialization

A 512-bit (16 words) initial state is initialized and represented as a 4 x 4 array. It is initialized with h, s, t and word constants as shown below

Constants in BLAKE-256

Here, h represents the chain value, s is the salt and c refers to 16 constant values (the leading 512 or 1024 bits of the fractional part of π), and t is the counter. It also uses a table of 10 16-element permutations. The salt s is chosen by the user, and set to the null value when no salt is required (i.e., s0 = s1 = s2 = s3 = 0). BLAKE-256(m, s) = hN, where m is the (non-padded) message, and s is the salt. The notation BLAKE-256(m) denotes the hash of m when no salt is used (i.e., s = 0).

Permutation Table

Compression using the round function

A round function iterates 14 times on the state v and uses a single core function G eight times in a round, i.e. 112 times. The core operation G, equivalent to ChaCha’s quarter round, operates on a 4-word column or diagonal a, b, c, d, combined with two words of message m[ ] and two constant words c[ ]. It is performed eight times per full round. The operation is depicted visually as-

G-function

The operation can also be depicted in form of equations as-

a ← a + b + (mσr(2i) ⊕ cσr(2i+1) )
d ← (d ⊕ a) ≫ 16
c ← c + d
b ← (b ⊕ c) ≫ 12
a ← a + b + (mσr(2i+1) ⊕ cσr(2i) )
d ← (d ⊕ a) ≫ 8
c ← c + d
b ← (b ⊕ c) ≫ 7

Above, r is the round number (0–13), at round r ≥ 10, the permutation used is σr mod 10, for example, at the last round (r = 15), the permutation σ15 mod 10 = σ5 is used and i varies from 0 to 7. Also, the addition here is addition modulo 232 and ⊕ refers to the XOR operation. (a >>> s) means right shift operation by s-bit on a

Here, G is applied to the word matrix in two ways: column-wise and diagonally. Firstly, G is applied to each column with four state variables (say v0 , v4, v8 and v12) and the values of those state variables are updated, next, the core function G is applied diagonally, as depicted with different colours in figure 4. The order of the round functions is depicted by the number given in the figure below. Similarly, each of the 14 rounds has 4 column steps and 4 diagonal steps respectively.

Column and Diagonal operations

In each round function, the process goes on as depicted below:

Process of round function

Finalisation

The new chain values are extracted from the modified state variables. XOR operations are performed on them along with salts and the initial chain values to get the final hash value. It is depicted as

Finalization

The resultant values h’0 … h’7 make the hash value for the message and are represented in big-endian representation.

Implementation in C

The code is uploaded on GitHub and is open for contribution, it can be made compatible with file input and can be optimized further.

#include <string.h>
#include <stdio.h>
#include <stdint.h>

typedef struct
{
  uint32_t h[8], s[4], t[2];
  int buflen, nullt;
  uint8_t buf[64];
} state256;

#define U8TO32_BIG(p)                                        \
  (((uint32_t)((p)[0]) << 24) | ((uint32_t)((p)[1]) << 16) | \
   ((uint32_t)((p)[2]) << 8) | ((uint32_t)((p)[3])))

#define U32TO8_BIG(p, v)         \
  (p)[0] = (uint8_t)((v) >> 24); \
  (p)[1] = (uint8_t)((v) >> 16); \
  (p)[2] = (uint8_t)((v) >> 8);  \
  (p)[3] = (uint8_t)((v));

const uint8_t sigma[][16] =
    {
        {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
        {14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
        {11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
        {7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
        {9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
        {2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9},
        {12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11},
        {13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10},
        {6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5},
        {10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0},
        {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15},
        {14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3},
        {11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4},
        {7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8},
        {9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13},
        {2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9}};

const uint32_t constant[16] =
    {
        0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
        0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
        0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
        0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917};

static const uint8_t padding[129] =
    {
        0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

// initialization of states
void initialize(state256 *S)
{
  S->h[0] = 0x6a09e667;
  S->h[1] = 0xbb67ae85;
  S->h[2] = 0x3c6ef372;
  S->h[3] = 0xa54ff53a;
  S->h[4] = 0x510e527f;
  S->h[5] = 0x9b05688c;
  S->h[6] = 0x1f83d9ab;
  S->h[7] = 0x5be0cd19;
  S->t[0] = S->t[1] = S->buflen = S->nullt = 0;
  S->s[0] = S->s[1] = S->s[2] = S->s[3] = 0;
}

// core function
void core_function(state256 *S, const uint8_t *block)
{
  // states and message block - 32-bit each
  uint32_t v[16], m[16], i;
  // Shift
#define ROT(x, n) (((x) << (32 - n)) | ((x) >> (n)))
// core function
#define G(a, b, c, d, e)                                       \
  v[a] += (m[sigma[i][e]] ^ constant[sigma[i][e + 1]]) + v[b]; \
  v[d] = ROT(v[d] ^ v[a], 16);                                 \
  v[c] += v[d];                                                \
  v[b] = ROT(v[b] ^ v[c], 12);                                 \
  v[a] += (m[sigma[i][e + 1]] ^ constant[sigma[i][e]]) + v[b]; \
  v[d] = ROT(v[d] ^ v[a], 8);                                  \
  v[c] += v[d];                                                \
  v[b] = ROT(v[b] ^ v[c], 7);

  // convert take 8-bit blocks into 32-bit and big-endian format
  for (i = 0; i < 16; ++i)
    m[i] = U8TO32_BIG(block + i * 4);

  // initial states
  for (i = 0; i < 8; ++i)
    v[i] = S->h[i];

  // rest states
  v[8] = S->s[0] ^ constant[0];
  v[9] = S->s[1] ^ constant[1];
  v[10] = S->s[2] ^ constant[2];
  v[11] = S->s[3] ^ constant[3];
  v[12] = constant[4];
  v[13] = constant[5];
  v[14] = constant[6];
  v[15] = constant[7];

  // XOR with t is not required when the block has padding-bits
  if (!S->nullt)
  {
    v[12] ^= S->t[0];
    v[13] ^= S->t[0];
    v[14] ^= S->t[1];
    v[15] ^= S->t[1];
  }

  // run the core function 14 times for blake 256-hash
  for (i = 0; i < 14; ++i)
  {
    // column step
    G(0, 4, 8, 12, 0);
    G(1, 5, 9, 13, 2);
    G(2, 6, 10, 14, 4);
    G(3, 7, 11, 15, 6);
    // diagonal step
    G(0, 5, 10, 15, 8);
    G(1, 6, 11, 12, 10);
    G(2, 7, 8, 13, 12);
    G(3, 4, 9, 14, 14);
  }

  // generating the hash with all updated states
  for (i = 0; i < 16; ++i)
    S->h[i % 8] ^= v[i];

  for (i = 0; i < 8; ++i)
    S->h[i] ^= S->s[i % 4];
}

// update the length of the block left and to fill
void add_padding(state256 *S, const uint8_t *in, uint64_t inlen)
{
  // space left
  int left = S->buflen;
  // printf("\nleft = %d\n", left);

  // space left
  int fill = 64 - left;
  // printf("fill = %d\n", fill);

  // data left is not null and length to be left is greater than available
  if (left && (inlen >= fill))
  {
    // printf("1st condition\n");
    memcpy((void *)(S->buf + left), (void *)in, fill);
    S->t[0] += 512;

    // printf("S->t[0] = %d\n", S->t[0]);

    if (S->t[0] == 0)
      S->t[1]++;

    // for (int j = 0; j < 64; j++)
    // {
    //   printf("S->buf[%d] = %d\n", j, S->buf[j]);
    // }
    // printf("\n");

    core_function(S, S->buf);
    in += fill;
    inlen -= fill;

    // printf("inlen = %d\n", inlen);
    left = 0;
  }

  // if meesage is greater than length of 64
  while (inlen >= 64)
  {
    // printf("Condition 2\n");
    S->t[0] += 512;

    if (S->t[0] == 0)
      S->t[1]++;

    core_function(S, in);
    in += 64;
    inlen -= 64;
  }

  // if the message when block is empty
  if (inlen > 0)
  {
    // printf("Condition 3\n");
    memcpy((void *)(S->buf + left), (void *)in, (size_t)inlen);
    S->buflen = left + (int)inlen;
    // printf("inlen = %d\n", inlen);
    // printf("buflen = %d\n", S->buflen);
    // for (int j = 0; j < 64; j++)
    // {
    //   printf("S->buf[%d] = %d\n", j, S->buf[j]);
    // }
  }
  else
    S->buflen = 0;
}

// finalize blake 256
void final(state256 *S, uint8_t *out)
{
  uint8_t msglen[8], zo = 0x01, oo = 0x81;
  uint32_t lo = S->t[0] + (S->buflen << 3), hi = S->t[1];
  // printf("lo = %d\n", lo);
  // printf("hi = %d\n", hi);

  // space fill is less than greater than 2^32 bits
  if (lo < (S->buflen << 3))
    hi++;

  // get the message in 8-bit big-endian format
  U32TO8_BIG(msglen + 0, hi);
  U32TO8_BIG(msglen + 4, lo);

  // print the message
  // for (int i = 0; i < 8; i++)
  // {
  //   printf("msglen[%d] = %d\n", i, msglen[i]);
  // }

  // only one byte for padding is fill
  if (S->buflen == 55)
  {
    S->t[0] -= 8;
    add_padding(S, &oo, 1);
  }
  else
  {
    // atleast 2 bytes are available for padding
    if (S->buflen < 55)
    {
      // if buflen is 0
      if (!S->buflen)
        S->nullt = 1;

      S->t[0] -= 440 - (S->buflen << 3);
      // printf("S[t[0]] = %d\n", S->t[0]);
      // printf("buflen = %d\n", S->buflen);
      add_padding(S, padding, 55 - S->buflen);
    }
    else
    {
      S->t[0] -= 512 - (S->buflen << 3);
      add_padding(S, padding, 64 - S->buflen);
      S->t[0] -= 440;
      add_padding(S, padding + 1, 55);
      S->nullt = 1;
    }

    // add one after padding 0 bits
    add_padding(S, &zo, 1);
    S->t[0] -= 8;
    // printf("S->t[0] = %d\n", S->t[0]);
  }

  S->t[0] -= 64;
  // printf("S->t[0] = %d\n", S->t[0]);
  // for (int j = 0; j < 64; j++)
  // {
  //   printf("S->buf[%d] = %d\n", j, S->buf[j]);
  // }

  add_padding(S, msglen, 8);

  // converting the 32-bit blocks into 8-bit hash output in big-endian
  U32TO8_BIG(out + 0, S->h[0]);
  U32TO8_BIG(out + 4, S->h[1]);
  U32TO8_BIG(out + 8, S->h[2]);
  U32TO8_BIG(out + 12, S->h[3]);
  U32TO8_BIG(out + 16, S->h[4]);
  U32TO8_BIG(out + 20, S->h[5]);
  U32TO8_BIG(out + 24, S->h[6]);
  U32TO8_BIG(out + 28, S->h[7]);
}

void blake_256(uint8_t *out, const uint8_t *in, uint64_t inlen)
{
  state256 S;
  initialize(&S);
  add_padding(&S, in, inlen);
  final(&S, out);
}

int main(int argc, char **argv)
{
  // take the message
  char *in = "";

  // length of message
  size_t msg_len = strlen(in);

  // output array - this will contain the final hash
  uint8_t out[32];

  // invoke the hashing function
  blake_256(out, in, msg_len);

  // print the hash in hexadecimal form
  printf("BLAKE256 HASH for \"\": ");
  for (int i = 0; i < 32; ++i)
  {
    printf("%x", out[i]);
  }
  return 0;
}

Output:

BLAKE256 HASH for "": 716f6e863f744b9ac22c97ec7b76ea5f598bc5b2f67c61510bfc4751384ea7a

BLAKE is similar to the MD5 hash function in its initial steps, you can read more about the working and uses of the MD5 hash function here.

References

  1. SHA-3 proposal BLAKE, version 1.3. [Source]- SHA-3 proposal BLAKE (aumasson.jp).
  2. The Hash Function BLAKE- Jean-Philippe Aumasson, Willi Meier Raphael C.-W. Phan, Luca Henzen
Was this Article Helpful?
Ajay Choudhury

Ajay Choudhury

A learner and a student of technlogy. He loves sharing experiences and learning with others through his projects and blog. Along with technology, he loves playing football and enjoys listening to music and podcasts.

0 Comments

Info

OrbitGadget — A constructive and and resourceful hub for computer science geeks. Let us achieve together.

Privacy Policy Disclosure Site Terms Atom Feed Sitemap

Made with ❤️ and Django. OrbitGadget © 2020 - 2022.