Breaking a PRNG:
Is it Called Xor Shift or Xor Shit?

by Steve on March 27th, 2023

I like code, crypto(graphy), and math. I've already broken a PRNG called UHEPRNG (Ultra High Entropy PRNG). random-seed on NPM uses UHEPRNG and gets 100k+ downloads/week. Mostly because React includes it in their tests that use a seeded PRNG for repeatable numbers. This is bad because it legitimized a broken PRNG that the owner doesn't want to state in the description that it should not be used for crypto.

Anyway I was recently pointed out a PRNG called "srandom". It is meant to be install as /dev/srandom and replaces the output of /dev/urandom. This is not to be confused with OpenBSD's /dev/srandom which I believe was removed in version 4.9 or Linux's srandom function. Searching Github, there are about 200 hits. Most are custom Android kernels with few stars.

After the author was told this was using an insecure algorithm "Xor Shift", the author claimed it's secure because it's two different Xor Shift algorithms xored together and has multiple arrays that are randomly selected. Then added "probably the most secure PRNG available" to the read me. OK let's go.

There are two modes standard and ultra high speed (UHS). The difference between the two modes is which mixing function used on a state array. There are 2**n+1 state arrays (n=5 for UHS and n=4 for standard). The extra one is for randomly selecting an array to use. After randomly selecting 1021 arrays, it mixes the extra array. There are 3 global 64 bit integers used for mixing which are initialized with nanosecond time on boot and, when in standard mode, about every 3 minutes. Let's ignore that this is not a good method of seeding or reseeding which is it's own critical crypto bug. After doing a code review, I identified three critical coding bugs.

This is actually a good CTF. You can find the code here if you want to look at it before I ruin all the fun in finding the bugs.

Critical Coding Bug #1

The first critical coding bug is on line 144:

uint64_t (*prngArrays)[numberOfRndArrays + 1];  /* Array of Array of SECURE RND numbers */

They're trying to create a pointer to an array like this:

uint64_t prngArrays[numberOfRndArrays + 1][rndArraySize];

When defining a pointer to memory like that, you need to say how long each sub array is like this:

uint64_t (*prngArrays)[rndArraySize];

Now it knows each sub array is rndArraySize elements. Since rndArraySize is larger than numberOfRndArrays + 1, srandom's sub arrays overlap. If multiple threads are asking for random data at the same time, they can get partial same output. But good news because of this bug it's not writing to unallocated memory:

234                for (C = 0;C <= rndArraySize;C++) {
235                        prngArrays[arraysPosition][C] = xorshft128();
236                }

Critical Coding Bug #2

The second critical coding bug is on lines 329 to 347 and 383 to 389:

329        /*
330         * Select a RND array
331         */
332        while (mutex_lock_interruptible(&amp;ArrBusy_mutex));
333
334        arraysPosition = nextbuffer();
335
336        while ((ArraysBusyFlags &amp; 1 << arraysPosition) == (1 << arraysPosition)) {
337                arraysPosition += 1;
338                if (arraysPosition >= numberOfRndArrays) {
339                        arraysPosition = 0;
340                }
341        }
342
343        /*
344         * Mark the Arry as busy by setting the flag
345         */
346        ArraysBusyFlags += (1 << arraysPosition);
347        mutex_unlock(&amp;ArrBusy_mutex);
...
383        /*
384         * Clear ArraysBusyFlags
385         */
386        if (mutex_lock_interruptible(&amp;ArrBusy_mutex))
387                return -ERESTARTSYS;
388        ArraysBusyFlags -= (1 << arraysPosition);
389        mutex_unlock(&amp;ArrBusy_mutex);

In standard mode, "rndArraySize" is 16. If you create 17 threads and each asks for random data, then it will deadlock on lines 336 to 341 because it doesn't unlock and lock the mutex in the loop. This prevents the lock to free up the busy array. Now all calls to read /dev/srandom (and /dev/urandom) will block forever. A web page could trigger this with web workers since Crypto.getRandomValues() should be using system random.

Critical Coding Bug #3

The third critical coding bug is on line 145:

uint16_t ArraysBusyFlags = 0;             /* Binary Flags for Busy Arrays */

It's only a 16 bit integer and UHS mode uses 32 arrays. Thus in UHS mode, 16 arrays can't be locked. That means this can output the same random data to different threads.

Break Time

So right now it is a completely broken, insecure source of random. BUT that assumes the output is any good. Spoiler... it's not.

Breaking xorshft64()

There are two xor shift functions the first looks like this:

575 uint64_t xorshft64(void)
576 {
577         uint64_t z = (x += 0x9E3779B97F4A7C15ULL); // Note uint64_t x; is a global variable
578         z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
579         z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
580         return z ^ (z >> 31);
581 }

To recover the state of xorshft64() from its output is fairly straight forward. You just need to know that a variable xored to itself is zero and a variable xored to zero is unchanged. Let's look at this code:

uint8_t x, y;
y = x ^ (x >> 3);

Let's assign all the bits variable names (or letters) x = abcdefgh and y = ijklmnop:

y = ijklmnop
  = abcdefgh
   ^000abcde

Thus "ijk" is "abc" from x. If we take bits "abc" from y and right shift by three and xor them back to y:

 abcdefgh
^000abcde
^000abc00
=
 abcdefgh
^000000de

Ah now we know "abcdef" from x. Well take bits "de" and right shift by three and xor them back to y:

 abcdefgh
^000000de
^000000de
=
 abcdefgh

We now have x. This can be simplified to just:

x = y ^ (y >> 3) ^ (y >> (2*3));

And in general:

x = y;
for (i = 1; i * shift < bitSize; i++)
  x ^= y >> (i * shift);

One can find the multiplicative inverse of any odd number modulo 2**bitSize as follows:

x = y;
for (i = 0; 3 << i < bitSize; i++)
  x *= 2 - y * x;

Which give us 0x319642b2d24d8ec3 and 0x96de1b173f119089 for the inverses of 0x94D049BB133111EB and 0xBF58476D1CE4E5B9.

Putting this all together we get the state of x before xorshft64() was called:

x  = (z ^ (z >> 31) ^ (z >> (2*31))) * UINT64_C(0x319642b2d24d8ec3);
x  = (x ^ (x >> 27) ^ (x >> (2*27))) * UINT64_C(0x96de1b173f119089);
x  =  x ^ (x >> 30) ^ (x >> (2*30));
x -= UINT64_C(0x9E3779B97F4A7C15);

Breaking xorshft128()

Now onto the second xor shift:

582 uint64_t xorshft128(void)
583 {
584         uint64_t s1 = s[0]; // Note uint64_t s[2]; is a global variable
585         const uint64_t s0 = s[1];
586         s[0] = s0;
587         s1 ^= s1 << 23;
588         return (s[ 1 ] = (s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 26))) + s0;
589 }

Let's clean that up a bit:

uint64_t xorshft128()
{
    uint64_t s0 = s[0];
    uint64_t s1 = s[1];

    s0 = s0 ^ (s0 << 23);
    s0 = s0 ^ (s0 >> 17) ^ s1 ^ (s1 >> 26);

    s[0] = s1;
    s[1] = s0;
    return s0 + s1;
}

We get s0 + s1 as the output. We could just get two of these and brute force a 64 bit integer, but that will take a bit of work and I can't do that on a laptop in a weekend. The new gold standard of "lol look at this broken crypto". But (s0 + s1) & 1 == (s0 ^ s1) & 1. So if we get 128 outputs and get the least significant bit of each, we can set up a system of equations. We just need to follow which bits are xored to which bits and build 128 equations that look like this example:

outputBit = ((s0 >> 5) ^ (s0 >> 34) ^ (s1 >> 3) ^ (s1 >> 61)) & 1;

If you are interested, you should look at my code. Here's a gif of it in action:

That was hours of extra work and several iterations to make that look nice. I hope you enjoyed those 12 seconds.

Once you know the state you can go backwards using the techniques used in breaking xorshft64(). Also you can generate equations after clocking forward how many you want to go back. Then use the current state's output and solve the system of equations. You'll end up with the state at that point.

Finding Xor Shift Output

An array is picked at random, copied to the output buffer, and in UHS mode is given to update_sarray_uhs() to be mixed. If you ask for multiple blocks of output, it just repeats that on a loop. This is the guts of update_sarray_uhs() cleaned up a bit:

Z1 = xorshft64();
if ((Z1 & 1) == 0)
{
    for (C = 0; C < rndArraySize - 4; C += 4)
    {
        X = xorshft64();
        prngArray[C    ] = prngArray[C + 1] ^ X;
        prngArray[C + 1] = prngArray[C + 2] ^ X ^ Z1;
        prngArray[C + 2] = prngArray[C + 3] ^ X ^ Z1;
        prngArray[C + 3] = X ^ Z1;
    }
}
else
{
    for (C = 0; C < rndArraySize - 4; C += 4)
    {
        X = xorshft64();
        prngArray[C    ] = prngArray[C + 1] ^ X ^ Z1;
        prngArray[C + 1] = prngArray[C + 2] ^ X;
        prngArray[C + 2] = prngArray[C + 3] ^ X ^ Z1;
        prngArray[C + 3] = X ^ Z1;
    }
}

Given two blocks of output (1 KiB), you can get X by xoring the second block's last element and the first block's either third to last or second to last element. You just need to recover the state and check if the state matches the other output.

Now you just make multiple calls and compare to previous blocks to know which array was used. This is a critical crypto bug because if you stop here knowing the contents of the 32 arrays you know the next call for random is one of 32 possible values.

In standard mode, update_sarray() is called instead of update_sarray_uhs(). This is the guts of update_sarray() cleaned up a bit:

Z1 = xorshft64();
Z2 = xorshft64();
Z3 = xorshft64();
if ((Z1 & 1) == 0)
{
    for (C = 0; C < rndArraySize - 4; C += 4)
    {
        X = xorshft128();
        Y = xorshft128();
        prngArray[C    ] = prngArray[C + 1] ^ X ^ Y;
        prngArray[C + 1] = prngArray[C + 2] ^ Y ^ Z1;
        prngArray[C + 2] = prngArray[C + 3] ^ X ^ Z2;
        prngArray[C + 3] = X ^ Y ^ Z3;
    }
}
else
{
    for (C = 0; C < rndArraySize - 4; C += 4)
    {
        X = xorshft128();
        Y = xorshft128();
        prngArray[C    ] = prngArray[C + 1] ^ X ^ Z2;
        prngArray[C + 1] = prngArray[C + 2] ^ X ^ Y;
        prngArray[C + 2] = prngArray[C + 3] ^ Y ^ Z3;
        prngArray[C + 3] = X ^ Y ^ Z1;
    }
}

Just note for later that if the state of xorshft64() and xorshft128() are known and update_sarray() is called on an array 4 times, we know the full contents of the array. Since we know all the X, Y, Z1, Z2, Z3 values and each call overwrites a quarter of the array while shifting the elements by one.

This looks very similar to update_sarray_uhs() and a similar technique can be used. Get blocks, isolate outputs from the xor shifts, and verify:

// Same as xorshft64() but let's you give it a state
uint64_t xorshft64_(uint64_t &state)
{
    uint64_t z = (state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

uint64_t buffer[64+4];
getDevSrandom(buffer, sizeof(buffer)); // gets data from /dev/srandom

Z3 = buffer[1] ^ buffer[64 + 0] ^ buffer[64 + 3];
state = xorshft64_getState(Z3) - 3 * UINT64_C(0x9E3779B97F4A7C15);
Z1 = xorshft64_(state);
Z2 = xorshft64_(state);
X = buffer[3] ^ buffer[64 + 2] ^ Z2;
Y = buffer[2] ^ buffer[64 + 1] ^ Z1;
if ((Z1 & 1) != 0 || buffer[64 + 3] != X ^ Y ^ Z3)
{
    Z1 = buffer[2] ^ buffer[64 + 1] ^ buffer[64 + 3];
    state = xorshft64_getState(Z1);
    Z2 = xorshft64_(state);
    Z3 = xorshft64_(state); 
    X = buffer[1] ^ buffer[64 + 0] ^ Z2;
    Y = buffer[3] ^ buffer[64 + 2] ^ Z3;
    if ((Z1 & 1) == 0 || buffer[64 + 3] != X ^ Y ^ Z1)
    {
        return 1; // error
    }
}

Now do this enough to get 128 outputs from xorshft128() and run that through the systems of equations solver:

Please enjoy these 12 seconds again.

Randomly Selected Array

The last part is figuring out which array is randomly selected. nextbuffer() is called once for each request and it uses that array for the full request of the amount of data. You can see the only call for nextbuffer() on line 334 in the code from the second critical coding bug. This is the nextbuffer() function cleaned up a bit:

int nextbuffer()
{
    uint64_t index;
    uint8_t  position = arraysBufferPosition / 16;
    uint8_t  roll     = arraysBufferPosition % 16;

    index  = prngArrays[numberOfRndArrays][position] >> (4 * roll);
    index &= numberOfRndArrays - 1;

    arraysBufferPosition++;
    if (arraysBufferPosition >= 1021)
    {
        arraysBufferPosition = 0;
        update_sarray(numberOfRndArrays);
    }

    return (int) index;
}

We can call our "getDevSrandom()" function up to 1021 times waiting for nextbuffer() to call update_sarray(). This is detectable because the state of xorshft64() increases by 3 extra times between outputs from "getDevSrandom()". Now we call "getDevSrandom()" 3*1021 times more and this guarantees the last array has been flushed with output from xorshft64() and xorshft128(). For standard mode, we know the states of both of those and we kept track of which arrays were updated so we know the full state of all of the arrays. At this point if we didn't already know the order of the arrays because of the overlap bug, we could just make a few more "getDevSrandom()" calls to find the order because we know which array index is used for each call.

For UHS mode, we only know the state of xorshft64() but because of the overlap bug we get access to part of the last array. We can call "getDevSrandom()" 1021 times and get the next partial block and repeating until we can recover the state of xorshft128().

What if the Overlap Bug Didn't Exist?

Normal mode was fully broken, but in UHS mode we used the overlap bug to get data from the extra array. So we need a new way to get the output from xorshft128(). Looking at nextbuffer():

uint64_t index;
uint8_t  position = arraysBufferPosition / 16;
uint8_t  roll     = arraysBufferPosition % 16;

index  = prngArrays[numberOfRndArrays][position] >> (4 * roll);
index &= numberOfRndArrays - 1;

In UHS mode, numberOfRndArrays is 32. Every 16 calls, roll is 15 and that means the most significant bit in the index is zero. Doing enough calls you can determine the 16 arrays that have the most significant bit of zero and the other arrays are one. Also the most significant and least significant bits of indexes overlap between calls:

roll=15: 0abcd
roll=14: defgh
roll=13: hijkl
roll=12: lmnop
...

Thus we know the least significant bits of the indexes. Well... we only care about the least significant bit of prngArrays[numberOfRndArrays][position]. Get 128 of these bit and...

(Sorry, I had to it was a lot of work to get that gif to look awesome)

Conclusion

Looks like I'm two for two on breaking PRNGs that have "ultra high" in the name. Please leave crypto to cryptographers or at least learn how to break stuff before writing crypto code. See Cryptopals and Cryptohack. Also Crypto I class, Crypto 101 free book, and Serious Crypto book. Even after all of that, when you do write crypto code you need qualified people to audit your code and design. There's also an episode of Security Cryptography Whatever called The Great Roll Your Own Crypto Debate about this topic.