Xorshift

Xorshift random number generators, also called shift-register generators, are a class of pseudorandom number generators discovered by George Marsaglia.[1] They are a subset of linear-feedback shift registers (LFSRs) which allow a particularly efficient implementation without using excessively sparse polynomials.[2] They generate the next number in their sequence by repeatedly taking the exclusive or of a number with a bit-shifted version of itself. This makes them extremely fast on modern computer architectures. As with all LFSRs, the parameters have to be chosen very carefully in order to achieve a long period.[3]

Figure: Example random distribution of Xorshift128

Xorshift generators are among the fastest non-cryptographically-secure random number generators, requiring very small code and state. Although they do not pass every statistical test without further refinement, this weakness is well-known and easily amended (as pointed out by Marsaglia in the original paper) by combining them with a non-linear function, resulting e.g. in a xorshift+ or xorshift* generator. A native C implementation of a xorshift+ generator that passes all tests from the BigCrush suite (with an order of magnitude fewer failures than Mersenne Twister or WELL) typically takes fewer than 10 clock cycles on x86 to generate a random number, thanks to instruction pipelining.[4]

The + and * scramblers still leave weaknesses in the low bits,[5] so they are intended for floating-point use, where converting a random integer to floating point discards the low bits. For general-purpose use, the ** scrambler (pronounced "starstar") makes the LFSR generators pass the tests in all bits.

Because plain xorshift generators (without a non-linear step) fail a few statistical tests, they have been accused of being unreliable.[3]:360

Example implementation

A C version[note 1] of three xorshift algorithms[1]:4,5 is given here. The first has one 32-bit word of state and period 2^32−1. The second has one 64-bit word of state and period 2^64−1. The last has four 32-bit words of state and period 2^128−1. All use three shifts and three or four exclusive-or operations:

#include <stdint.h>

struct xorshift32_state {
  uint32_t a;
};

/* The state word must be initialized to non-zero */
uint32_t xorshift32(struct xorshift32_state *state)
{
	/* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
	uint32_t x = state->a;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return state->a = x;
}

struct xorshift64_state {
  uint64_t a;
};

/* The state word must be initialized to non-zero */
uint64_t xorshift64(struct xorshift64_state *state)
{
	uint64_t x = state->a;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return state->a = x;
}

struct xorshift128_state {
  uint32_t a, b, c, d;
};

/* The state array must be initialized to not be all zero */
uint32_t xorshift128(struct xorshift128_state *state)
{
	/* Algorithm "xor128" from p. 5 of Marsaglia, "Xorshift RNGs" */
	uint32_t t = state->d;

	uint32_t const s = state->a;
	state->d = state->c;
	state->c = state->b;
	state->b = s;

	t ^= t << 11;
	t ^= t >> 8;
	return state->a = t ^ s ^ (s >> 19);
}

The 128-bit algorithm passes the diehard tests. However, it fails the MatrixRank and LinearComp tests of the BigCrush test suite from the TestU01 framework.
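
The earlier remark that shift parameters must be chosen carefully to obtain a long period can be explored on a small scale. The following is a minimal sketch, not part of Marsaglia's paper: it uses a hypothetical 16-bit xorshift so that the cycle through a given start value can be counted by brute force. The function names and the 16-bit word size are assumptions made for illustration. The triple (7, 9, 8) is often quoted as giving the full period of 2^16−1, which the sketch lets a reader check rather than assume.

```c
#include <stdint.h>

/* One step of a 16-bit xorshift with shifts (a, b, c), each in 1..15.
   Hypothetical helper for illustration only. */
uint16_t xorshift16_step(uint16_t x, int a, int b, int c)
{
	x ^= (uint16_t)(x << a);
	x ^= (uint16_t)(x >> b);
	x ^= (uint16_t)(x << c);
	return x;
}

/* Number of steps until the generator returns to `start`.
   Each xorshift step is an invertible linear map, so the orbit of any
   start value is a pure cycle and the loop ends within 65536 steps. */
uint32_t xorshift16_cycle_length(uint16_t start, int a, int b, int c)
{
	uint16_t x = start;
	uint32_t n = 0;
	do {
		x = xorshift16_step(x, a, b, c);
		n++;
	} while (x != start && n < 65536);
	return n;
}
```

Calling xorshift16_cycle_length(1, a, b, c) for different triples shows how strongly the cycle length depends on the shifts; a triple is full-period only if the result is 65535. The zero state always maps to itself, which is why it is excluded.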

Variations

All xorshift generators fail some tests out of TestU01's BigCrush test suite. This is true for all generators based on linear recurrences, such as the Mersenne Twister or WELL. However, it is easy to scramble the output of such generators to improve their quality.

xorwow

Marsaglia suggested scrambling the output by combining it with a simple additive counter modulo 2^32 (which he calls a "Weyl sequence" after Weyl's equidistribution theorem). This also increases the period by a factor of 2^32, to 2^192−2^32:

#include <stdint.h>

struct xorwow_state {
    uint32_t a, b, c, d, e;
    uint32_t counter;
};

/* The five xorshift words a..e must not all be zero; the counter may hold any value */
uint32_t xorwow(struct xorwow_state *state)
{
    /* Algorithm "xorwow" from p. 5 of Marsaglia, "Xorshift RNGs" */
    uint32_t t = state->e;
    uint32_t s = state->a;
    state->e = state->d;
    state->d = state->c;
    state->c = state->b;
    state->b = s;
    t ^= t >> 2;
    t ^= t << 1;
    t ^= s ^ (s << 4);
    state->a = t;
    state->counter += 362437;
    return t + state->counter;
}

This performs well, but fails a few tests in BigCrush.[6] This generator is the default in Nvidia's CUDA toolkit.[7]

xorshift*

A xorshift* generator takes a xorshift generator and applies an invertible multiplication (modulo the word size) to its output as a non-linear transformation, as suggested by Marsaglia.[1] The following 64-bit generator with 64 bits of state has a maximal period of 2^64−1[8] and fails only the MatrixRank test of BigCrush:

#include <stdint.h>

struct xorshift64s_state {
  uint64_t a;
};

uint64_t xorshift64s(struct xorshift64s_state *state)
{
	uint64_t x = state->a;	/* The state must be seeded with a nonzero value. */
	x ^= x >> 12; // a
	x ^= x << 25; // b
	x ^= x >> 27; // c
	state->a = x;
	return x * UINT64_C(0x2545F4914F6CDD1D);
}

A similar generator is suggested in Numerical Recipes[9] as RanQ1, but it also fails the BirthdaySpacings test.

If the generator is modified to return only the high 32 bits, then it passes BigCrush with zero failures.[10]:7 In fact, a reduced version with only 40 bits of internal state passes the suite, suggesting a large safety margin.[10]:19
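
The high-32-bit variant mentioned above can be sketched as follows. This is an illustrative adaptation of the xorshift64s listing, not code taken from the cited report; the function name xorshift64s_hi32 is an assumption.

```c
#include <stdint.h>

struct xorshift64s_state {
	uint64_t a;
};

/* Same state update as xorshift64s, but only the upper 32 bits of the
   multiplied output are returned. The state must be seeded nonzero.
   The name xorshift64s_hi32 is hypothetical. */
uint32_t xorshift64s_hi32(struct xorshift64s_state *state)
{
	uint64_t x = state->a;
	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	state->a = x;
	/* Discard the weaker low half of the 64-bit product. */
	return (uint32_t)((x * UINT64_C(0x2545F4914F6CDD1D)) >> 32);
}
```

The internal state still advances through all 2^64−1 nonzero values; only the amount of output taken per step changes.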

Vigna[8] suggests the following xorshift1024* generator with 1024 bits of state and a maximal period of 2^1024−1; however, it does not always pass BigCrush.[5] xoshiro256** is therefore a much better option.

#include <stdint.h>

/* The state must be seeded so that there is at least one non-zero element in array */
struct xorshift1024s_state {
	uint64_t array[16];
	int index;
};

uint64_t xorshift1024s(struct xorshift1024s_state *state)
{
	int index = state->index;
	uint64_t const s = state->array[index++];
	uint64_t t = state->array[index &= 15];
	t ^= t << 31;		// a
	t ^= t >> 11;		// b
	t ^= s ^ (s >> 30);	// c
	state->array[index] = t;
	state->index = index;
	return t * (uint64_t)1181783497276652981;
}

Both generators, as is the case with all xorshift* generators, emit a sequence of 64-bit values that is equidistributed in the maximum possible dimension (except that they will never output zero for 16 calls, i.e. 128 bytes, in a row).[8]

xorshift+

Rather than using multiplication, it is possible to use addition as a faster non-linear transformation. The idea was first proposed by Saito and Matsumoto (also responsible for the Mersenne Twister) in the XSadd generator, which adds two consecutive outputs of an underlying xorshift generator based on 32-bit shifts.[11]

XSadd, however, has some weakness in the low-order bits of its output; it fails several BigCrush tests when the output words are bit-reversed. To correct this problem, Vigna[12] introduced the xorshift+ family, based on 64-bit shifts: the following xorshift128+ generator uses 128 bits of state and has a maximal period of 2^128−1. It passes BigCrush, but not when reversed.[5]

#include <stdint.h>

struct xorshift128p_state {
  uint64_t a, b;
};

/* The state must be seeded so that it is not all zero */
uint64_t xorshift128p(struct xorshift128p_state *state)
{
	uint64_t t = state->a;
	uint64_t const s = state->b;
	state->a = s;
	t ^= t << 23;		// a
	t ^= t >> 17;		// b
	t ^= s ^ (s >> 26);	// c
	state->b = t;
	return t + s;
}

This generator is one of the fastest generators passing BigCrush.[4] One disadvantage of adding consecutive outputs is that, while the underlying xorshift128 generator is 2-dimensionally equidistributed, the associated xorshift128+ generator is only 1-dimensionally equidistributed.[12]

Xorshift+ generators, even as large as xorshift1024+, exhibit some detectable linearity in the low-order bits of their output.[5]
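
The "reversed" tests referred to above feed the test suite the low-order bits of each output in reverse bit order, so that the weakest bits end up in the most significant positions. A minimal sketch of that transformation is shown below; the helper names are assumptions, and the exact procedure used in the cited experiments may differ in detail.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word by swapping progressively
   larger groups: single bits, pairs, nibbles, bytes, then halves. */
uint32_t reverse_bits32(uint32_t v)
{
	v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
	v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
	v = ((v >> 4) & 0x0F0F0F0Fu) | ((v & 0x0F0F0F0Fu) << 4);
	v = ((v >> 8) & 0x00FF00FFu) | ((v & 0x00FF00FFu) << 8);
	return (v >> 16) | (v << 16);
}

/* Take the 32 lowest-order bits of a 64-bit output, reversed.
   Hypothetical helper name. */
uint32_t low32_reversed(uint64_t x)
{
	return reverse_bits32((uint32_t)x);
}
```

Passing low32_reversed(xorshift128p(&state)) to a test battery instead of the raw output is one way to expose linearity that sits in the low bits.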

xoshiro and xoroshiro

xoshiro and xoroshiro are other variations of the shift-register generators, using rotations in addition to shifts. According to Vigna, they are faster and produce better quality output than xorshift.[13][14]

This class of generator has variants for 32-bit and 64-bit integer and floating-point output. For floating point, one takes the upper 53 bits (for binary64) or the upper 24 bits (for binary32), matching the significand precision of each format, since the upper bits are of better quality than the lower bits in the floating-point generators. The algorithms also include a jump function, which advances the state by some number of steps, usually a power of two, so that many threads of execution can start from distinct initial states.
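
The conversion to floating point can be sketched as follows. This is a minimal illustration assuming IEEE 754 binary64 and binary32 and a 64-bit generator output; the function names are hypothetical. A binary32 value has a 24-bit significand (23 stored bits plus one implicit bit), so this sketch keeps 24 bits for the float case.

```c
#include <stdint.h>

/* Map a 64-bit random word to a double in [0, 1) using its upper 53 bits.
   Both the integer-to-double conversion and the scaling are exact. */
double u64_to_double(uint64_t x)
{
	return (double)(x >> 11) * 0x1.0p-53;
}

/* Map a 64-bit random word to a float in [0, 1) using its upper 24 bits. */
float u64_to_float(uint64_t x)
{
	return (float)(x >> 40) * 0x1.0p-24f;
}
```

For example, u64_to_double(xoshiro256p(&state)) yields a uniformly spaced value in [0, 1) while ignoring the low bits of the generator's output.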

xoshiro256**

xoshiro256** is the family's general-purpose random 64-bit number generator.

/*  Adapted from the code included on Sebastiano Vigna's website */

#include <stdint.h>

uint64_t rol64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

struct xoshiro256ss_state {
	uint64_t s[4];
};

uint64_t xoshiro256ss(struct xoshiro256ss_state *state)
{
	uint64_t *s = state->s;
	uint64_t const result = rol64(s[1] * 5, 7) * 9;
	uint64_t const t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];

	s[2] ^= t;
	s[3] = rol64(s[3], 45);

	return result;
}

xoshiro256+

xoshiro256+ is approximately 15% faster than xoshiro256**, but the lowest three bits have low linear complexity; therefore, it should be used only for floating point results by extracting the upper 53 bits.

#include <stdint.h>

uint64_t rol64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

struct xoshiro256p_state {
	uint64_t s[4];
};

uint64_t xoshiro256p(struct xoshiro256p_state *state)
{
	uint64_t *s = state->s;
	uint64_t const result = s[0] + s[3];
	uint64_t const t = s[1] << 17;

	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];

	s[2] ^= t;
	s[3] = rol64(s[3], 45);

	return result;
}

Other variants

If space is at a premium, xoroshiro128** is the equivalent of xoshiro256**, and xoroshiro128+ is the equivalent of xoshiro256+. These have smaller state spaces, and thus are less useful for massively parallel programs. xoroshiro128+ also exhibits a mild dependency in Hamming weights, generating a failure after 5 TB of output. The authors do not believe that this can be detected in real-world programs.

For 32-bit output, xoshiro128** and xoshiro128+ are exactly equivalent to xoshiro256** and xoshiro256+, with uint32_t in place of uint64_t and with different shift/rotate constants. xoroshiro64** and xoroshiro64* fill the roles of xoroshiro128** and xoroshiro128+ respectively but, unlike the functions with larger state, they are not straightforward ports of their higher-precision counterparts.

More recently, the ++ generators have been introduced as an alternative to the ** generators.
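
As a rough illustration of what a ++ scrambler looks like, the sketch below reuses the xoshiro256 state update from the listings above and changes only the output line. The output expression rol64(s[0] + s[3], 23) + s[0] and the names xoshiro256pp and xoshiro256pp_state are written from recollection of the reference code, not taken from this article, so they should be treated as an assumption and checked against the authors' published source.

```c
#include <stdint.h>

uint64_t rol64(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

struct xoshiro256pp_state {
	uint64_t s[4];
};

/* Assumed form of xoshiro256++: add, rotate, add as the output scrambler.
   The state must not be all zero. */
uint64_t xoshiro256pp(struct xoshiro256pp_state *state)
{
	uint64_t *s = state->s;
	uint64_t const result = rol64(s[0] + s[3], 23) + s[0];
	uint64_t const t = s[1] << 17;

	/* Same linear state update as xoshiro256** and xoshiro256+. */
	s[2] ^= s[0];
	s[3] ^= s[1];
	s[1] ^= s[2];
	s[0] ^= s[3];

	s[2] ^= t;
	s[3] = rol64(s[3], 45);

	return result;
}
```

The scrambler uses only additions and a rotation, so it avoids the multiplications of the ** variant while still mixing the low bits with higher ones.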

Initialization

The authors of the xoshiro paper recommend initializing the state of the generators with a generator that is radically different from the one being initialized, and that will never produce the "all-zero" state; for shift-register generators, this state is impossible to escape from.[14][15] The authors specifically recommend using the SplitMix64 generator, from a 64-bit seed, as follows:

#include <stdint.h>

struct splitmix64_state {
	uint64_t s;
};

uint64_t splitmix64(struct splitmix64_state *state) {
	uint64_t result = (state->s += 0x9E3779B97f4A7C15);
	result = (result ^ (result >> 30)) * 0xBF58476D1CE4E5B9;
	result = (result ^ (result >> 27)) * 0x94D049BB133111EB;
	return result ^ (result >> 31);
}

// as an example; one could do this same thing for any of the other generators
struct xorshift128_state xorshift128_init(uint64_t seed) {
	struct splitmix64_state smstate = {seed};
	struct xorshift128_state result = {0};

	uint64_t tmp = splitmix64(&smstate);
	result.a = (uint32_t)tmp;
	result.b = (uint32_t)(tmp >> 32);

	tmp = splitmix64(&smstate);
	result.c = (uint32_t)tmp;
	result.d = (uint32_t)(tmp >> 32);

	return result;
}
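
The same approach carries over to the larger-state generators. The following sketch fills the four 64-bit words of a xoshiro256** state from SplitMix64. The function name xoshiro256ss_init and the final all-zero guard are assumptions added for illustration; the guard is a defensive extra, not something the cited authors prescribe.

```c
#include <stdint.h>

struct splitmix64_state {
	uint64_t s;
};

uint64_t splitmix64(struct splitmix64_state *state)
{
	uint64_t result = (state->s += UINT64_C(0x9E3779B97F4A7C15));
	result = (result ^ (result >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
	result = (result ^ (result >> 27)) * UINT64_C(0x94D049BB133111EB);
	return result ^ (result >> 31);
}

struct xoshiro256ss_state {
	uint64_t s[4];
};

/* Hypothetical helper: expand a 64-bit seed into 256 bits of state. */
struct xoshiro256ss_state xoshiro256ss_init(uint64_t seed)
{
	struct splitmix64_state sm = {seed};
	struct xoshiro256ss_state result;

	for (int i = 0; i < 4; i++)
		result.s[i] = splitmix64(&sm);

	/* Defensive guard (an assumption of this sketch): the all-zero
	   state can never be left, so replace it if it ever appears. */
	if ((result.s[0] | result.s[1] | result.s[2] | result.s[3]) == 0)
		result.s[0] = 1;

	return result;
}
```

Because each state word comes from a separate SplitMix64 output, nearby seeds such as 1 and 2 produce unrelated-looking initial states.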

Notes

  1. In C and most other C-based languages, the caret (^) represents the bitwise XOR, and the " << " and " >> " operators represent left and right bitwise shifts, respectively.

References

  1. Marsaglia, George (July 2003). "Xorshift RNGs". Journal of Statistical Software. 8 (14). doi:10.18637/jss.v008.i14.
  2. Brent, Richard P. (August 2004). "Note on Marsaglia's Xorshift Random Number Generators". Journal of Statistical Software. 11 (5). doi:10.18637/jss.v011.i05.
  3. Panneton, François; L'Ecuyer, Pierre (October 2005). "On the xorshift random number generators" (PDF). ACM Transactions on Modeling and Computer Simulation. 15 (4): 346–361. doi:10.1145/1113316.1113319. S2CID 11136098.
  4. Vigna, Sebastiano. "xorshift*/xorshift+ generators and the PRNG shootout". Retrieved 2014-10-25.
  5. Lemire, Daniel; O’Neill, Melissa E. (April 2019). "Xorshift1024*, Xorshift1024+, Xorshift128+ and Xoroshiro128+ Fail Statistical Tests for Linearity". Computational and Applied Mathematics. 350: 139–142. arXiv:1810.05313. doi:10.1016/j.cam.2018.10.019. S2CID 52983294. We report that these scrambled generators systematically fail Big Crush—specifically the linear-complexity and matrix-rank tests that detect linearity—when taking the 32 lowest-order bits in reverse order from each 64-bit word.
  6. Le Floc'h, Fabien (12 January 2011). "XORWOW L'ecuyer TestU01 Results". Chase The Devil (blog). Retrieved 2017-11-02.
  7. "cuRAND Testing". Nvidia. Retrieved 2017-11-02.
  8. Vigna, Sebastiano (July 2016). "An experimental exploration of Marsaglia's xorshift generators, scrambled" (PDF). ACM Transactions on Mathematical Software. 42 (4): 30. arXiv:1402.6246. doi:10.1145/2845077. S2CID 13936073. Proposes xorshift* generators, adding a final multiplication by a constant.
  9. Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007). "Section 7.1.2.A. 64-bit Xorshift Method". Numerical Recipes: The Art of Scientific Computing (3rd ed.). New York: Cambridge University Press. ISBN 978-0-521-88068-8.
  10. O'Neill, Melissa E. (5 September 2014). PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation (PDF) (Technical report). Harvey Mudd College. pp. 6–8. HMC-CS-2014-0905.
  11. Saito, Mutsuo; Matsumoto, Makoto (2014). "XORSHIFT-ADD (XSadd): A variant of XORSHIFT". Retrieved 2014-10-25.
  12. Vigna, Sebastiano (May 2017). "Further scramblings of Marsaglia's xorshift generators" (PDF). Journal of Computational and Applied Mathematics. 315 (C): 175–181. arXiv:1404.0390. doi:10.1016/j.cam.2016.11.006. S2CID 6876444. Describes xorshift+ generators, a generalization of XSadd.
  13. Vigna, Sebastiano. "xoshiro/xoroshiro generators and the PRNG shootout". Retrieved 2019-07-07.
  14. Blackman, David; Vigna, Sebastiano (2018). "Scrambled Linear Pseudorandom Number Generators". arXiv:1805.01407.
  15. Matsumoto, Makoto; Wada, Isaku; Kuramoto, Ai; Ashihara, Hyo (September 2007). "Common defects in initialization of pseudorandom number generators". ACM Transactions on Modeling and Computer Simulation. 17 (4): 15–es. doi:10.1145/1276927.1276928. S2CID 1721554.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.