Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
PHP's silly PRNG (ifies.com)
33 points by zokier on Sept 6, 2009 | hide | past | favorite | 39 comments


For some added context, the normal rand function in PHP uses an implementation based on libc.

PHP provides a second, safer, function, mt_rand, which is based around an implementation of the Mersenne Twister.


The difference when using mt_rand is astonishing.

PHP 5.3 graph using rand: http://twitpic.com/gq81b

PHP 5.3 graph using mt_rand: http://twitpic.com/gq827


The regular rand can do that just as well. see elsewhere in this thread.


> before you criticize a project make sure you understand the material.

He's right.

Mersenne Twister = mt_rand(). The guy that wrote this is an ignorant.

Ignorant: Lacking knowledge. Unaware or uninformed (adjective: moron).


The only thing that is left from this whole circus is a reference to a bug that has been known about for a long time:

http://bugs.php.net/bug.php?id=45184

It would have really helped if they did a bit of research before writing such an inflammatory post and then go all out to get people to spread it.

Anyway, the damage is done.


It would make more sense to call mt_rand simply rand, and the current rand to fast_rand.


well, that could make some sense, but mt_rand is significantly faster too than basic rand.

edit: what I meant was that imho there is not really any reason to keep that old, broken rand function lying around


So basically PHP keeps the slower and worst random generator as the default.

Great minds in action.


Backwards-compatibility should not be an issue here either. Is there any way to actually be dependent on the regularity of rand()?


Yes. Some applications want series of numbers that are pseudorandom, but completely reproducible, once you've provided the same seed. (See PHP's srand().)

This is often the case with simulations, or other algorithms that have a 'random' component that you'd still like to hold constant across (for example) related runs during testing or across distributed machines.

So there may be a sensible case for maintaining backward-compatible rand() behavior here, even though a better algorithm is available.


Then make srand = mt_srand.


That doesn't resolve the problem for code relying on stable behavior of these functions. Given the following code...

  srand(CONSTANT); 
  n = rand();
...the scenarios I mentioned benefit from n being the exact same number in PHP version X and then version X+1, X+2, etc.

If PHP swaps out both rand() and srand(), backward-compatibility is broken, and along with it some kinds of quite-reasonable code that relied on srand() initializing rand() to give the same deterministic stream of numbers (rather than just a deterministic stream).

It might still be worth taking that hit -- I don't have a strong opinion -- but there are good reasons not to break backwards-compatibility, even to improve PRNG quality.


php, now with goto!


mt_rand isn't crypto-safe either. You need to be using the system's secure random number generator, or OpenSSL's.


The only thing silly here is this article.

First off, the random generator in PHP is simply a 2^32 period random number generator, nothing special or wrong about that. Yes, there are better ones but it's good enough for its intended purpose.

About the image in the article:

The image is generated by using the 'scale' function of the PHP PRNG to sample the 'lowest' bit.

You do not do a scale when you should be doing a one bit and.

It pays off to study an issue for a while, once you raise a stink it's hard to undo the damage you've done.

By replacing the line in the code that plots the pixel with:

   imagesetpixel($im,$x,$y, rand() & 1);
this image appears:

http://ww.com/random.php

Problem gone, no fuss.

That's on a linux box running php 4.4.2, the results look pretty random to me, it isn't perfect but the aliasing that was displayed in the original test image is clearly gone using the 'regular' (so not mt_rand) random number generator.

To make the result reproducible I've seeded the PRNG with '0' using srand(0); at the beginning of the program.

If you want to use the 'scale' function in PHP for something that you should be doing by a one bit and then you will reap the results of that, you are effectively sampling the lowest bit after a bunch of float operations, which is clearly sub-optimal.

PHP can't correct for incorrect use.

As for the rest of the code, the $x=0; line can be dropped, there is no need to 'reset' your loop variable at the end of a loop, last I checked.

I'm no PHP fan but before you criticize a project make sure you understand the material.


If mt_rand is really that much better, what is the point of keeping them both? Shouldn't they just make rand map to mt_rand?


There seems to be some kind of law that, for anything to do with crypto, numerics, or randomness, a language must ship with both a broken default and a correct implementation hidden deep in some library known only to the pure of heart.


Probably a matter of mitigating risk. Despite its drawbacks, rand() is used a LOT and works...well, "good enough".

In the interest of making sure the code you wrote for PHP3 will work on current/future versions and not have anything unexpected happen.


I would like to see the code that breaks if PRNG is swapped. I mean, like is there code out there that relies on the unrandomness of current rand?


Testsets + known output, spots where the PRNG has been used as part of the seed driving cryptographic code.

The fact that a given seed makes your program deterministic is a great way to 'lock' the variation that a random generator would otherwise give you.

So, say you have a series of steps that produce a result given a random number as the seed.

You come up with a bunch of cases where the run output has been saved and checked by hand to conform, you now use that output to verify against when maintaining the software.

If you should change the behavior of the PRNG then output will change and the test will break.

A database with passwords that were 'encrypted' (I use the word loosely) using the PRNG with parts of the password as the seed. If you change the PRNG under water none of those users can log in.

There are probably lots of other examples, these are just two simple scenarios where code would break.


As far as I see these examples are all scandalous abuses of undocumented implementation details. While I can't be bothered to look up the actual documentation, I'm almost 100% sure that it's not guaranteed that PHP's rand() will always yield the same results for the same seed. But if it is, well, that would be a peculiar example of shortsighted language framework design.


> I'm almost 100% sure that it's not guaranteed that PHP's rand() will always yield the same results for the same seed.

That's the whole point of having a seed, isn't it ?

see the Periodicity entry:

http://en.wikipedia.org/wiki/Pseudorandom_number_generator

and

http://www.php.net/manual/en/function.srand.php

Does not make any mention of the behavior you cite, if you can't be bothered to look it up why bring it up in the first place ?


All I'm saying that while it is indeed true that having a seed implies a deterministic behaviour, this should be an implementation detail that should not be exposed to the the users of the interface.


Either I'm not following you, or you are not clear. Or both ;)

It is meant to work that way, for precisely the kind of reasons I outlined above.

Say you are writing a poker program. At the beginning of every game you seed the random number generator, you save the seed.

Halfway through a test you find a bug in the program.

You fix the bug (you think) and you make a test case for the fix.

You run the test case with the restored random seed to see if the bug is actually under control and does not influence any other part of the game up to there.

And so on.

That's clearly documented and very common use.

If it wouldn't be like that you'd have to play poker until the cows came home in the hope that one day that same sequence of play would occur.


What DrJokepu may have been getting at is that the implementation of rand() itself is not guaranteed. Expecting it not to change in future versions of the library is dubious.


True, but if you do not have to break stuff then don't.

After all, it was pseudo random to begin with, nobody made any claims as to how random it was, simply that it was 'periodical' and that the period was long enough that in a short lived program you would get the impression of random data.

Every programmer even halfway worth his weight in bits would know that that meant that if you needed 'high grade' random data you'd best look elsewhere, something with a longer period that downsamples to 32 bits for instance, or a hardware based random generator.

This has been true of the 'rand' function on a unix box as well, the period is (2^(sizeof(int) x 8))-1, that's just about the only guarantee you get. (hn eats asterisks)

The article linked here concentrates on the lower single bit of the output of the random generator (the rand(0,1) part of the code will generate a number between 0 and 1), but in fact completely misses the point that if you want a number that is either 0 or 1 you MASK out the lower bit instead of asking PHP to do this using the range option, which is meant to give you a number which has been 'scaled' down from the output of the PRNG in the php core. Right now you are probably sampling the lower bit of a bunch of float operations, not the best way to get one random bit.

(rand() & 1) would have done the job just fine.

So much for programmers that can manipulate bits, even after they see the implementation they still can't figure out how to do it properly. What's so hard about a one bit and ?

So, first of the article uses the PRNG in a completely nonsense way, then it is labelled as 'broken' based on a broken test and next the PHP maintainers get flak for staying backwards compatible when there was no clear need for change.

It wasn't 'broken', it could be improved a lot at the expense of backwards compatibility and so they chose the appropriate route and added a library routine for that.

I'm no big fan of PHP, but this whole thread is way over the top.

here is the same example but with the lower bit masked instead of doing a bunch of unnecessary float ops on the output of the PRNG:

http://ww.com/random.php

I would say that looks quite a bit better than the provided sample.

And I also don't get what the $x=0; is for at the end of the loop on the x-coordinate, never knew that you had to reset your loop variables.


Making real random data using a computer is hard, very hard.

These are pseudo random number generators, I take it that everybody using them understands that the 'P' means that you can not rely on the output of these pieces of code to be truely random.

One of the first online casinos used a PIC to sample a noisy zener diode to get to something less predictable.

And even with that as the source concept it took quite a bit of tweaking of the design to get really random bits, it's surprising how much hum from your powersupply can make it through to the bits of your a/d converter.


Yeah. Random is hard.

BUT when you actually already have far better implementation of PRNG, namely md_rand in PHP, there really is no excuse for defaulting to broken behavior.


That may actually have to do with not upsetting peoples unit tests that depend on this 'broken behavior'.

Let's say someone wrote an extensive body of statistical software that gets tested with the known input from the PRNG in php, if you then change the random number generator all the output will be different.

Or if someone used the PRNG in PHP to encrypt a bunch of data replacing it would break the decrypt function leaving the user with inaccessible data.


Trusting the broken behavior of a broken random-number generator is hardly the best approach to encrypting data. Anyone who writes a test that depends on the predictability of a random number generator is in for a huge disappointment when the data is to be decrypted.

Why not do it using a proper crypto library?


It was two examples, one for 'test' data, one for the passwords example.

The first showed a series of cases that have been saved to check if code has been inadvertently broken.

The second example was about data stored in a database that depends on the functioning of the PRNG in a predictable way for its use.

Both examples are 'contrived', just theoretical situations to show you that it is possible to break stuff by changing the function of a piece of code that implements a language feature.

The general message was: if you can remain backwards compatible then that is probably a good thing because there are bound to be use cases for your code that you can't think of that will have huge consequences.

So I think the PHP team did the right thing here in making a better random number generator available besides the old one.

Legacy data and software is always a pain, as soon as you have an installed base of '1' you have a problem. Designers of languages break existing code at their peril.

That said, they should have probably implemented a better one right from day 1.


"That said, they should have probably implemented a better one right from day 1."

I still can't believe it's a good idea not to break behavior that depends on the predictability of a broken implementation of a random number generator. It's not like this code will be broken by fixing the RNG. It was broken from the start.


I heard that an online casino used optical sensors and lava lamps to generate random numbers.


Zalewski's "Silence On The Wire" (which is totally fantastic in its own right) graphs a few PRNGs and discusses their issues.

http://www.amazon.com/exec/obidos/tg/detail/-/1593270461


The problem that you see here is mostly that rand() gets the range in a way biased towards the lower-order bits, which in more LCGs are less random than the higher-order bits.


Not only that, they invoke a bunch of float code that then has to balance on getting the rounding of the lowest bit right after a whole bunch of float scaling operations.

That's simply not the way to do it when you are only interested in one bit of data to check the random number generator.

That's only handy if you want to limit the range of your random numbers between two numbers with a bit more space between them than just the lower order bit.

Say 500 and 10000 or 0 and 500 or something like that.


> Tuesday, May 20, 2008

I assume PHP still has this issue?


Of course!

PHP only gains new issues, no warts are ever really removed.


Yes.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: