
Sometimes it's the simple things!

Like many, you may have Googled "blank strings returned from htmlentities(); php5.4, php5.5". Welp! You're not alone, as user <ky dot patterson at adlinkr dot com> points out:

If you have content that is not 100% UTF-8 then TAKE NOTE

Starting in PHP 5.4 htmlspecialchars() and htmlentities() assume charset=UTF-8 by default AND WILL RETURN BLANK IF YOUR INPUT IS NOT VALID UTF-8.

So if you have a lot of function calls that look like this:

echo htmlspecialchars($input);
// or
echo htmlentities($input);

i.e. no charset and no flags -- and $input is ISO-8859 (or anything else apart from 7-bit ASCII or UTF-8) -- then PHP 5.4 and 5.5 will return an empty string, and you will be surprised and probably unhappy.

This is apparently a feature, not a bug.
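To see the behaviour described above for yourself, here is a minimal sketch. The sample string is my own assumption (the word "café" stored as ISO-8859-1 bytes), and the flags and encoding are spelled out so the result doesn't depend on whichever defaults your PHP version happens to ship with.

```php
<?php
// Assumed sample input: "café" in ISO-8859-1, where é is the single byte 0xE9.
// That lone byte is not a valid UTF-8 sequence.
$input = "caf\xE9";

// Treating the bytes as UTF-8 (the PHP 5.4/5.5 default) fails and yields an empty string.
$asUtf8 = htmlentities($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Naming the real encoding gives the expected entity.
$asLatin1 = htmlentities($input, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

var_dump($asUtf8);   // string(0) ""
var_dump($asLatin1); // string(11) "caf&eacute;"
```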

The Good News

This problem is "somewhat" limited to PHP 5.4/5.5, and it all comes down to the "optional" $encoding argument.

string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = true ]]] )

If omitted, the default value of the encoding varies depending on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.
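As a rough illustration of the PHP 5.6+ rule quoted above, the sketch below sets default_charset at runtime and then calls htmlentities() without an $encoding argument. It assumes PHP 5.6 or later and that the internal_encoding setting is left empty; on 5.4/5.5 the setting is simply ignored by htmlentities(). The sample input is my own.

```php
<?php
// Assumption: PHP 5.6+ with no internal_encoding override in php.ini.
ini_set('default_charset', 'ISO-8859-1');

// Assumed sample input: "café" stored as ISO-8859-1 bytes.
$input = "caf\xE9";

// No $encoding argument here, so the function takes it from default_charset.
$escaped = htmlentities($input, ENT_COMPAT | ENT_HTML401);

var_dump($escaped); // string(11) "caf&eacute;" on PHP 5.6 or later
```

Changing default_charset also changes the charset PHP sends in its Content-Type header, so treat this as a demonstration rather than a fix.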

Prior to PHP 5.4 the default encoding was ISO-8859-1; in 5.4+ that got switched to UTF-8 (presumably a legacy of the PHP 6 Unicode project, which set out to use UTF-16 internally and never got released, although many of its other features ultimately ended up being back-ported into 5.3/5.4).

The problem prior to 5.6 is that there is nothing to fall back to other than UTF-8. If you're not using UTF-8 you have to explicitly specify the encoding every time, otherwise your input is treated as an invalid code unit sequence and you get an empty string back! This is potentially a nightmare: updating hundreds if not thousands of instances of htmlentities(), htmlspecialchars() etc...

Fortunately, in 5.6 it will fall back to ini_get("default_charset") as a bare minimum. Now whilst this is almost an "I told you not to code lazily" point, it is somewhat problematic. I think a typical solution is to write some "wrapper" functionality (as exampled here) and simply find/replace the relevant instances, or install the Advanced PHP Debugger (APD) extension from the PECL repositories to leverage override_function().
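A wrapper along those lines might look something like the sketch below. The names APP_CHARSET and app_htmlentities() are hypothetical (nothing in PHP defines them), and ISO-8859-1 is only an assumed site encoding; swap in whatever your content really uses. It deliberately avoids PHP 7 syntax so it should also run on 5.4/5.5.

```php
<?php
// Hypothetical site-wide setting: the encoding your content is actually stored in.
define('APP_CHARSET', 'ISO-8859-1');

/**
 * Hypothetical drop-in replacement for htmlentities().
 * The encoding is always passed explicitly, so the per-version default never applies.
 */
function app_htmlentities($string, $flags = null, $encoding = APP_CHARSET, $doubleEncode = true)
{
    if ($flags === null) {
        // The documented defaults for PHP 5.4 to 8.0.
        $flags = ENT_COMPAT | ENT_HTML401;
    }

    return htmlentities($string, $flags, $encoding, $doubleEncode);
}

// Assumed sample input: ISO-8859-1 bytes plus some markup.
$escaped = app_htmlentities("caf\xE9 <b>");
echo $escaped, "\n"; // caf&eacute; &lt;b&gt;
```

With something like this in place, the find/replace becomes a mechanical swap of htmlentities( for app_htmlentities(, and the encoding lives in one spot.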

So, a good habit to get into: always explicitly specify your encoding, or detect it with mb_detect_encoding().
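If you go the detection route, a minimal sketch could look like this. It assumes the mbstring extension is loaded, the helper name to_utf8() is hypothetical, and the candidate list is my own guess at what a typical legacy site holds. Bear in mind that detection is an educated guess, not a guarantee, so knowing your real encoding is still the better option.

```php
<?php
/**
 * Hypothetical helper: normalise a string to UTF-8 before escaping it.
 * Returns false when the input is not valid in any of the candidate encodings.
 */
function to_utf8($input, array $candidates = array('UTF-8', 'ISO-8859-1'))
{
    // Strict mode (third argument): only accept an encoding in which the bytes are valid.
    $detected = mb_detect_encoding($input, $candidates, true);
    if ($detected === false) {
        return false;
    }

    return $detected === 'UTF-8'
        ? $input
        : mb_convert_encoding($input, 'UTF-8', $detected);
}

// Assumed sample input: "café" stored as ISO-8859-1 bytes.
$utf8 = to_utf8("caf\xE9");
$escaped = htmlentities($utf8, ENT_COMPAT | ENT_HTML401, 'UTF-8');
echo $escaped, "\n"; // caf&eacute;
```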

Also worthy of note:

Most are probably aware of the impending deprecation of the original mysql_ extension in PHP. It was finally deprecated in PHP 5.5.

PHP7 or PHP Next Generation (PHPNG)

Whilst most production environments/hosting companies will probably wait a while before rushing to PHP7, Zend is already preparing its community for some of the changes:

One of the things that excites me about PHP7 is the refactoring of the Zend Engine (now Zend Engine 3) to offer an "almost 100% increase in performance". A huge emphasis has been placed on performance enhancements in PHP7. We'll also have return type declarations and the three-way comparison operator (<=>), more commonly known as the "Spaceship Operator".
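For a quick taste of those two language features, here is a small sketch (PHP 7 or later only). The function name compare_lengths() and the sample words are made up for illustration.

```php
<?php
// Hypothetical comparison function. The ": int" after the parameter list is the
// return type declaration; PHP raises a TypeError if something else is returned.
function compare_lengths(string $a, string $b): int
{
    // The spaceship operator returns -1, 0 or 1 when the left side is
    // smaller than, equal to or greater than the right side.
    return strlen($a) <=> strlen($b);
}

$words = ['spaceship', 'php', 'zend'];
usort($words, 'compare_lengths');

print_r($words); // php, zend, spaceship
```

The -1/0/1 result is exactly what usort() expects from its callback, which is where the operator saves the most typing.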

So... Lots of PHP awesomeness to come, but a period of headaches for many I feel. The pain will be worth it though :-)
