Serializing Data In PHP
One of the more annoying problems when working with data is creating a system to save data from RAM into long-term storage so it can be loaded back into RAM and processed at a later time. There are thousands of different file formats that allow us to take structured data, write it to storage, and then read from it again. Wouldn’t it be great if we could do the same thing with our classes so we could save their state and then restore them to the same state?
Most programming languages provide some kind of support to do this and PHP is no exception. In this article, we’ll be discussing how to serialize and unserialize our objects in PHP.
What Is Serialization?
The process of serialization is converting a data structure or object into the string representation of that data structure or object, which can then be stored or transmitted. We can then save the string to a file, database, cache, or even send it over a network easily.
The process of unserialization is the opposite, where we take a string representation of an object and recreate the object.
PHP provides built-in support for both operations: the `serialize()` function serializes a value (including an object), and the `unserialize()` function recreates it from the string.
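Before moving on to objects, here is a minimal sketch of the round trip using a plain array. The `$settings`, `$stored`, and `$restored` names are ours, chosen only for illustration.

```php
<?php
// Round-trip a plain array through serialize()/unserialize().
$settings = ["theme" => "dark", "perPage" => 25];

// serialize() returns a string that can be written to a file, cache, or database.
$stored = serialize($settings);
echo $stored, PHP_EOL;
// a:2:{s:5:"theme";s:4:"dark";s:7:"perPage";i:25;}

// unserialize() rebuilds the original value from that string.
$restored = unserialize($stored);
var_dump($restored === $settings); // bool(true)
```

The string records each value's type and length (`s:5` is a five-byte string, `i:25` an integer), which is how PHP can restore the same types later.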
Serializing A User Class
Let’s work through a quick example of how to serialize a `User`. We’re going to keep it simple and only have the user’s name and email address. This will keep our examples small and easier to read but the number of properties isn’t a limiting factor in the real world.
This is our example `User` class.
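A minimal sketch of such a class and its round trip, assuming constructor-promoted public properties (PHP 8.0+) like the later version of `User` in this article. The `$original` and `$serialized` variable names are our own.

```php
<?php
// Assumed shape of the example class: two public, constructor-promoted properties.
class User
{
    public function __construct(
        public string $name,
        public string $email,
    ) {
    }
}

$original = new User("Scott", "scott@keck-warren.com");

// Convert the object to its string representation.
$serialized = serialize($original);
echo $serialized, PHP_EOL;
// O:4:"User":2:{s:4:"name";s:5:"Scott";s:5:"email";s:21:"scott@keck-warren.com";}

// Rebuild a User instance from the string and read a property from it.
$user = unserialize($serialized);
echo $user->email, PHP_EOL;
```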
var_dump($user);
This gives us the following output:
scott@keck-warren.com
test.php:4:
class User#2 (2) {
  public string $name =>
  string(5) "Scott"
  public string $email =>
  string(21) "scott@keck-warren.com"
}
Notice it’s an instance of our class and not something like a `stdClass`. This means we can easily unserialize our string to an instance of our class and call its methods as usual.
Keeping Stuff Secret
Now one of the downsides to the built-in `serialize()`/`unserialize()` logic is that it exports all the properties of the class. This might not be exactly what you want.
For example, we might have a password property in our `User` class.
class User
{
    public function __construct(
        public string $name,
        public string $email,
        public string $password,
    ) {
    }
}
Now obviously, we don’t want to be leaking our users’ passwords, but if we serialize this class we’ll get the password right in the serialized string.
$user = new User("Scott", "scott@keck-warren.com", "Monkey1234!");
echo serialize($user);
// O:4:"User":3:{s:4:"name";s:5:"Scott";s:5:"email";s:21:"scott@keck-warren.com";s:8:"password";s:11:"Monkey1234!";}
Thankfully PHP provides some magic methods to prevent this kind of headache. Magic methods are special methods we can define in our classes that will override PHP’s default behavior when certain actions are performed on an object. Magic methods start with two underscores and you’re likely very aware of the `__construct()` magic method.
In this case, we’re going to be looking at the `__serialize()` method. This method returns an array containing the items that we WANT to have serialized.
One option is to export only the properties that are safe to include in the serialized version.
// in User
public function __serialize(): array
{
    return [
        "name" => $this->name,
        "email" => $this->email,
    ];
}
We could also encrypt any of the values we want to keep secure.
// in User
public function __serialize(): array
{
    return [
        "name" => $this->name,
        "email" => $this->email,
        // encrypt() stands in for whatever encryption helper your application provides
        "password" => encrypt($this->password),
    ];
}
Now the downside to this is that we’ve lost some information either way we do it. We’ve either completely lost the value or it’s encrypted. Thankfully there’s another magic method called `__unserialize()` that allows us to manually unserialize the class. This way we can add in the missing data.
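// in User, when the password was left out of the serialized data
public function __unserialize(array $data): void
{
    $this->name = $data["name"];
    $this->email = $data["email"];
    // hard-coded here for illustration only
    $this->password = "Monkey1234!";
}

// in User, when the password was encrypted
public function __unserialize(array $data): void
{
    $this->name = $data["name"];
    $this->email = $data["email"];
    // decrypt() is the counterpart of the encrypt() helper above
    $this->password = decrypt($data["password"]);
}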
We can also do other actions inside the `__unserialize()`, like reconnecting to the database or creating other items that were discarded when we serialized the class.
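Putting the pieces together, the following is a minimal sketch of a complete class that drops the password when serialized and rebuilds a discarded value when unserialized. The `$emailDomain` property and the `domainOf()` helper are hypothetical, added only to have something to recreate.

```php
<?php
// Sketch: drop the password on serialize, rebuild derived state on unserialize.
class User
{
    // Hypothetical derived value that is cheap to recompute, so it is not stored.
    public string $emailDomain;

    public function __construct(
        public string $name,
        public string $email,
        public string $password = "",
    ) {
        $this->emailDomain = self::domainOf($email);
    }

    public function __serialize(): array
    {
        // Only these keys end up in the serialized string.
        return ["name" => $this->name, "email" => $this->email];
    }

    public function __unserialize(array $data): void
    {
        $this->name = $data["name"];
        $this->email = $data["email"];
        // Typed properties must be assigned before they are read,
        // so give the omitted password an explicit empty value.
        $this->password = "";
        // Recreate what was discarded during serialization.
        $this->emailDomain = self::domainOf($this->email);
    }

    private static function domainOf(string $email): string
    {
        $at = strrpos($email, "@");
        return $at === false ? "" : substr($email, $at + 1);
    }
}

$serialized = serialize(new User("Scott", "scott@keck-warren.com", "Monkey1234!"));
echo $serialized, PHP_EOL;
// O:4:"User":2:{s:4:"name";s:5:"Scott";s:5:"email";s:21:"scott@keck-warren.com";}

$user = unserialize($serialized);
echo $user->emailDomain, PHP_EOL; // keck-warren.com
```

Note that `unserialize()` does not call the constructor, which is why `__unserialize()` has to assign every property itself.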
Pre 7.4
Now, most of us should be using at least PHP 8.0 at this point, but we need to have a brief discussion about version support of serialization in PHP. `__serialize()` and `__unserialize()` were both added in PHP 7.4. Before this, there were two ways to provide the same functionality.
The first is the `__sleep()` and `__wakeup()` methods. They serve the same purpose as `__serialize()` and `__unserialize()`, although `__sleep()` returns the names of the properties to serialize rather than the data itself. They have been superseded by the newer methods, so we shouldn’t rely on them in new code. We’re mentioning them because you might still be supporting them in your current code base. The interesting thing is that, to provide backward compatibility, we can have `__sleep()`, `__wakeup()`, `__serialize()`, and `__unserialize()` in the same class. On a version of PHP older than 7.4, `__sleep()` and `__wakeup()` will be used; on 7.4 and later, `__serialize()` and `__unserialize()` take precedence. This works because the older versions of the language don’t know about the newer methods. Cool, isn’t it?
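As a rough illustration of the old and new hooks living side by side, here is a sketch of a class that defines all four. The `$restoredBy` property is hypothetical and exists only to show which hook ran; the properties are left untyped so the class body stays friendly to older PHP versions.

```php
<?php
// Sketch: one class carrying both the legacy and the current hooks.
class User
{
    public $name;
    public $email;
    public $password;
    // Records which restore hook ran; for demonstration only.
    public $restoredBy = null;

    public function __construct($name, $email, $password)
    {
        $this->name = $name;
        $this->email = $email;
        $this->password = $password;
    }

    // Legacy hook: returns the NAMES of the properties to serialize.
    public function __sleep()
    {
        return ["name", "email"];
    }

    // Legacy hook: called after the listed properties have been restored.
    public function __wakeup()
    {
        $this->password = "";
        $this->restoredBy = "__wakeup";
    }

    // PHP 7.4+ hook: returns the data itself as key/value pairs.
    public function __serialize(): array
    {
        return ["name" => $this->name, "email" => $this->email];
    }

    // PHP 7.4+ hook: receives the array produced by __serialize().
    public function __unserialize(array $data): void
    {
        $this->name = $data["name"];
        $this->email = $data["email"];
        $this->password = "";
        $this->restoredBy = "__unserialize";
    }
}

$user = unserialize(serialize(new User("Scott", "scott@keck-warren.com", "Monkey1234!")));
echo $user->restoredBy, PHP_EOL; // __unserialize on PHP 7.4 and later
```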
The second way is to implement the `Serializable` interface, which tells PHP that we’ve defined our own `serialize()` and `unserialize()` methods (note the lack of underscores). This interface is deprecated: as of PHP 8.1.0, a class that implements `Serializable` without also implementing `__serialize()` and `__unserialize()` generates a deprecation warning.
Deprecated: User implements the Serializable interface, which is deprecated. Implement __serialize() and __unserialize() instead (or in addition, if support for old PHP versions is necessary) in /Users/scottkeck-warren/SynologyDrive/thisprogrammingthing.com/test.php on line 3
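The "or in addition" part of that message suggests a migration path. The following is a minimal sketch of it, assuming a two-property `User`: the class keeps `Serializable` for older PHP versions and adds the magic methods for newer ones. The choice of JSON as the inner format for the interface methods is ours, not something PHP requires.

```php
<?php
// Sketch: keep Serializable for old PHP versions, add the magic methods for new ones.
class User implements Serializable
{
    public $name;
    public $email;

    public function __construct($name, $email)
    {
        $this->name = $name;
        $this->email = $email;
    }

    // Serializable interface (no underscores): used by PHP versions before 7.4.
    public function serialize(): ?string
    {
        return json_encode($this->__serialize());
    }

    public function unserialize($data): void
    {
        $this->__unserialize(json_decode($data, true));
    }

    // Magic methods: take precedence on PHP 7.4 and later.
    public function __serialize(): array
    {
        return ["name" => $this->name, "email" => $this->email];
    }

    public function __unserialize(array $data): void
    {
        $this->name = $data["name"];
        $this->email = $data["email"];
    }
}

$serialized = serialize(new User("Scott", "scott@keck-warren.com"));
echo $serialized, PHP_EOL;
// On PHP 7.4 and later:
// O:4:"User":2:{s:4:"name";s:5:"Scott";s:5:"email";s:21:"scott@keck-warren.com";}

$user = unserialize($serialized);
echo $user->email, PHP_EOL; // scott@keck-warren.com
```

Because both sets of methods share the same array-building logic, the two code paths cannot drift apart while the old interface is still in place.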
The long and the short of it is that if you currently have a class that uses the `Serializable` interface, `__sleep()`, or `__wakeup()`, it should be migrated to `__serialize()` and `__unserialize()` as soon as possible.
Don’t Blindly Trust Your Inputs
As we’ve seen, serialization is super powerful, but that power comes with the potential to mess up our data. We should never ever trust a string we receive from the outside world and unserialize it. Ideally, if we’re serializing and then later unserializing the data it should never leave our digital security perimeter because we just can’t trust it.
I once did a code review for a change, and part of the change required us to keep track of a state but not in a database. The developer who wrote the code passed a serialized string to the user and then had them send it back. I was easily able to alter the string so it would bypass the billing phase of the process. We quickly fixed the code so this wouldn’t happen.
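If serialized state really does have to make a round trip through the client, one common mitigation is to sign it so tampering can be detected. This is not the fix from the story above, only a sketch under our own assumptions: `SECRET_KEY`, `signState()`, and `verifyState()` are hypothetical names, and a real key would come from configuration rather than source code.

```php
<?php
// Sketch: sign serialized state before handing it to a client, verify on return.
// SECRET_KEY is a placeholder; a real key would come from configuration.
const SECRET_KEY = "replace-with-a-long-random-secret";

function signState(array $state): string
{
    $payload = base64_encode(serialize($state));
    $signature = hash_hmac("sha256", $payload, SECRET_KEY);
    return $payload . "." . $signature;
}

function verifyState(string $token): ?array
{
    $parts = explode(".", $token, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $signature] = $parts;

    // hash_equals() compares the two signatures in constant time.
    $expected = hash_hmac("sha256", $payload, SECRET_KEY);
    if (!hash_equals($expected, $signature)) {
        return null;
    }

    $decoded = base64_decode($payload, true);
    if ($decoded === false) {
        return null;
    }

    // allowed_classes => false refuses to instantiate any objects.
    $state = unserialize($decoded, ["allowed_classes" => false]);
    return is_array($state) ? $state : null;
}

$token = signState(["step" => "billing", "paid" => false]);
var_dump(verifyState($token) !== null); // bool(true)

// Swapping in a different payload invalidates the signature.
$forged = base64_encode(serialize(["step" => "done", "paid" => true]))
    . substr($token, strpos($token, "."));
var_dump(verifyState($forged)); // NULL
```

Signing only proves the string has not been altered; the client can still read it, so secrets should not go into the payload.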
What You Need to Know
- Serialization converts an object to a string
- Unserialization converts a string to an object
- PHP has built-in support
- Magic methods provide helpers for our objects to control the process