Where exactly does serialization come into the picture? I read about serialization on the ‘net, and I learned that it is an interface that, if implemented by a class, means the class can automatically be serialized and deserialized by the various serializers.
Give me a good reason why and when a class would need to be serialized. And suppose it is serialized — what happens exactly?
Serialization is needed whenever an object needs to be persisted or transmitted beyond the scope of its existence.
Persistence is the ability to save an object somewhere and load it later with the same state. For example:
- You might need to store an object instance on disk as part of a file.
- You might need to store an object in a database as a blob (binary large object).
Transmission is the ability to send an object outside of its original scope to some receiver. For example:
- You might need to transmit an instance of an object to a remote machine.
- You might need to transmit an instance to another AppDomain or process on the same machine.
For each of these, there must be some serial bit representation that can be stored, communicated, and then later used to reconstitute the original object. The process of turning an object into this series of bits is called “serialization”, while the process of turning the series of bits into the original object is called “deserialization”.
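As a minimal sketch of this round trip, here is an example using Java's built-in serialization (the concepts are the same across languages); the Point class and its fields are invented for illustration:

```java
import java.io.*;

public class RoundTripDemo {
    // Implementing the Serializable marker interface tells the runtime
    // that instances of this class may be serialized.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Serialization: turn the object into a series of bits (a byte array).
    static byte[] serialize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    // Deserialization: reconstitute the original object from the bits.
    static Point deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point original = new Point(3, 4);
        Point copy = deserialize(serialize(original));
        System.out.println(copy.x + "," + copy.y); // the copy has the same state
    }
}
```

The byte array here could just as well be written to a file, stored as a blob, or sent over a socket — that is the whole point of having a serial representation.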
The actual representation of the object in serialized form can differ depending on your goals. For example, in C#, you have both XML serialization (via the XmlSerializer class) and binary serialization (through use of the BinaryFormatter class — note that BinaryFormatter is deprecated in modern .NET for security reasons). Depending on your needs, you can even write your own custom serializer to do additional work such as compression or encryption. If you need a language- and platform-neutral serialization format, you can try Google’s Protocol Buffers, which now has support for .NET (I have not used this).
The XML representation mentioned above is good for storing an object in a standard format, but it can be verbose and slow. The binary representation saves space but isn’t as portable across languages and runtimes as XML is. The important point is that the serializer and deserializer must understand each other. This can become a problem once you start introducing backward and forward compatibility and versioning.
An example of potential serialization compatibility issues:
- You release version 1.0 of your program, which is able to serialize some Foo object to a file.
- The user does some action to save his Foo to a file.
- You release version 2.0 of your program with an updated Foo.
- The user tries to open the version 1.0 file with your version 2.0 program.
This can be troublesome if the version 2.0 Foo has additional properties that the version 1.0 Foo didn’t. You have to either explicitly not support this scenario or have some versioning story for your serialization. .NET can do some of this for you. In this case, you might also have the reverse problem: the user might try to open a version 2.0 Foo file with version 1.0 of your program.
I have not used these techniques myself, but .NET 2.0 and later has support for version tolerant serialization to support both forward and backward compatibility:
- Tolerance of extraneous or unexpected data. This enables newer versions of the type to send data to older versions.
- Tolerance of missing optional data. This enables older versions to send data to newer versions.
- Serialization callbacks. This enables intelligent default value setting in cases where data is missing.
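The .NET mechanisms differ in detail, but Java's serialization can sketch the same ideas: a stable version identifier so newer revisions of a class can still read older streams, plus a deserialization callback that supplies an intelligent default when data is missing. The Settings class and its fields are invented for this example:

```java
import java.io.*;

public class VersionDemo {
    static class Settings implements Serializable {
        // A fixed serialVersionUID lets newer revisions of the class
        // still read streams written by older revisions.
        private static final long serialVersionUID = 1L;

        String theme = "light";
        // Imagine this field was added in version 2.0 of the program;
        // streams written by version 1.0 won't have a value for it.
        String language;

        // Serialization callback: runs during deserialization and lets us
        // set an intelligent default when the data is missing.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (language == null) {
                language = "en"; // default for streams from older versions
            }
        }
    }

    static Settings roundTrip(Settings s) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Settings) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Settings old = new Settings();
        old.language = null; // simulate data from a version without this field
        Settings loaded = roundTrip(old);
        System.out.println(loaded.language); // the callback supplied the default
    }
}
```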
For example, when you want to send objects over a network or store them in files.
Let’s say you’re creating a savegame format for a video game. You could then make the Player class and every Enemy serializable. This way it would be easy to save the state of the current objects to a file.
On the other hand, when writing a multiplayer implementation for your game, you could send the serialized Player over the network to the other clients, which could then handle that data.
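Sticking with the savegame idea, here is a sketch in Java; the Player class, its fields, and the file handling are invented for illustration:

```java
import java.io.*;
import java.nio.file.*;

public class SaveGameDemo {
    // Marking the class Serializable is all that's needed for the
    // default mechanism to handle it.
    static class Player implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int health;
        Player(String name, int health) { this.name = name; this.health = health; }
    }

    // Persist the player's current state to a file.
    static void save(Player p, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(p);
        }
    }

    // Load the player back later with the same state.
    static Player load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Player) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("savegame", ".bin");
        save(new Player("Alice", 87), file);
        Player restored = load(file);
        System.out.println(restored.name + " " + restored.health);
        Files.delete(file);
    }
}
```

For the multiplayer case, the same bytes would be written to a socket instead of a file.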
In non-object-oriented languages, one would typically have data stored in memory in a pattern of bytes that would ‘make sense’ without reference to anything else. For example, a bunch of shapes in a graphics editor might simply have all their points stored consecutively. In such a program, simply storing the contents of all one’s arrays to disk might yield a file which, when read back into those arrays would yield the original data.
In object-oriented languages, many objects are stored as references to other objects. Merely storing the contents of in-memory data structures would not be useful, because a reference to object #24601 says nothing about what that object represents. While an object-oriented system may be able to do a pretty good job of figuring out what the in-memory data “mean” and converting it automatically to a sensible format, it can’t always distinguish references that point to the same object from references that point to distinct objects which merely happen to hold identical data. It’s thus often necessary to help the system out when converting objects to a raw stream of bits.
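The same-object-versus-identical-object distinction can be made concrete: a graph-aware serializer must record that two references alias the same object rather than just copying their contents twice. Java's ObjectOutputStream tracks identity within a single stream, as this invented example shows:

```java
import java.io.*;

public class IdentityDemo {
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
        Node(int value) { this.value = value; }
    }

    // Serialize an array of nodes and read it back.
    static Node[] roundTrip(Node[] nodes) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(nodes);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Node[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node shared = new Node(42);
        Node distinct = new Node(42); // same contents, different object
        Node[] restored = roundTrip(new Node[] { shared, shared, distinct });
        System.out.println(restored[0] == restored[1]); // aliasing preserved
        System.out.println(restored[0] == restored[2]); // separate objects stay separate
    }
}
```

A naive byte-copying approach, by contrast, would turn the two references to the shared node into two independent copies.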
It is not classes but specific object instances that are serialized, either to store them in some persistent storage or to pass them to another application or across a network.
For example, when you want to send an object to some URL, you might decide to send it in XML format. The process of converting the in-memory object to (in this case) XML is called serialization; converting the XML back into an in-memory object is called deserialization.
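As a sketch of that XML round trip, here is an example using Java's java.beans.XMLEncoder; the Order class and its properties are invented for illustration:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.*;

public class XmlDemo {
    // XMLEncoder works with JavaBeans: a public class with a no-arg
    // constructor and getter/setter pairs.
    public static class Order {
        private String item = "";
        private int quantity;
        public Order() {}
        public String getItem() { return item; }
        public void setItem(String item) { this.item = item; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }

    // Serialization: the in-memory object becomes human-readable XML text.
    static String toXml(Order order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(order);
        }
        return bytes.toString();
    }

    // Deserialization: the XML text becomes an in-memory object again.
    static Order fromXml(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml.getBytes()))) {
            return (Order) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.setItem("book");
        order.setQuantity(2);
        Order copy = fromXml(toXml(order));
        System.out.println(copy.getItem() + " x" + copy.getQuantity());
    }
}
```

The XML string in the middle is what would actually travel over the wire to the receiving URL.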