- Evolving schemas
Suppose you initially designed a schema like this for your Employee class:
{"type": "record", "name": "Employee", "fields": [
{"name": "emp_name", "type": "string"},
{"name": "dob", "type": "string"},
{"name": "age", "type": "int"}
]}
Later you realized that age is redundant and removed it from the schema.
{"type": "record", "name": "Employee", "fields": [
{"name": "emp_name", "type": "string"},
{"name": "dob", "type": "string"}
]}
What about the records that were serialized and stored before this schema change? How will you read those records back?
That’s why the Avro reader/deserializer asks for both the reader schema and the writer schema. Internally it performs schema resolution, i.e. it adapts data written with the old (writer) schema to the new (reader) schema.
Go to this link – http://avro.apache.org/docs/1.7.2/api/java/org/apache/avro/io/parsing/doc-files/parsing.html – section “Resolution using action symbols”
In this case it performs a skip action, i.e. it reads past “age” in the old data and leaves it out of the result. It can also handle other cases, such as a field changing from int to long.
This is a very nice article explaining schema evolution – http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
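The skip action described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the Avro library: the helper names (write_int, read_int, encode, decode) are made up for illustration, only the "string" and "int" types are handled, and real Avro resolution also covers defaults for added fields and type promotion. What it does show is why the reader needs the writer schema: the old bytes can only be walked in the order the writer laid them out.

```python
import io

# Toy stand-ins for the old (writer) and new (reader) Employee schemas.
WRITER_SCHEMA = {"type": "record", "name": "Employee", "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"},
    {"name": "age", "type": "int"},
]}
READER_SCHEMA = {"type": "record", "name": "Employee", "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"},
]}

def write_int(out, n):
    """Zig-zag + variable-length encoding, as Avro uses for int/long."""
    n = (n << 1) ^ (n >> 63)
    while n > 0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def read_int(buf):
    shift, acc = 0, 0
    while True:
        b = buf.read(1)[0]
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)

def write_string(out, s):
    data = s.encode("utf-8")
    write_int(out, len(data))      # length prefix, then the raw bytes
    out.write(data)

def read_string(buf):
    return buf.read(read_int(buf)).decode("utf-8")

WRITERS = {"int": write_int, "string": write_string}
READERS = {"int": read_int, "string": read_string}

def encode(record, schema):
    """Write field values back to back, in schema order, with no field names."""
    out = io.BytesIO()
    for field in schema["fields"]:
        WRITERS[field["type"]](out, record[field["name"]])
    return out.getvalue()

def decode(data, writer_schema, reader_schema):
    """Walk the bytes in writer order; keep only fields the reader asks for."""
    wanted = {f["name"] for f in reader_schema["fields"]}
    buf = io.BytesIO(data)
    record = {}
    for field in writer_schema["fields"]:
        value = READERS[field["type"]](buf)   # always consume the bytes
        if field["name"] in wanted:           # the "skip": drop unwanted fields
            record[field["name"]] = value
    return record

old_bytes = encode({"emp_name": "Ann", "dob": "1990-01-01", "age": 30}, WRITER_SCHEMA)
print(decode(old_bytes, WRITER_SCHEMA, READER_SCHEMA))
```

The record comes back without "age", even though the bytes on disk still contain it.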
- Schema is stored only once for multiple records in a single file.
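- Size: the binary encoding is compact, so records take very few bytes. Field names are not repeated in the data, since they live in the schema.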
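To get a feel for the size point, here is a rough sketch that hand-encodes one Employee record the way Avro's binary encoding lays it out (zig-zag variable-length integers, length-prefixed UTF-8 strings, no field names) and compares it with the same record as JSON. The helper names write_long and encode_employee are hypothetical and exist only for this illustration; a real program would use the Avro library.

```python
import io
import json

def write_long(out, n):
    """Zig-zag + variable-length encoding: small numbers take a single byte."""
    n = (n << 1) ^ (n >> 63)
    while n > 0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def encode_employee(emp_name, dob, age):
    """Encode the three fields back to back, in schema order."""
    out = io.BytesIO()
    for text in (emp_name, dob):
        data = text.encode("utf-8")
        write_long(out, len(data))   # string = length prefix + bytes
        out.write(data)
    write_long(out, age)             # int = zig-zag varint
    return out.getvalue()

avro_like = encode_employee("Ann", "1990-01-01", 30)
as_json = json.dumps({"emp_name": "Ann", "dob": "1990-01-01", "age": 30}).encode("utf-8")

# Binary form: 1+3 bytes for "Ann", 1+10 for the date, 1 for the age = 16 bytes.
print(len(avro_like), len(as_json))
```

The JSON form is several times larger for this record, mostly because it repeats the field names with every record.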