What is the advantage of storing the schema in Avro?

Question

Evolving schemas

Suppose you initially designed a schema like this for your Employee class:

{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}

Later you realized that age is redundant and removed it from the schema.

{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"}
  ]
}

What about the records that were serialized and stored before this schema change? How will you read those records back?

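That’s why the Avro reader/deserializer asks for both the writer schema (the one the data was written with) and the reader schema (the one your code expects now). Internally it performs schema resolution, i.e. it adapts data written with the old schema to the new schema.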

For details, see the section “Resolution using action symbols” at http://avro.apache.org/docs/1.7.2/api/java/org/apache/avro/io/parsing/doc-files/parsing.html

In this case it applies a skip action, i.e. it reads past the stored “age” value and leaves it out of the result. It can also handle cases such as a field's type changing from int to long.
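The skip action can be sketched in a few lines. This is a minimal pure-Python illustration, not the Avro library: the helper names (write_long, read_long, encode, decode) are made up for this example, and only the string and int types are handled. It assumes Avro's binary encoding, where an int or long is a zigzag variable-length integer and a string is a length followed by UTF-8 bytes.

```python
import io

# Simplified stand-ins for the writer (old) and reader (new) schemas.
WRITER_SCHEMA = {"type": "record", "name": "Employee", "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"},
    {"name": "age", "type": "int"},
]}
READER_SCHEMA = {"type": "record", "name": "Employee", "fields": [
    {"name": "emp_name", "type": "string"},
    {"name": "dob", "type": "string"},
]}

def write_long(out, n):
    """Write an int/long as a zigzag, variable-length integer."""
    n = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes stay small
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def read_long(buf):
    """Read a zigzag, variable-length integer."""
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)

def write_string(out, s):
    data = s.encode("utf-8")
    write_long(out, len(data))
    out.write(data)

def read_string(buf):
    return buf.read(read_long(buf)).decode("utf-8")

WRITERS = {"int": write_long, "string": write_string}
READERS = {"int": read_long, "string": read_string}

def encode(record, schema):
    """Write field values in schema order, with no names or type tags."""
    out = io.BytesIO()
    for field in schema["fields"]:
        WRITERS[field["type"]](out, record[field["name"]])
    return out.getvalue()

def decode(data, writer_schema, reader_schema):
    """Walk the writer schema; keep only fields the reader schema knows."""
    wanted = {f["name"] for f in reader_schema["fields"]}
    buf = io.BytesIO(data)
    record = {}
    for field in writer_schema["fields"]:
        value = READERS[field["type"]](buf)   # bytes must be consumed either way
        if field["name"] in wanted:
            record[field["name"]] = value
        # otherwise: skip action, the value is discarded
    return record

old_bytes = encode({"emp_name": "Ann", "dob": "1990-01-01", "age": 34}, WRITER_SCHEMA)
print(decode(old_bytes, WRITER_SCHEMA, READER_SCHEMA))
# prints {'emp_name': 'Ann', 'dob': '1990-01-01'}
```

Note that the bytes carry no field names or type tags, so the decoder can only find and skip “age” because it walks the writer schema. The int to long case is cheap for a related reason: Avro encodes both types with the same zigzag variable-length format, so the stored bytes need no conversion.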

This is a very nice article explaining schema evolution – http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html

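The schema is stored only once per file, in the file header, no matter how many records the file holds.
Size: because the reader has the schema, records carry no field names or type tags and are encoded in very few bytes.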
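To get a feel for the size difference, here is a rough sketch that encodes one Employee record the way Avro's binary encoding lays out a string and an int, and compares it with the same record as JSON. The helper names are hypothetical and this is not the Avro library; a real Avro data file also adds a header (which holds the schema) and sync markers, but that overhead is paid per file or per block, not per record.

```python
import io
import json

def write_long(out, n):
    """Write an int/long as a zigzag, variable-length integer."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))

def write_string(out, s):
    """Write a string as its byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    write_long(out, len(data))
    out.write(data)

record = {"emp_name": "Ann", "dob": "1990-01-01", "age": 34}

# Avro-style: values only, in schema order.
out = io.BytesIO()
write_string(out, record["emp_name"])
write_string(out, record["dob"])
write_long(out, record["age"])
avro_style = out.getvalue()

# JSON: every record repeats the field names and punctuation.
json_style = json.dumps(record).encode("utf-8")

print(len(avro_style), "bytes without field names")   # 16 bytes
print(len(json_style), "bytes as JSON")
```

The 16 bytes break down as 1 + 3 for “Ann”, 1 + 10 for the date string, and 1 for the age.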
