Time for action – defining the schema
Let's now create this simplified UFO schema in a single Avro schema file.
Create the following as ufo.avsc:
{
  "type": "record",
  "name": "UFO_Sighting_Record",
  "fields": [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]},
    {"name": "duration", "type": "float"}
  ]
}
What just happened?
As you can see, Avro schemas are written in JSON and are usually saved with the .avsc extension. Here we define a record schema named UFO_Sighting_Record that has four fields, as follows:
The sighting_date field, of type string, holds a date of the form yyyy-mm-dd
The city field, of type string, contains the name of the city where the sighting occurred
The shape field is optional: its type is a union of null and string, so it holds either the UFO's shape or null
The duration field, of type float, gives the duration of the sighting in fractional minutes
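To make the meaning of these four field types concrete, here is a minimal Python sketch that parses the schema as plain JSON and checks a record against it. It does not use the Avro library; the helper names (PRIMITIVE_CHECKS, matches, conforms) and the sample record are hypothetical and exist only for illustration, and the checker covers just the three primitive types and the union used in this schema.

```python
import json

# The schema from ufo.avsc, embedded so that the sketch is self-contained.
UFO_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "UFO_Sighting_Record",
  "fields": [
    {"name": "sighting_date", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "shape", "type": ["null", "string"]},
    {"name": "duration", "type": "float"}
  ]
}
""")


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical mapping from the Avro type names used above to Python checks.
PRIMITIVE_CHECKS = {
    "null": lambda value: value is None,
    "string": lambda value: isinstance(value, str),
    "float": _is_number,
}


def matches(value, avro_type):
    """Return True if value fits avro_type (a type name or a union list)."""
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] accepts any one of its branches.
        return any(matches(value, branch) for branch in avro_type)
    return PRIMITIVE_CHECKS[avro_type](value)


def conforms(record, schema=UFO_SCHEMA):
    """Return True if record has exactly the schema's fields with fitting values."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(matches(record[field["name"]], field["type"]) for field in fields)


# An invented record, used only to exercise the checker.
sample = {
    "sighting_date": "1995-10-09",
    "city": "Iowa City",
    "shape": None,  # allowed because shape is a union of null and string
    "duration": 3.5,
}
print(conforms(sample))
```

Note that only shape may be null here; setting city to None, for example, would make conforms return False, because city is declared as a plain string rather than a union.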
With the schema defined, we will now create some sample data.