Note: Your files never leave your device. We don't upload, transfer, or store your data.
Apache Avro is a data serialization system that uses JSON-based schemas to define data structures, then serializes data in a compact binary format. It is a core technology in the Apache big data ecosystem.
Key properties:
| Property | Description |
|---|---|
| Schema-first | Data structure defined before serialization |
| JSON schemas | Schemas written in JSON; data serialized in binary |
| Language-neutral | Supports Java, Python, C, C++, C#, Go, Ruby, PHP, etc. |
| Compact binary | Smaller than JSON; no field names in serialized data |
| Schema evolution | Add fields with default values without breaking backward compatibility |
| Self-describing | Avro data files store the schema alongside the data |
| Platform / Use Case | How Avro Is Used |
|---|---|
| Apache Kafka | Widely used serialization format for Kafka messages, typically with Confluent Schema Registry |
| Apache Hadoop | Data serialization for MapReduce, HDFS storage |
| Apache Spark | Avro is a built-in data source, provided by the spark-avro module |
| Apache Flink | Stream processing with Avro types |
| Apache NiFi | Data flow processing with Avro schemas |
| Confluent Platform | Schema Registry stores and manages Avro schemas |
| AWS Glue | Data Catalog uses Avro schemas for table definitions |
| Data lakes | Avro is a common format in S3/GCS/ADLS data lakes |
| Event streaming | High-throughput, schema-enforced event delivery |
An Avro schema is a JSON document that defines the structure of serialized data. It specifies field names, types, and optional default values.
{
"type": "record",
"name": "User",
"namespace": "com.example",
"fields": [
{"name": "id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "email", "type": ["null", "string"], "default": null},
{"name": "active", "type": "boolean"}
]
}
| Component | Description | Example |
|---|---|---|
| type | Schema type | "record", "array", "map" |
| name | Record/type name | "User" |
| namespace | Unique package identifier | "com.example" |
| fields | Array of field definitions | [{"name": "id", "type": "int"}] |
| doc | Documentation string | "User record" |
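As a rough illustration of how these components fit together, the sketch below parses the User schema shown above with Python's standard `json` module and summarizes its fields. The helper name `describe_fields` is made up for this example; it is not part of any Avro library.

```python
import json

# The User schema from above, as it would appear in a .avsc file.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "active", "type": "boolean"}
  ]
}
"""


def describe_fields(schema):
    """Return (field name, type, has_default) tuples for a record schema."""
    return [(f["name"], f["type"], "default" in f) for f in schema["fields"]]


schema = json.loads(SCHEMA_JSON)
summary = describe_fields(schema)
for name, avro_type, has_default in summary:
    print(f"{name}: type={avro_type!r}, has default={has_default}")
```

Because an Avro schema is plain JSON, any JSON parser can read it. No Avro library is needed just to inspect a schema.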
| Avro Type | Description | JSON Example |
|---|---|---|
| null | No value | null |
| boolean | True or false | true |
| int | 32-bit signed integer | 42 |
| long | 64-bit signed integer | 9999999999 |
| float | Single precision (32-bit) | 3.14 |
| double | Double precision (64-bit) | 3.141592653589793 |
| bytes | Sequence of 8-bit bytes | (binary) |
| string | Unicode character sequence | "hello" |
| Type | Description | Example |
|---|---|---|
| record | Named fields (like a struct) | {"type": "record", "name": "User", "fields": [...]} |
| enum | Named set of values | {"type": "enum", "name": "Letter", "symbols": ["A", "B"]} |
| array | Ordered sequence | {"type": "array", "items": "string"} |
| map | Key-value pairs (keys are always strings) | {"type": "map", "values": "string"} |
| union | One of multiple types | ["null", "string"] |
| fixed | Fixed-size bytes | {"type": "fixed", "name": "MD5", "size": 16} |
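To show how the complex types combine inside one schema, here is a small sketch that builds a record as a Python dictionary and prints it as JSON. The record `SensorReading` and all of its field names are hypothetical, chosen only for illustration. Note that record, enum, and fixed are named types, so each carries a `name`.

```python
import json

# Hypothetical record that uses each complex type from the table once.
sensor_reading = {
    "type": "record",
    "name": "SensorReading",
    "namespace": "com.example",
    "fields": [
        {"name": "status",
         "type": {"type": "enum", "name": "Status", "symbols": ["OK", "FAULT"]}},
        {"name": "samples", "type": {"type": "array", "items": "double"}},
        {"name": "labels", "type": {"type": "map", "values": "string"}},
        {"name": "note", "type": ["null", "string"], "default": None},
        {"name": "device_id",
         "type": {"type": "fixed", "name": "DeviceId", "size": 16}},
    ],
}

# Python's None is written as JSON null.
schema_text = json.dumps(sensor_reading, indent=2)
print(schema_text)
```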
| Feature | Avro | JSON | Protocol Buffers |
|---|---|---|---|
| Schema | JSON-based | None (optional) | .proto files |
| Serialization | Binary | Text | Binary |
| Schema evolution | Built-in (fields matched by name, with defaults) | Manual | Built-in (field numbers) |
| Code generation | Optional | N/A | Required (protoc) |
| Dynamic typing | Supported | Native | Not supported |
| Null handling | Union types (["null", "type"]) | Native null | Optional fields (proto3) |
| Primary ecosystem | Hadoop, Kafka | Web, APIs | gRPC, Google |
| Best for | Big data, event streaming | APIs, config | RPC, microservices |
| JSON Type | Avro Type |
|---|---|
| Any value | string |
| JSON Type | JSON Example | Inferred Avro Type |
|---|---|---|
| String | "hello" | string |
| Integer (fits in int) | 42 | int |
| Integer (exceeds int) | 9999999999 | long |
| Float | 3.14 | double |
| Boolean | true | boolean |
| Null | null | null (added to union) |
| Object | {...} | Nested record |
| Array | [...] | array |
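The mapping in this table can be sketched as a short Python function. This is only an approximation of the rules listed above, not the tool's actual implementation. The function name `infer_avro_type`, the nested record naming scheme, and the handling of empty arrays are all assumptions.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of Avro's 32-bit int


def infer_avro_type(value, record_name="Nested"):
    """Map a parsed JSON value to an Avro type, following the table above."""
    if value is None:
        return "null"
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int" if INT_MIN <= value <= INT_MAX else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the first element decides the item type;
        # an empty array falls back to string.
        items = infer_avro_type(value[0], record_name) if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        # Assumption: nested records are named after their parent and key.
        fields = [
            {"name": key, "type": infer_avro_type(val, f"{record_name}_{key}")}
            for key, val in value.items()
        ]
        return {"type": "record", "name": record_name, "fields": fields}
    raise TypeError(f"unsupported JSON value: {value!r}")


print(infer_avro_type(42))
print(infer_avro_type(9999999999))
print(infer_avro_type(["a", "b"]))
```

Checking `bool` before `int` matters in Python. Without that order, `true` would be reported as `int`.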
| Use Case | Description |
|---|---|
| Kafka migration | Generate Avro schemas for Kafka topics currently using JSON |
| Schema Registry | Register schemas in Confluent Schema Registry |
| Data lake setup | Define schemas for Parquet/Avro files in S3/GCS |
| Hadoop jobs | Create Avro schemas for MapReduce input/output |
| Spark data sources | Define schemas for Spark Avro data sources |
| API contracts | Define data contracts for event-driven architectures |
| Documentation | Auto-generate schemas from sample JSON for reference |
| Schema evolution planning | Start with inferred schema, then refine for versioning |
File Upload: Drag and drop or select a .json file.
Code Editor: Paste or type raw JSON with syntax highlighting and real-time validation.
Sets the name field in the Avro schema. This identifies the record type.
{
"type": "record",
"name": "Order",
...
}
Use PascalCase by convention: User, Order, SensorReading, PageView.
Sets the namespace field — a dot-separated identifier that uniquely qualifies the schema name.
{
"type": "record",
"name": "User",
"namespace": "com.example.events",
...
}
Conventions:
Java-style: com.company.project.module
Domain-based: io.company.events
The fully qualified name becomes com.example.events.User.
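A minimal sketch of how the fully qualified name is derived, assuming the usual Avro naming rule that a name already containing a dot is treated as a full name and the namespace is then ignored. The helper `full_name` is a name chosen for this example.

```python
def full_name(name, namespace=None):
    """Combine a record name and namespace into an Avro full name."""
    # A dotted name is already fully qualified, so the namespace is ignored.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


print(full_name("User", "com.example.events"))
```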
| Option | Output |
|---|---|
| Schema + Data | Avro schema followed by the JSON data formatted according to the schema |
| Schema Only | Only the Avro schema definition (no data) |
Use Schema Only for registering in Schema Registry. Use Schema + Data for documentation and testing.
Controls output JSON formatting:
| Option | When to Use |
|---|---|
| 2 spaces | Default — clean, readable |
| 4 spaces | Deeply nested schemas |
| Tab | Match project conventions |
| Minified | Smallest file size for storage |
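For readers who post-process the schema themselves, the four options correspond closely to arguments of Python's `json.dumps`. The sketch below is an approximation of that mapping. The function `format_schema` and the sample `Order` schema are illustrative, not taken from the tool.

```python
import json

schema = {
    "type": "record",
    "name": "Order",
    "fields": [{"name": "id", "type": "int"}],
}


def format_schema(schema, indent_option):
    """Serialize a schema using one of the indent options from the table."""
    if indent_option == "Minified":
        # Compact separators remove all optional whitespace.
        return json.dumps(schema, separators=(",", ":"))
    styles = {"2 spaces": 2, "4 spaces": 4, "Tab": "\t"}
    return json.dumps(schema, indent=styles[indent_option])


print(format_schema(schema, "2 spaces"))
print(format_schema(schema, "Minified"))
```

All four outputs parse to the same schema. Only the whitespace differs.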
When enabled, the tool infers Avro types from JSON values instead of defaulting all fields to string.
Smart Types enabled:
{"name": "id", "type": "int"}
{"name": "price", "type": "double"}
{"name": "active", "type": "boolean"}
Smart Types disabled:
{"name": "id", "type": "string"}
{"name": "price", "type": "string"}
{"name": "active", "type": "string"}
Enable Smart Types for type-safe schemas. Disable for schemas where all values should be treated as strings (e.g., CSV imports with no type information).
When enabled, adds null to union types for fields that may contain null values.
Enabled:
{"name": "email", "type": ["null", "string"], "default": null}
Disabled:
{"name": "email", "type": "string"}
Enable this when your JSON data may contain null values. In Avro, nullable fields must use union types ["null", "type"]. Without this, null values in your data will cause serialization errors.
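The effect of this option on a single field can be sketched as follows. The helper `make_field` and its parameters are hypothetical and may differ from how the tool decides internally. Placing `"null"` first in the union keeps the `null` default valid.

```python
import json


def make_field(name, avro_type, saw_null, include_null=True):
    """Build a field definition, wrapping the type in a null union if needed."""
    if include_null and saw_null:
        # "null" comes first so that the default value null matches
        # the first branch of the union.
        return {"name": name, "type": ["null", avro_type], "default": None}
    return {"name": name, "type": avro_type}


print(json.dumps(make_field("email", "string", saw_null=True)))
print(json.dumps(make_field("email", "string", saw_null=True, include_null=False)))
```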
All processing runs entirely in your browser. No data is uploaded to any server.
Choose one of two input methods:
Upload a file: Click "Choose File" and select a .json file, or drag it into the upload area.
Paste data: Click "Enter Data" to switch to the code editor. Paste your JSON array.
Important: The tool expects a JSON array of objects.
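If you are unsure whether a file has the expected shape, a quick check like the one below can be run before converting. It is a sketch using only Python's standard library, and `load_records` is a name chosen for this example. A single JSON object can be wrapped in `[...]` to make it a one-element array.

```python
import json


def load_records(text):
    """Parse JSON text and verify that it is an array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    if not all(isinstance(item, dict) for item in data):
        raise ValueError("every array element must be a JSON object")
    return data


records = load_records('[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]')
print(len(records), "records")
```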
Use the Properties panel on the right:
Record Name: Enter a name (e.g., User, Order, Event).
Namespace: Enter a namespace (e.g., com.example.events).
Output Format: Choose Schema + Data or Schema Only.
Indent: Select formatting style.
Smart Types: Enable to infer int/boolean/double from JSON values.
Parse JSON — Include Null: Enable to add null to union types.
Click Convert. The Avro schema appears in the "Output Data" panel.
Click Copy to Clipboard to paste into your schema file.
Premium users can click Download File to save as .avsc.
Input JSON:
[
{"id": 1, "name": "Alice", "email": "alice@example.com", "active": true, "score": 95.5},
{"id": 2, "name": "Bob", "email": null, "active": false, "score": 88.0},
{"id": 3, "name": "Charlie", "email": "charlie@example.com", "active": true, "score": 72.3}
]
Configuration:
Record Name: UserEvent
Namespace: com.example.events
Output Format: Schema Only
Indent: 2 spaces
Smart Types: On
Include Null: On
Output:
{
"type": "record",
"name": "UserEvent",
"namespace": "com.example.events",
"fields": [
{"name": "id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "email", "type": ["null", "string"], "default": null},
{"name": "active", "type": "boolean"},
{"name": "score", "type": "double"}
]
}
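One way to sanity-check a generated schema is to confirm that the sample records actually fit it. The sketch below is a simplified check for primitive types and null unions only. It is not a full Avro validator, and the functions `matches` and `conforms` as well as the sample email address are assumptions made for this example.

```python
SCHEMA = {
    "type": "record",
    "name": "UserEvent",
    "namespace": "com.example.events",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "active", "type": "boolean"},
        {"name": "score", "type": "double"},
    ],
}


def matches(value, avro_type):
    """Check one parsed JSON value against a primitive type or a union."""
    if isinstance(avro_type, list):  # union: any branch may match
        return any(matches(value, branch) for branch in avro_type)
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type == "int":
        return is_int and -2**31 <= value <= 2**31 - 1
    if avro_type == "long":
        return is_int and -2**63 <= value <= 2**63 - 1
    if avro_type == "double":
        return is_int or isinstance(value, float)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"type not covered by this sketch: {avro_type!r}")


def conforms(record, schema):
    """Return True if every schema field is present (or defaulted) and typed correctly."""
    for field in schema["fields"]:
        if field["name"] not in record:
            if "default" not in field:
                return False
            continue
        if not matches(record[field["name"]], field["type"]):
            return False
    return True


good = {"id": 2, "name": "Bob", "email": None, "active": False, "score": 88.0}
bad = {"id": "2", "name": "Bob", "email": None, "active": False, "score": 88.0}
print(conforms(good, SCHEMA), conforms(bad, SCHEMA))
```

For production use, an Avro library's own validation is the safer choice, since it covers records, arrays, maps, enums, and logical types.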
Input JSON:
[
{"order_id": "ORD-001", "total": 149.99, "items": 3, "status": "shipped", "timestamp": 1715011200000},
{"order_id": "ORD-002", "total": 59.50, "items": 1, "status": "pending", "timestamp": 1715011300000}
]
Configuration:
Record Name: OrderEvent
Namespace: com.company.orders
Smart Types: On
Include Null: Off
Output:
{
"type": "record",
"name": "OrderEvent",
"namespace": "com.company.orders",
"fields": [
{"name": "order_id", "type": "string"},
{"name": "total", "type": "double"},
{"name": "items", "type": "int"},
{"name": "status", "type": "string"},
{"name": "timestamp", "type": "long"}
]
}
Register in Confluent Schema Registry:
curl -X POST \
http://localhost:8081/subjects/order-events-value/versions \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
-d '{"schema": "..."}'
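The value of `"schema"` in the request body is the schema serialized as a JSON string, so the schema is JSON-encoded twice. The sketch below builds that payload and the equivalent request with Python's standard library. It reuses the URL and subject from the curl command above and assumes a registry is listening there. The actual send is left commented out.

```python
import json
import urllib.request

schema = {
    "type": "record",
    "name": "OrderEvent",
    "namespace": "com.company.orders",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "total", "type": "double"},
        {"name": "items", "type": "int"},
        {"name": "status", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}

# Inner dumps turns the schema into a string; outer dumps escapes it
# inside the JSON request body.
payload = json.dumps({"schema": json.dumps(schema)})

request = urllib.request.Request(
    "http://localhost:8081/subjects/order-events-value/versions",
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    method="POST",
)

# Uncomment to send the request to a running Schema Registry:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
print(payload)
```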
Configuration:
Smart Types: Off
Output:
{
"type": "record",
"name": "Data",
"fields": [
{"name": "id", "type": "string"},
{"name": "name", "type": "string"},
{"name": "active", "type": "string"}
]
}
Use this when all data should be treated as strings (e.g., CSV-to-Avro conversion with no type info).
No. All conversion happens locally in your browser using JavaScript. Your data never leaves your device.
Apache Avro is a data serialization system that uses JSON-based schemas and compact binary serialization. It is widely used in Apache Kafka, Hadoop, Spark, and big data pipelines.
An Avro schema is a JSON document that defines the structure of serialized data — field names, types, namespaces, and defaults. Data is serialized in binary using the schema.
When enabled, Smart Types infers Avro types from JSON values: integers become int (or long when the value exceeds the 32-bit range), floats become double, and booleans become boolean. When disabled, all fields default to string.
It adds null to union types for fields that may contain null values. In Avro, nullable fields must use union types like ["null", "string"]. Without this, null values cause serialization errors.
A namespace uniquely qualifies the schema name, similar to a Java package. Example: com.example.events. The fully qualified name becomes com.example.events.UserEvent.
Schema Only outputs just the Avro schema definition. Schema + Data outputs the schema followed by the JSON data formatted according to it. Use Schema Only for Schema Registry registration.
Use .avsc for Avro Schema files and .avro for Avro data files.
Yes. The generated schema is standard Avro JSON format compatible with Confluent Schema Registry, Kafka producers/consumers, and all Avro implementations.
The tool processes data entirely in your browser. Files up to 10 MB typically convert without issues on modern hardware.