---
published: true
layout: post
title: The Open Source Community Tooling Built on Avro
image: >-
  https://s3.amazonaws.com/kinlane-productions2/algorotoscope-master/braceros-domingo-ulloa-container-ship-in-seattle.jpg
author:
  name: kinlane
tags:
  - Avro
  - Tooling
---
**Specification**

*   [**avro**](https://github.com/apache/avro) - (forks: 1066) (stars: 1594) (watchers: 1594) - apache avro is a data serialization system.

**Registries**

*   [**schema registry**](https://github.com/confluentinc/schema-registry) - (forks: 736) (stars: 1234) (watchers: 1234) - confluent schema registry for kafka
*   [**schema registry ui**](https://github.com/lensesio/schema-registry-ui) - (forks: 88) (stars: 321) (watchers: 321) - web tool for avro schema registry
*   [**schemer**](https://github.com/indix/schemer) - (forks: 3) (stars: 90) (watchers: 90) - schema registry for csv, tsv, json, avro and parquet schema. supports schema inference and graphql api.

**Queries**

*   [**rq**](https://github.com/dflemstr/rq) - (forks: 45) (stars: 1553) (watchers: 1553) - record query - a tool for doing record analysis and transformation

**Education**

*   [**examples**](https://github.com/confluentinc/examples) - (forks: 458) (stars: 670) (watchers: 670) - apache kafka and confluent platform examples and demos
*   [**kafka storm starter**](https://github.com/miguno/kafka-storm-starter) - (forks: 335) (stars: 726) (watchers: 726) - code examples that show how to integrate apache kafka 0.8+ with apache storm 0.9+ and apache spark streaming 1.1+, while using apache avro as the data serialization format.
*   [**avro hadoop starter**](https://github.com/miguno/avro-hadoop-starter) - (forks: 86) (stars: 111) (watchers: 111) - example mapreduce jobs in java, hive, pig, and hadoop streaming that work on avro data.
*   [**Avro2TF**](https://github.com/linkedin/Avro2TF) - (forks: 19) (stars: 118) (watchers: 118) - avro2tf is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.

**Serialization**

*   [**avsc**](https://github.com/mtth/avsc) - (forks: 98) (stars: 844) (watchers: 844) - avro for javascript :zap:
*   [**avro4s**](https://github.com/sksamuel/avro4s) - (forks: 178) (stars: 536) (watchers: 536) - avro schema generation and serialization / deserialization for scala
*   [**fastavro**](https://github.com/fastavro/fastavro) - (forks: 115) (stars: 362) (watchers: 362) - fast avro for python
*   [**gogen avro**](https://github.com/actgardner/gogen-avro) - (forks: 66) (stars: 191) (watchers: 191) - generate go code to serialize and deserialize avro schemas
*   [**avrohugger**](https://github.com/julianpeeters/avrohugger) - (forks: 82) (stars: 147) (watchers: 147) - generate scala case class definitions from avro schemas
*   [**scalavro**](https://github.com/GenslerAppsPod/scalavro) - (forks: 31) (stars: 119) (watchers: 119) - a reflection-based avro library in scala.
*   [**abracad**](https://github.com/damballa/abracad) - (forks: 31) (stars: 107) (watchers: 107) - a clojure library for de/serializing clojure data structures with avro.
*   [**python avro json serializer**](https://github.com/linkedin/python-avro-json-serializer) - (forks: 32) (stars: 104) (watchers: 104) - serializes data into a json format using avro schema.
*   [**avro\_turf**](https://github.com/dasch/avro_turf) - (forks: 44) (stars: 97) (watchers: 97) - a library that makes it easier to use the avro serialization format from ruby.
*   [**avro rs**](https://github.com/flavray/avro-rs) - (forks: 48) (stars: 89) (watchers: 89) - avro client library implementation in rust
*   [**json schema avro**](https://github.com/fge/json-schema-avro) - (forks: 22) (stars: 102) (watchers: 102) - avro to json schema, and back
*   [**jsAvroPhonetic**](https://github.com/torifat/jsAvroPhonetic) - (forks: 56) (stars: 84) (watchers: 84) - a javascript implementation of avro phonetic
*   [**kafka avro**](https://github.com/waldophotos/kafka-avro) - (forks: 34) (stars: 76) (watchers: 76) - node.js bindings for librdkafka with avro schema serialization.
*   [**pyavroc**](https://github.com/Byhiras/pyavroc) - (forks: 17) (stars: 46) (watchers: 46) - an avro file reader/writer for python
*   [**BlueSteel**](https://github.com/saksdirect/BlueSteel) - (forks: 15) (stars: 47) (watchers: 47) - an avro encoding/decoding library for swift.
*   [**libserdes**](https://github.com/confluentinc/libserdes) - (forks: 35) (stars: 36) (watchers: 36) - avro serialization/deserialization c/c++ library with confluent schema-registry support
*   [**vulcan**](https://github.com/fd4s/vulcan) - (forks: 8) (stars: 46) (watchers: 46) - functional avro for scala
*   [**avro schema**](https://github.com/tarantool/avro-schema) - (forks: 2) (stars: 48) (watchers: 48) - apache avro schema tools for tarantool

**Generators**

*   [**xml avro**](https://github.com/elodina/xml-avro) - (forks: 56) (stars: 58) (watchers: 58) - generate avro schema and avro binary from xsd schema and xml

**Connectors**

*   [**spark avro**](https://github.com/databricks/spark-avro) - (forks: 316) (stars: 535) (watchers: 535) - avro data source for apache spark
*   [**cpp serializers**](https://github.com/thekvs/cpp-serializers) - (forks: 82) (stars: 484) (watchers: 484) - benchmark comparing various data serialization libraries (thrift, protobuf etc.) for c++

**Code Generation**

*   [**gradle avro plugin**](https://github.com/davidmc24/gradle-avro-plugin) - (forks: 53) (stars: 135) (watchers: 135) - a gradle plugin to allow easily performing java code generation for apache avro. it supports json schema declaration files, json protocol declaration files, and avro idl files.
*   [**sbt avrohugger**](https://github.com/julianpeeters/sbt-avrohugger) - (forks: 37) (stars: 95) (watchers: 95) - sbt plugin for generating scala sources for apache avro schemas and protocols.
*   [**avromatic**](https://github.com/salsify/avromatic) - (forks: 11) (stars: 56) (watchers: 56) - generate ruby models from avro schemas

**Tabular**

*   [**iceberg**](https://github.com/Netflix/iceberg) - (forks: 48) (stars: 363) (watchers: 363) - iceberg is a table format for large, slow-moving tabular data

**Toolchains**

*   [**DevOps Python tools**](https://github.com/HariSekhon/DevOps-Python-tools) - (forks: 152) (stars: 310) (watchers: 310) - 80+ devops & data cli tools - aws, log anonymizer, spark, hadoop, hbase, hive, impala, linux, docker, spark data converters & validators (avro/parquet/json/csv/ini/xml/yaml), travis ci, ambari, blueprints, cloudformation, elasticsearch, solr, pig, ipython - python / jython tools
*   [**bigdata playground**](https://github.com/Chabane/bigdata-playground) - (forks: 54) (stars: 157) (watchers: 157) - a complete example of a big data application using: kubernetes (kops/aws), apache spark sql/streaming/mllib, apache flink, scala, python, apache kafka, apache hbase, apache parquet, apache avro, apache storm, twitter api, mongodb, nodejs, angular, graphql

**Data Store**

*   [**chana**](https://github.com/dcaoyuan/chana) - (forks: 50) (stars: 332) (watchers: 332) - avro data store based on akka

**Data Generation**

*   [**ratatool**](https://github.com/spotify/ratatool) - (forks: 45) (stars: 251) (watchers: 251) - a tool for data sampling, data generation, and data diffing

**Conversion**

*   [**json wikipedia**](https://github.com/diegoceccarelli/json-wikipedia) - (forks: 41) (stars: 241) (watchers: 241) - json wikipedia, contains code to convert the wikipedia xml dump into a json/avro dump
*   [**json avro converter**](https://github.com/allegro/json-avro-converter) - (forks: 60) (stars: 158) (watchers: 158) - json to avro conversion tool designed to make migration to avro easier.

**Database**

*   [**storagetapper**](https://github.com/uber/storagetapper) - (forks: 46) (stars: 205) (watchers: 205) - storagetapper is a scalable realtime mysql change data streaming, logical backup and logical replication service

**Binary**

*   [**jackson dataformats binary**](https://github.com/FasterXML/jackson-dataformats-binary) - (forks: 67) (stars: 187) (watchers: 187) - uber-project for standard jackson binary format backends: avro, cbor, protobuf, smile

**IDE**

*   [**vscode data preview**](https://github.com/RandomFractals/vscode-data-preview) - (forks: 20) (stars: 168) (watchers: 168) - data preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large json array/config, yaml, apache arrow, avro & excel data files

**Documentation**

*   [**avrodoc**](https://github.com/ept/avrodoc) - (forks: 60) (stars: 121) (watchers: 121) - documentation tool for avro schemas

**Validation**

*   [**aptos**](https://github.com/pennsignals/aptos) - (forks: 16) (stars: 141) (watchers: 141) - :sunny: a tool for validating data using json schema and converting json schema documents into different data-interchange formats

**Command Line Interface**

*   [**schema registry**](https://github.com/lensesio/schema-registry) - (forks: 24) (stars: 96) (watchers: 96) - a cli and go client for kafka schema registry

**Semantics**

*   [**schema\_salad**](https://github.com/common-workflow-language/schema_salad) - (forks: 33) (stars: 40) (watchers: 40) - semantic annotations for linked avro data

Like JSON Schema, Avro is a very data-centric specification. I need to better understand how leading providers like Confluent use it to power Kafka, and I also want to better understand its relationship to JSON Schema and how it is used with AsyncAPI and OpenAPI. This dive gave me a fresh look at how the API space is evolving, and at how data and our databases are still king when it comes to everything API.