---
published: true
layout: post
title: The Open Source Community Tooling Built on Avro
image: >-
  https://s3.amazonaws.com/kinlane-productions2/algorotoscope-master/braceros-domingo-ulloa-container-ship-in-seattle.jpg
author:
  name: kinlane
tags:
  - Avro
  - Tooling
---

**Specification**

* [**avro**](https://github.com/apache/avro) - (forks: 1066) (stars: 1594) (watchers: 1594) - apache avro is a data serialization system.

**Registries**

* [**schema registry**](https://github.com/confluentinc/schema-registry) - (forks: 736) (stars: 1234) (watchers: 1234) - confluent schema registry for kafka
* [**schema registry ui**](https://github.com/lensesio/schema-registry-ui) - (forks: 88) (stars: 321) (watchers: 321) - web tool for avro schema registry
* [**schemer**](https://github.com/indix/schemer) - (forks: 3) (stars: 90) (watchers: 90) - schema registry for csv, tsv, json, avro and parquet schema. supports schema inference and graphql api.

**Queries**

* [**rq**](https://github.com/dflemstr/rq) - (forks: 45) (stars: 1553) (watchers: 1553) - record query - a tool for doing record analysis and transformation

**Education**

* [**examples**](https://github.com/confluentinc/examples) - (forks: 458) (stars: 670) (watchers: 670) - apache kafka and confluent platform examples and demos
* [**kafka storm starter**](https://github.com/miguno/kafka-storm-starter) - (forks: 335) (stars: 726) (watchers: 726) - code examples that show how to integrate apache kafka 0.8+ with apache storm 0.9+ and apache spark streaming 1.1+, while using apache avro as the data serialization format.
* [**avro hadoop starter**](https://github.com/miguno/avro-hadoop-starter) - (forks: 86) (stars: 111) (watchers: 111) - example mapreduce jobs in java, hive, pig, and hadoop streaming that work on avro data.
* [**Avro2TF**](https://github.com/linkedin/Avro2TF) - (forks: 19) (stars: 118) (watchers: 118) - avro2tf is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.

**Serialization**

* [**avsc**](https://github.com/mtth/avsc) - (forks: 98) (stars: 844) (watchers: 844) - avro for javascript :zap:
* [**avro4s**](https://github.com/sksamuel/avro4s) - (forks: 178) (stars: 536) (watchers: 536) - avro schema generation and serialization / deserialization for scala
* [**fastavro**](https://github.com/fastavro/fastavro) - (forks: 115) (stars: 362) (watchers: 362) - fast avro for python
* [**gogen avro**](https://github.com/actgardner/gogen-avro) - (forks: 66) (stars: 191) (watchers: 191) - generate go code to serialize and deserialize avro schemas
* [**avrohugger**](https://github.com/julianpeeters/avrohugger) - (forks: 82) (stars: 147) (watchers: 147) - generate scala case class definitions from avro schemas
* [**scalavro**](https://github.com/GenslerAppsPod/scalavro) - (forks: 31) (stars: 119) (watchers: 119) - a reflection-based avro library in scala.
* [**abracad**](https://github.com/damballa/abracad) - (forks: 31) (stars: 107) (watchers: 107) - a clojure library for de/serializing clojure data structures with avro.
* [**python avro json serializ**](https://github.com/linkedin/python-avro-json-serializer) - (forks: 32) (stars: 104) (watchers: 104) - serializes data into a json format using avro schema.
* [**avro\_turf**](https://github.com/dasch/avro_turf) - (forks: 44) (stars: 97) (watchers: 97) - a library that makes it easier to use the avro serialization format from ruby.
* [**avro rs**](https://github.com/flavray/avro-rs) - (forks: 48) (stars: 89) (watchers: 89) - avro client library implementation in rust
* [**json schema avro**](https://github.com/fge/json-schema-avro) - (forks: 22) (stars: 102) (watchers: 102) - avro to json schema, and back
* [**jsAvroPhonetic**](https://github.com/torifat/jsAvroPhonetic) - (forks: 56) (stars: 84) (watchers: 84) - a javascript implementation of avro phonetic
* [**kafka avro**](https://github.com/waldophotos/kafka-avro) - (forks: 34) (stars: 76) (watchers: 76) - node.js bindings for librdkafka with avro schema serialization.
* [**pyavroc**](https://github.com/Byhiras/pyavroc) - (forks: 17) (stars: 46) (watchers: 46) - an avro file reader/writer for python
* [**BlueSteel**](https://github.com/saksdirect/BlueSteel) - (forks: 15) (stars: 47) (watchers: 47) - an avro encoding/decoding library for swift.
* [**libserdes**](https://github.com/confluentinc/libserdes) - (forks: 35) (stars: 36) (watchers: 36) - avro serialization/deserialization c/c++ library with confluent schema-registry support
* [**vulcan**](https://github.com/fd4s/vulcan) - (forks: 8) (stars: 46) (watchers: 46) - functional avro for scala
* [**avro schema**](https://github.com/tarantool/avro-schema) - (forks: 2) (stars: 48) (watchers: 48) - apache avro schema tools for tarantool

**Generators**

* [**xml avro**](https://github.com/elodina/xml-avro) - (forks: 56) (stars: 58) (watchers: 58) - generate avro schema and avro binary from xsd schema and xml

**Connectors**

* [**spark avro**](https://github.com/databricks/spark-avro) - (forks: 316) (stars: 535) (watchers: 535) - avro data source for apache spark
* [**cpp serializers**](https://github.com/thekvs/cpp-serializers) - (forks: 82) (stars: 484) (watchers: 484) - benchmark comparing various data serialization libraries (thrift, protobuf etc.) for c++

**Code Generation**

* [**gradle avro plugin**](https://github.com/davidmc24/gradle-avro-plugin) - (forks: 53) (stars: 135) (watchers: 135) - a gradle plugin to allow easily performing java code generation for apache avro. it supports json schema declaration files, json protocol declaration files, and avro idl files.
* [**sbt avrohugger**](https://github.com/julianpeeters/sbt-avrohugger) - (forks: 37) (stars: 95) (watchers: 95) - sbt plugin for generating scala sources for apache avro schemas and protocols.
* [**avromatic**](https://github.com/salsify/avromatic) - (forks: 11) (stars: 56) (watchers: 56) - generate ruby models from avro schemas

**Tabular**

* [**iceberg**](https://github.com/Netflix/iceberg) - (forks: 48) (stars: 363) (watchers: 363) - iceberg is a table format for large, slow-moving tabular data

**Toolchains**

* [**DevOps Python tools**](https://github.com/HariSekhon/DevOps-Python-tools) - (forks: 152) (stars: 310) (watchers: 310) - 80+ devops & data cli tools - aws, log anonymizer, spark, hadoop, hbase, hive, impala, linux, docker, spark data converters & validators (avro/parquet/json/csv/ini/xml/yaml), travis ci, ambari, blueprints, cloudformation, elasticsearch, solr, pig, ipython - python / jython tools
* [**bigdata playground**](https://github.com/Chabane/bigdata-playground) - (forks: 54) (stars: 157) (watchers: 157) - a complete example of a big data application using : kubernetes (kops/aws), apache spark sql/streaming/mlib, apache flink, scala, python, apache kafka, apache hbase, apache parquet, apache avro, apache storm, twitter api, mongodb, nodejs, angular, graphql

**Data Store**

* [**chana**](https://github.com/dcaoyuan/chana) - (forks: 50) (stars: 332) (watchers: 332) - avro data store based on akka

**Data Generation**

* [**ratatool**](https://github.com/spotify/ratatool) - (forks: 45) (stars: 251) (watchers: 251) - a tool for data sampling, data generation, and data diffing

**Conversion**

* [**json wikipedia**](https://github.com/diegoceccarelli/json-wikipedia) - (forks: 41) (stars: 241) (watchers: 241) - json wikipedia, contains code to convert the wikipedia xml dump into a json/avro dump
* [**json avro converter**](https://github.com/allegro/json-avro-converter) - (forks: 60) (stars: 158) (watchers: 158) - json to avro conversion tool designed to make migration to avro easier.

**Database**

* [**storagetapper**](https://github.com/uber/storagetapper) - (forks: 46) (stars: 205) (watchers: 205) - storagetapper is a scalable realtime mysql change data streaming, logical backup and logical replication service

**Binary**

* [**jackson dataformats binar**](https://github.com/FasterXML/jackson-dataformats-binary) - (forks: 67) (stars: 187) (watchers: 187) - uber-project for standard jackson binary format backends: avro, cbor, protobuf, smile

**IDE**

* [**vscode data preview**](https://github.com/RandomFractals/vscode-data-preview) - (forks: 20) (stars: 168) (watchers: 168) - data preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large json array/config, yaml, apache arrow, avro & excel data files

**Documentation**

* [**avrodoc**](https://github.com/ept/avrodoc) - (forks: 60) (stars: 121) (watchers: 121) - documentation tool for avro schemas

**Validation**

* [**aptos**](https://github.com/pennsignals/aptos) - (forks: 16) (stars: 141) (watchers: 141) - :sunny: a tool for validating data using json schema and converting json schema documents into different data-interchange formats

**Command Line Interface**

* [**schema registry**](https://github.com/lensesio/schema-registry) - (forks: 24) (stars: 96) (watchers: 96) - a cli and go client for kafka schema registry

**Semantics**

* [**schema\_salad**](https://github.com/common-workflow-language/schema_salad) - (forks: 33) (stars: 40) (watchers: 40) - semantic annotations for linked avro data

Like JSON Schema, Avro is a very data-centric specification.
I need to better understand how it is used by leading providers like Confluent for powering Kafka, but I also want to better understand its relationship to JSON Schema, and how it is used for AsyncAPI and OpenAPI. This dive provided me with a fresh look at how the API space is evolving, and also how data and our databases are still king when it comes to everything API.
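
To make the relationship between Avro and JSON Schema a little more concrete, here is a minimal sketch in plain Python of how an Avro record schema might map onto a JSON Schema object. Everything in it is an assumption for illustration: the `User` schema, its field names, the `PRIMITIVES` table, and the `avro_type_to_json_schema` helper are all hypothetical and do not come from any of the projects listed above. It only handles primitives, unions, and records, and it treats fields without a default as required, which is a simplification of how Avro actually resolves schemas.

```python
import json

# Hypothetical Avro record schema, invented for this illustration.
AVRO_SCHEMA = {
    "type": "record",
    "name": "User",
    "namespace": "example.avro",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

# Avro primitive type -> closest JSON Schema type.
# This is lossy: int/long collapse to "integer", and bytes has no exact match.
PRIMITIVES = {
    "null": "null",
    "boolean": "boolean",
    "int": "integer",
    "long": "integer",
    "float": "number",
    "double": "number",
    "string": "string",
    "bytes": "string",
}


def avro_type_to_json_schema(avro_type):
    """Translate a primitive name, a union (list), or a record (dict)."""
    if isinstance(avro_type, str):
        return {"type": PRIMITIVES[avro_type]}
    if isinstance(avro_type, list):
        # An Avro union becomes a JSON Schema anyOf.
        return {"anyOf": [avro_type_to_json_schema(t) for t in avro_type]}
    if isinstance(avro_type, dict) and avro_type.get("type") == "record":
        fields = avro_type["fields"]
        return {
            "type": "object",
            "title": avro_type["name"],
            "properties": {
                f["name"]: avro_type_to_json_schema(f["type"]) for f in fields
            },
            # Assumption: a field with no Avro default is treated as required.
            "required": [f["name"] for f in fields if "default" not in f],
        }
    raise ValueError(f"unsupported Avro type: {avro_type!r}")


json_schema = avro_type_to_json_schema(AVRO_SCHEMA)
print(json.dumps(json_schema, indent=2))
```

Even in a toy like this, the mapping loses information in one direction (Avro's `int` versus `long`, namespaces, defaults), which is part of why the relationship between the two specifications deserves a closer look.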