:title: Metrics Grimoire Architecture

Metrics Grimoire
################

Metrics Grimoire is a toolset focused on retrieving software development
information from publicly available data sources.

At a Glance
===========

:Hosts:
 * https://activity.openstack.org/dash/
:Puppet:
 * :file:`manifests/init.pp`
 * :file:`manifests/cvsanaly.pp`
 * :file:`manifests/mlstats.pp`
 * :file:`manifests/sibyl.pp`
 * :file:`manifests/bicho.pp`
 * :file:`manifests/ircanalysis.pp`
 * :file:`manifests/sortinghat.pp`
:Projects:
 * https://github.com/MetricsGrimoire

Overview
========

The site https://activity.openstack.org/dash is built on the information retrieved
by the Metrics Grimoire toolset.

Retrieval Process Architecture
==============================

These are the tools that are used in the information retrieval process:

CVSAnalY
--------

Git information retrieval. This tool analyzes all of the Git repositories available under a local directory and stores the extracted information in a MySQL database.
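As a rough illustration of the "all repositories under a local directory" step, the following minimal sketch walks a directory tree and lists every directory that contains a ``.git`` entry. It is not CVSAnalY code; the function name ``find_git_repositories`` and the sample directory names are hypothetical and exist only for this example.

```python
import os
import tempfile


def find_git_repositories(base_dir):
    """Return sorted paths of directories under base_dir that hold a .git entry.

    Hypothetical helper for illustration only, not part of CVSAnalY.
    """
    repos = []
    for current, dirnames, filenames in os.walk(base_dir):
        if ".git" in dirnames or ".git" in filenames:
            repos.append(current)
            # Do not descend further once a repository has been found.
            dirnames[:] = []
    return sorted(repos)


# Build a throwaway directory tree so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as base:
    for name in ("nova", "swift"):  # sample names, chosen for the example
        os.makedirs(os.path.join(base, name, ".git"))
    os.makedirs(os.path.join(base, "not-a-repo"))
    found = [os.path.basename(path) for path in find_git_repositories(base)]

print(found)
```

Each path found this way would then be handed to the analysis tool, one repository at a time.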

Bicho
-----

Bicho is used for several purposes:

- Launchpad ticket retrieval from https://launchpad.net/openstack
- Gerrit information retrieval from https://review.openstack.org
- StoryBoard story retrieval from https://storyboard.openstack.org/

In each case Bicho queries the corresponding API of the technology in question.
The retrieved information is then stored in a MySQL database.

Sibyl
-----

Sibyl retrieves information from the OpenStack Askbot site at
http://ask.openstack.org/. This information is later stored in a MySQL database.

IRCAnalysis
-----------

This is a simple Python-based script that parses IRC log information, which is
retrieved from http://eavesdrop.openstack.org/irclogs/.
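To make the parsing step more concrete, here is a minimal sketch that extracts timestamp, nick, and message from log lines and counts messages per nick. The line layout (``timestamp  <nick> message``) is an assumption made for this example, and ``parse_line`` and the sample log are hypothetical; this is not the IRCAnalysis implementation.

```python
import re
from collections import Counter

# Assumed line layout: "<ISO timestamp>  <nick> message".
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+<(?P<nick>[^>]+)>\s(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, nick, message), or None for non-message lines."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("ts"), match.group("nick"), match.group("msg")


# Invented sample data; joins and other server notices are skipped.
SAMPLE_LOG = """\
2014-05-01T10:00:01  <alice> hello
2014-05-01T10:00:05  <bob> hi alice
2014-05-01T10:00:09  *** carol has joined the channel
2014-05-01T10:00:12  <alice> any news on the gate?
"""

messages = [p for p in map(parse_line, SAMPLE_LOG.splitlines()) if p]
per_nick = Counter(nick for _, nick, _ in messages)
print(per_nick)
```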

Mailing List Stats
------------------

This tool parses mailing list information in mbox format. It analyzes
all of the lists found at http://lists.openstack.org/cgi-bin/mailman/listinfo.
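The mbox format can be read with the ``mailbox`` module from the Python standard library. The sketch below counts messages per sender in a small, invented mbox file; it only illustrates the kind of parsing involved and does not reproduce what Mailing List Stats does internally. The helper ``count_senders`` and the sample messages are hypothetical.

```python
import mailbox
import os
import tempfile
from collections import Counter
from email.utils import parseaddr

# Invented sample archive with three messages from two senders.
SAMPLE_MBOX = """\
From alice@example.org Thu May  1 10:00:00 2014
From: Alice <alice@example.org>
Subject: First message
Message-ID: <1@example.org>

Hello list.

From bob@example.org Thu May  1 11:00:00 2014
From: Bob <bob@example.org>
Subject: Re: First message
Message-ID: <2@example.org>

Hi Alice.

From alice@example.org Thu May  1 12:00:00 2014
From: Alice <alice@example.org>
Subject: Re: First message
Message-ID: <3@example.org>

Thanks.
"""


def count_senders(mbox_path):
    """Count messages per sender address in an mbox file."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            _, address = parseaddr(message.get("From", ""))
            counts[address.lower()] += 1
    finally:
        box.close()
    return counts


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(SAMPLE_MBOX)
    sender_counts = count_senders(path)

print(sender_counts)
```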

Unique identities generator
---------------------------

This tool uses heuristics to match identities that belong to the same person
across the several information repositories. It simply adds or updates
information in the existing databases.
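The exact heuristics are not described here. As one possible example, the sketch below groups identities that share either an email address or a case-insensitive name, using a small union-find structure. The function ``merge_identities``, the matching rules, and the sample data are all assumptions made for illustration, not the tool's actual algorithm.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def merge_identities(identities):
    """Group (source, name, email) tuples sharing an email or a normalized name.

    Hypothetical heuristic for illustration only.
    """
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    seen = {}
    for index, (_source, name, email) in enumerate(identities):
        for key in (("email", normalize(email)), ("name", normalize(name))):
            if not key[1]:
                continue  # empty fields never match anything
            if key in seen:
                union(index, seen[key])
            else:
                seen[key] = index

    groups = defaultdict(list)
    for index, identity in enumerate(identities):
        groups[find(index)].append(identity)
    return [groups[root] for root in sorted(groups)]


# Invented sample identities from different data sources.
SAMPLE = [
    ("git", "Alice Smith", "alice@example.org"),
    ("gerrit", "alice smith", "asmith@example.com"),
    ("mailing list", "A. Smith", "alice@example.org"),
    ("irc", "bob", ""),
]
unique = merge_identities(SAMPLE)
for group in unique:
    print(group)
```

In this sample the first three entries collapse into one person (two share an email address, two share a name), while the IRC-only identity stays separate.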

Architecture schema
-------------------


::

  Git------------> CVSAnalY--------> CVSAnalY database-----|
  Launchpad------> Bicho-----------> Bicho database--------|
  Gerrit---------> Bicho-----------> Bicho database--------|
  StoryBoard-----> Bicho-----------> Bicho database--------|+Unique identities db
  Askbot---------> Sibyl-----------> Sibyl database--------|
  IRC logs-------> IRCAnalysis-----> IRCAnalysis database--|
  Mailing lists--> MLStats---------> MLStats database------|


Data Analysis Architecture
==========================

Data processing is done through the GrimoireLib library, available at
https://github.com/VizGrimoire/GrimoireLib. This library is a database
abstraction layer that helps to access the several database schemas and
generate JSON files.

Because GrimoireLib is a library, a separate tool is needed to drive it.
The Report tool is in charge of this analysis and, through the GrimoireLib API,
generates the JSON files.
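The general pattern (query a database, aggregate, serialize to JSON) can be sketched as follows. This example does not use GrimoireLib or its API: it relies on an in-memory SQLite database as a stand-in for MySQL, and the ``commits`` table, its columns, and the ``commits_per_month`` function are hypothetical names invented for the illustration.

```python
import json
import sqlite3


def commits_per_month(connection):
    """Aggregate a hypothetical commits table into a JSON-friendly dict."""
    rows = connection.execute(
        "SELECT substr(commit_date, 1, 7) AS month, COUNT(*) "
        "FROM commits GROUP BY month ORDER BY month"
    ).fetchall()
    return {
        "month": [row[0] for row in rows],
        "commits": [row[1] for row in rows],
    }


# In-memory SQLite stands in for the MySQL databases in this sketch.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE commits (id INTEGER PRIMARY KEY, commit_date TEXT)"
)
connection.executemany(
    "INSERT INTO commits (commit_date) VALUES (?)",
    [("2014-04-28",), ("2014-05-01",), ("2014-05-17",)],  # invented data
)
report = commits_per_month(connection)
connection.close()

# Static JSON like this is what a dashboard front end would consume.
report_json = json.dumps(report, sort_keys=True)
print(report_json)
```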


Architecture schema
-------------------

::

  CVSAnalY database (Git)-----------|                      |
  Bicho database (Launchpad)--------|                      |
  Bicho database (Gerrit)-----------|                      |
  Bicho database (StoryBoard)-------|-Unique identities db-|-GrimoireLib--> JSON files
  Sibyl database (Askbot)-----------|                      |
  IRCAnalysis database--------------|                      |
  MLStats database (Mailing lists)--|                      |


Visualization
=============

The final step of the whole process is the visualization of the JSON files.
To avoid dependencies on third-party technologies, this approach focuses
on generating static JSON files that feed the JavaScript machinery of the Grimoire toolset.
However, other technologies can be used.

Visualization consists of two more projects: VizgrimoireJS and VizgrimoireJS-lib.
The latter is the JavaScript library in charge of accessing all of the JSON files and
retrieving the needed information. VizgrimoireJS is a set of Bootstrap-based HTML/CSS templates
that take advantage of that library to render the current version of the dashboard.

Thus, the visualization side only needs an Apache web server that serves the HTML/CSS/JS/JSON files.


Architecture schema
-------------------

::

  Data Sources -> Retrieval Process -> MySQL databases -> Data Analysis -> JSON files -> Visualization