# nodejs.org Download Metrics
This directory contains anonymized log records for binary and source downloads of Node.js from nodejs.org.
## What data is available?
There is roughly one log file per day, starting on the 14th of May 2014 to the current day. There is a gap in the data from the 1st to the 21st of September 2015 due to a server configuration error.
The data is gleaned from the access logs by matching for known binary and source files in [/dist](https://nodejs.org/dist/) and the newer [/download/release/](https://nodejs.org/download/release) which is where the /dist/ directory also points. No other access log information is included in the data available here.
IP addresses and exact times are not reported, only days and geolocation data for the original IP addresses.
## What format is the data in?
Raw log files are available in the **[./logs/](./logs/)** sub-directory where each file's name takes the form: `nodejs.org-access.log.YYYYMMDD.TTTTTTTTTT.csv`, where the last entry in the file is used to create the string `YYYYMMDD` from the year, month and day of the month respectively and `TTTTTTTTTT` as the unix epoch timestamp. There may zero, one or two log files for a given day. However, when stitched together they should form a continuous record of the downloads from nodejs.org.
There is always a [nodejs.org-access.log.csv](./logs/nodejs.org-access.log.csv) file which represents the _current day's_ data and **is not final**, i.e. it will change from update to update, either appending new data or starting again for a new day. The other log files can be considered final until we decide to adjust the format at some point in the future.
The raw log files are comma-separated value format with the following columns: day, country, region, path, version, os, arch, bytes.
***Country*** and ***region*** are calculated by using MaxMind's [GeoLite2 City](http://dev.maxmind.com/geoip/geoip2/geolite2/) database and some entries may contain blank values where the look-up fails. `X-Forwarded-For` headers are used to determine the most likely origin IP address by parsing out the [leftmost non-private address](https://r.va.gg/2011/07/wrangling-the-x-forwarded-for-header.html).
The ***path*** field contains the actual path that was requested by the client, with the ***version***, ***os*** and ***arch*** columns calculated from this value.
The ***version*** field is occasionaly empty due to the availability of `node-latest.tar.gz` which is a symlink to the latest source tarball and the various `latest` directory symlinks when used to download `node.exe` (versions could be roughly calculated for all of these when matched with release dates if desired).
The **os** field contains operating system identifiers as well as `src` for source tarballs and `headers` for header tarballs.
The **arch** field is blank when **os** is `src` or `headers`.
## Pre-processed summary data
A set of pre-processed summary data is also made available in the **[./summaries/](./summaries/)** sub-directory. Each type of summary consists of:
* A directory containing CSV files with names matching the raw log file names and rows containing aggregated per-day data for the given summary datatype. Most of these files contain two rows, for two days, as the raw log files don't span neatly across day boundaries.
* An aggregation file, in CSV format, where each row is a single day during the full period for which there is available data.
* A PNG file with a simple plot of the data.
### Total
Contains two data columns: ***downloads*** and ***TiB***, where TiB is 240 bytes.
Source data: [./summaries/total/](./summaries/total/)
Aggregate data: [./summaries/total.csv](./summaries/total.csv)
Plot:
### Architectures
Contains a data column per distributed architecture, including ***unknown*** where the architecture cannot be determined (source or header tarballs). The columns are ordered by totals where the architecture that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand when additional architectures are distributed from nodejs.org. The **.pkg** OS X installers are counted as ***x64*** even though, prior to Node.js v4, they were "universal binaries" containing both x64 and x86 versions, usable on both architectures.
Source data: [./summaries/arch/](./summaries/arch/)
Aggregate data: [./summaries/arch.csv](./summaries/arch.csv)
Plot:
### Countries
Contains a data column per country from the geolocation data, including ***unknown*** where a country could not be determined. The column names take the form of [ISO 3166](https://en.wikipedia.org/wiki/ISO_3166) country codes. The columns are ordered by totals where the country with the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand if additional countries not already listed are discovered via geolocation.
Source data: [./summaries/country/](./summaries/country/)
Aggregate data: [./summaries/country.csv](./summaries/country.csv)
Plot:
### Operating Systems
Contains a data column per distributed operating system, including ***unknown*** where the operating system cannot be determined (due to `node-latest.tar.gz`), ***src*** for source tarballs and ***headers*** for header tarballs. The columns are ordered by totals where the operating system that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and may expand when additional operating systems are distributed from nodejs.org.
Source data: [./summaries/os/](./summaries/os/)
Aggregate data: [./summaries/os.csv](./summaries/os.csv)
Plot:
### Versions
Contains a data column per significant version number of Node.js. For <= 0.12, the semver-minor version number is listed, for >= 4.x the semver-major version number is listed. The ***unknown*** column contains counts of downloads where the version number could not be determined (see above note about `node-latest.tar.gz` and the `latest` directory symlinks coupled with `node.exe`). The columns are ordered by totals where the version that has the highest total is listed first and so on, the column ordering may therefore change over time. The list is not fixed and will expand when additional significant Node.js versions are made available for download.
Source data: [./summaries/version/](./summaries/version/)
Aggregate data: [./summaries/version.csv](./summaries/version.csv)
Plot:
## Additional stuff
The source of this file along with the various scripts used to generate the data files and graphs can be found in the [nodejs/build](https://github.com/nodejs/build) GitHub repository in the [setup/www/tools/metrics](https://github.com/nodejs/build/tree/master/setup/www/tools/metrics) directory. Questions, suggestions and pull requests are welcome in that repository.