# Intro to `qsv count`

In this notebook we'll be covering examples of using [qsv's `count` command](https://github.com/jqnatividad/qsv/blob/master/src/cmd/count.rs).

This notebook uses **qsv**, an open-source CSV data wrangling toolkit available as a command line tool. You may learn more at https://github.com/jqnatividad/qsv.

## Table of Contents

1. [Setup](#1)
 - [1.1 Environment Notes](#1.1)
 - [1.2 Downloading qsv](#1.2)
 - [1.3 Downloading a CSV Data Set](#1.3)
2. [Let's Use `qsv count`!](#2)
 - [2.1 Option: `--help`](#2.1)
 - [2.2 Running `qsv count` On Our CSV](#2.2)
 - [2.3 Option: `--human-readable`, `-H`](#2.3)
 - [2.4 Option: `--no-headers`, `-n`](#2.4)
 - [2.5 Option: `--width`](#2.5)
   - [2.5.1 Understanding the `--width` Option's Output](#2.5.1)
3. [Bash Use Cases](#3)
 - [3.1 String Interpolation](#3.1)
4. [Python Use Cases](#4)
 - [4.1 Running `qsv count` on User's Input File Path](#4.1)
5. [Conclusion](#5)


## Part 1: Setup


### 1.1 Environment Notes

 - This notebook was run on Google Colab, which is based on an Ubuntu 22.04 LTS environment, so you may need to modify the commands if you're running a different OS (e.g. Windows) or are missing any dependencies.
 - Commands are prefixed with an exclamation point `!` to execute them in this Jupyter notebook environment; remove the `!` when running them in Bash on a terminal.


### 1.2 Downloading qsv

First, let's download qsv into our notebook from the [releases page](https://github.com/jqnatividad/qsv/releases). We'll use qsv 0.111.0:

In [25]:
# Downloading the .zip file that contains qsv
!curl -LO https://github.com/jqnatividad/qsv/releases/download/0.111.0/qsv-0.111.0-x86_64-unknown-linux-gnu.zip
# Unzipping the .zip file into a folder
!unzip -o qsv-0.111.0-x86_64-unknown-linux-gnu.zip -d qsv-0.111.0-files
# Copying the qsv binary from the folder into /bin so the qsv command can be used anywhere on our system
!cp qsv-0.111.0-files/qsv /bin

 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 73.3M 100 73.3M 0 0 42.5M 0 0:00:01 0:00:01 --:--:-- 76.7M
Archive: qsv-0.111.0-x86_64-unknown-linux-gnu.zip
 inflating: qsv-0.111.0-files/README 
 inflating: qsv-0.111.0-files/qsv 
 inflating: qsv-0.111.0-files/qsv_glibc-2.31 
 inflating: qsv-0.111.0-files/qsv_glibc-2.31_rust_version_info.txt 
 inflating: qsv-0.111.0-files/qsv_nightly 
 inflating: qsv-0.111.0-files/qsv_nightly_rust_version_info.txt 
 inflating: qsv-0.111.0-files/qsvdp 
 inflating: qsv-0.111.0-files/qsvdp_glibc-2.31 
 inflating: qsv-0.111.0-files/qsvdp_nightly 
 inflating: qsv-0.111.0-files/qsvlite 
 inflating: qsv-0.111.0-files/qsvlite_glibc-2.31 
 inflating: qsv-0.111.0-files/qsvlite_nightly 
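
Before moving on, it can help to confirm that the binary is actually reachable. The following is a minimal sketch, not part of the original notebook: `find_tool` is a hypothetical helper name, and it only assumes that `qsv` was copied to a directory on `PATH` as in the cell above.

```python
import shutil
import subprocess


def find_tool(name="qsv"):
    """Return the absolute path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


qsv_path = find_tool()
if qsv_path is None:
    print("qsv was not found on PATH; re-check the copy step above.")
else:
    # `qsv --version` prints the version string of the installed binary
    version = subprocess.run([qsv_path, "--version"], capture_output=True, text=True)
    print(version.stdout.strip())
```

If the first message is printed, the most likely cause is that the `cp` step failed or that `/bin` is not on `PATH` in your environment.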



### 1.3 Downloading a CSV Data Set

Here is the main CSV data set I'll be using:

| Data set | Source | Download Link | Rounded size |
| --------- | ------ | ------------ | ------------ |
| Indicators of Anxiety or Depression Based on Reported Frequency of Symptoms During Last 7 Days | https://catalog.data.gov/dataset/indicators-of-anxiety-or-depression-based-on-reported-frequency-of-symptoms-during-last-7- | https://data.cdc.gov/api/views/8pt5-q6wp/rows.csv?accessType=DOWNLOAD | 2.1 MB |

Let's download the data set into our notebook as `data.csv`.

In [26]:
# Downloading the .csv file as data.csv
!curl "https://data.cdc.gov/api/views/8pt5-q6wp/rows.csv?accessType=DOWNLOAD" -o data.csv

 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 2144k 0 2144k 0 0 1894k 0 --:--:-- 0:00:01 --:--:-- 1895k



## Part 2: Let's Use `qsv count`!

Time to explore with `qsv count`! Let's start by simply getting the help message for `qsv count`.


### 2.1 Option: `--help`

As with any qsv command, we'll use the `--help` option to get the help message:

In [27]:
!qsv count --help

Prints a count of the number of records in the CSV data.

Note that the count will not include the header row (unless --no-headers is
given).

For examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_count.rs.

Usage:
 qsv count [options] [<input>]
 qsv count --help

count options:
 -H, --human-readable Comma separate row count.
 --width Also return the length of the longest record.
 The count and width are separated by a semicolon.

Common options:
 -h, --help Display this message
 -n, --no-headers When set, the first row will be included in
 the count.



### 2.2 Running `qsv count` On Our CSV

We can start by running `qsv count` on our data set with no options. This should give us the number of non-header records (rows) in our CSV:

In [28]:
!qsv count data.csv

13671


That's 13,671 non-header rows of data!
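
Note that the help message speaks of *records*, not lines. As a rough illustration of the difference, here is a sketch that uses Python's standard `csv` module instead of qsv. `count_records` and the tiny `sample` string are made up for this example. A quoted field may contain a newline, so a raw line count (for example from `wc -l`) can differ from what a CSV-aware parser reports.

```python
import csv
import io


def count_records(csv_text, has_header=True):
    """Count CSV records, skipping the first row when has_header is True."""
    rows = sum(1 for _ in csv.reader(io.StringIO(csv_text, newline="")))
    return max(rows - 1, 0) if has_header else rows


# Hypothetical data: the first record has a newline inside a quoted field
sample = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

print(count_records(sample))   # 2 records (header excluded)
print(sample.count("\n"))      # 4 physical lines
```

Passing `has_header=False` plays the same role here as `--no-headers` does for `qsv count`: the first row is counted as well.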


### 2.3 Option: `--human-readable`, `-H`

With the `--human-readable` option (or its alias `-H`), qsv should automatically add commas in the appropriate places to help us read the number better.

In [29]:
!qsv count data.csv --human-readable

13,671
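
One thing to keep in mind if you feed this output into another program: a comma-separated count is no longer directly parseable as an integer. A small sketch of converting in both directions in Python follows (the helper names are hypothetical, not part of qsv):

```python
def parse_human_count(text):
    """Convert a comma-separated count such as '13,671' back to an int."""
    return int(text.strip().replace(",", ""))


def format_human_count(n):
    """Format an int with comma thousands separators, like the -H output."""
    return f"{n:,}"


print(parse_human_count("13,671\n"))  # 13671
print(format_human_count(13671))      # 13,671
```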



### 2.4 Option: `--no-headers`, `-n`

What if we also want to include the header row in the count (therefore counting all the rows in the CSV)?

We can use the `--no-headers` (or `-n`) option to include the header row in the count.

Combining it with `-H` for comma separators, we should expect `13,672` as our output.

In [30]:
!qsv count data.csv -n -H

13,672



### 2.5 Option: `--width`

There's one more option that you might not expect.

What if we wanted to find out how long the longest row is in our data set, based on the number of characters it has?

The `--width` option should return the length of the longest record. The count and width are separated by a semicolon.

In [31]:
!qsv count data.csv --width

13671;237
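
If you want to use both numbers in a script, the semicolon-separated output is easy to split. This is a minimal sketch that works on the output string shown above; `parse_count_width` is a hypothetical helper, and it assumes `--width` was used without `-H`, so both parts are plain integers.

```python
def parse_count_width(output):
    """Split 'count;width' output into a pair of ints."""
    count_text, width_text = output.strip().split(";")
    return int(count_text), int(width_text)


count, width = parse_count_width("13671;237\n")
print(count)  # 13671
print(width)  # 237
```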



#### 2.5.1 Understanding the `--width` Option's Output

The longest record has 237 characters. But you may have some questions about this width output:

- Does the width include the header if we don't specify the `--no-headers` option?
- Does the width include the commas within the rows that separate the field values?

Let's find out with this simple CSV file we'll name `sample.csv`:

In [32]:
# Write our data to sample.csv
!echo 'letter,number' > sample.csv
!echo 'alpha,13' >> sample.csv
!echo 'beta,24' >> sample.csv
# Display the data from sample.csv
!cat sample.csv

letter,number
alpha,13
beta,24


First let's use `--width` without `--no-headers`.

Our initial guess: if the header row is excluded, the width should be `8`, because the longest remaining row, `alpha,13`, has `8` characters when you include the comma `,`.

In [33]:
!qsv count sample.csv --width

2;9


Hmm... We get 9. Why is that?

In our CSV data, each row ends with a hidden character: the newline `\n`. It counts toward the width, so we add 1 to our estimate of 8 and get 9. The result also confirms that the comma is included in the width, and that the header row is excluded unless `--no-headers` is given, since the longer header row would otherwise have set the width.

To verify these claims, let's rerun the command with `--no-headers` so that the header row is considered too. If the commas `,` between field values and the newline `\n` at the end of the row are both counted, the header row `letter,number` should give a width of `12 + 1 + 1 = 14` (12 letters, 1 comma, and 1 newline):

In [34]:
!qsv count sample.csv --width --no-headers

3;14
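
The arithmetic we just worked out by hand can be written down as a short Python sketch. This mirrors the behavior observed in the two cells above, not qsv's actual implementation; `longest_record_width` is a hypothetical helper, and it assumes simple unquoted fields where each character counts once.

```python
import csv
import io


def longest_record_width(csv_text, has_header=True):
    """Width of the longest row: field characters + commas + one newline."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if has_header:
        rows = rows[1:]  # skip the header row, like the default behavior
    # (fields - 1) commas plus 1 newline equals the number of fields
    return max((sum(len(field) for field in row) + len(row) for row in rows), default=0)


sample_csv = "letter,number\nalpha,13\nbeta,24\n"

print(longest_record_width(sample_csv))                    # 9
print(longest_record_width(sample_csv, has_header=False))  # 14
```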


Awesome! Now that you have a better understanding of `qsv count`, try it out for yourself!


## Part 3: Bash Use Cases


### 3.1 String Interpolation

Let's say I want to write a sentence that dynamically includes the count of a CSV file within it. For example, I want to print out:

```
There are 1,000,000 non-header rows of data in the data set!
```

The `1,000,000` is a placeholder: the real value should come from running `qsv count` on a CSV file. Here's a Bash one-liner that does this with the `echo` command and command substitution (`$(...)`):

In [35]:
!echo "There are $(qsv count data.csv -H) non-header rows of data in the data set!"

There are 13,671 non-header rows of data in the data set!



## Part 4: Python Use Cases


### 4.1 Running `qsv count` on User's Input File Path

Let's say we want to run a Python script where the user can simply enter the path to the CSV file (in our case we can just write `data.csv`) and then get the output of running `qsv count` on it. We can use the `subprocess` module to run `qsv` commands and print the output. Here's a sample script with comments to help understand how it works:

In [36]:
import subprocess

# Get user input for the CSV path
csv_path = input('Enter the path to your CSV file: ')

# Build the qsv count command for the CSV file, with the -H option
command = ["qsv", "count", csv_path, "-H"]

# Run the command and capture its output
subprocess_output = subprocess.run(command, capture_output=True)
# stdout - standard output stream of the subprocess that runs qsv count
# decode - convert the stdout output from bytes to a string
# strip - remove leading/trailing whitespace, including the final newline
count = subprocess_output.stdout.decode().strip()

# Print the output of qsv count within a sentence
print(f"There are {count} non-header rows of data in the data set!")

Enter the path to your CSV file: data.csv
There are 13,671 non-header rows of data in the data set!


With this script we can now simply provide the file path and then get the `qsv count` output for it! Of course you may expand on this script with improvements such as:

- Verifying the file exists and is a CSV.
- Error handling with a try/except block and printing `stderr`.
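
As one possible way to apply both suggestions, here is a hedged sketch that wraps the call in a function. The name `qsv_count`, the `.csv` extension check, and the choice of exceptions are assumptions made for illustration; only the `qsv count <path> -H` invocation comes from the script above.

```python
import subprocess
from pathlib import Path


def qsv_count(csv_path, human_readable=True):
    """Return the output of `qsv count` for csv_path as a string."""
    path = Path(csv_path)
    # Verify the file exists and looks like a CSV (by extension only)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")

    command = ["qsv", "count", str(path)]
    if human_readable:
        command.append("-H")

    try:
        # check=True raises CalledProcessError on a non-zero exit code
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # Raised by subprocess when the qsv executable itself is missing
        raise RuntimeError("qsv was not found on PATH") from None
    except subprocess.CalledProcessError as err:
        # Surface what qsv wrote to stderr
        raise RuntimeError(f"qsv count failed: {err.stderr.strip()}") from err
    return result.stdout.strip()
```

With qsv installed as in Part 1, calling `qsv_count("data.csv")` should return the same `13,671` string as the script above.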


## Part 5: Conclusion

In this notebook we covered example usage of `qsv count` for tallying the number of rows in a CSV file. We discussed all the options that are available for `qsv count`, and we also went further to discover how `qsv count` can be integrated in Bash and Python.