/**
 * file: journal_karbytes_06november2025_p1.txt
 * type: plain-text
 * date: 07_NOVEMBER_2025
 * author: karbytes
 * license: PUBLIC_DOMAIN
 */

The following is a continuation of what is discussed in the previous journal entry at the following Uniform Resource Locator:

https://raw.githubusercontent.com/karlinarayberinger/KARLINA_OBJECT_extension_pack_49/main/journal_karbytes_06november2025_p0.txt

* * *

// .fq.gz --> .fasta.gz

After downloading one of the two Gzip-compressed FASTQ files available in its Sequencing dot Com account (for the 30-fold whole genome sequencing kit whose identifier is SQ873FU8), karbytes used the following Unix commands to convert that Gzip-compressed FASTQ file to a Gzip-compressed FASTA file. The conversion process took approximately two hours to complete.

[bash]

sudo apt update
sudo apt install seqtk

seqtk seq -a KarlinaBeringer-SQ873FU8-30x-WGS-Sequencing_com-11-06-25.1.fq.gz | gzip > KarlinaBeringer-SQ873FU8-30x-WGS-Sequencing_com-11-06-25.1.fasta.gz

[end bash]

The resulting .fasta.gz file has a file size of approximately 15.7 gigabytes.
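
To illustrate what a FASTQ-to-FASTA conversion does to each record, the following is a minimal sketch which applies an awk one-liner to a tiny sample file. It assumes four-line FASTQ records (i.e. no wrapped sequence lines). The file name sample.fq, the function name fastq_to_fasta, and the two reads are made up for illustration, and awk merely stands in for seqtk here (it is not how seqtk is implemented).

```bash
# Hypothetical two-read FASTQ sample (made-up reads, not real sequencing data).
printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'GGCA' '+' 'FFFF' > sample.fq

# Keep the header (line 1 of each record, with "@" swapped for ">") and the
# sequence (line 2). Drop the "+" separator (line 3) and the quality string (line 4).
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }' "$1"
}

fastq_to_fasta sample.fq
```

Each four-line FASTQ record becomes a two-line FASTA record. The per-base quality strings are discarded, which is why the FASTA file is smaller than the FASTQ file it was derived from and why the FASTQ file cannot be reconstructed from the FASTA file.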

* * *

// .fasta.gz --> .fasta.naf

In order to maximize the data compression of the .fasta.gz file, karbytes installed NAF via the Unix command line and then generated a .fasta.naf version of the aforementioned .fasta.gz file.

(NAF (i.e. Nucleotide Archival Format) is a specialized, lossless compression format designed specifically for storing and sharing DNA and RNA sequences.)

// 1. Install required build tools:

[bash]

sudo apt update
sudo apt install git gcc make diffutils perl

[end bash]

// 2. Clone the repository:

[bash]

git clone --recurse-submodules https://github.com/KirillKryukov/naf.git
cd naf

[end bash]

// 3. Build and test:

[bash]

make
make test

[end bash]

// 4. Install the binaries:

[bash]

sudo make install

[end bash]

// 5. Verify installation:

[bash]

ennaf --version
unnaf --version

[end bash]

(The NAF tools are: ennaf (compress) and unnaf (decompress)).

// 6. Create temporary directory for ennaf data:

[bash]

mkdir -p ~/Desktop/tmp

[end bash]

(The .fasta.gz file is located in the Desktop directory. Hence, tmp is located in the Desktop directory).

// 7. Run ennaf:

[bash]

zcat ~/Desktop/KarlinaBeringer-SQ873FU8-30x-WGS-Sequencing_com-11-06-25.1.fasta.gz | \
ennaf --temp-dir ~/Desktop/tmp > \
~/Desktop/KarlinaBeringer-SQ873FU8-30x-WGS-Sequencing_com-11-06-25.1.fasta.naf

[end bash]

The resulting .fasta.naf file has an approximate data size of 12.8 gigabytes, which is still too large to fit onto the karbytes_origins M_DISC. Hence, karbytes decided to proceed by splitting the .fasta.gz file into chunk files no larger than 5 megabytes each (which is sufficiently small to be WayBack Machine saveable) using the same process outlined in the following journal entry:

https://karbytesforlifeblog.wordpress.com/journal_karbytes_01october2025/
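
For readers who do not follow the link above, the following is a minimal sketch of one way to split a large file into size-capped chunks and to confirm that the chunks reassemble into the original. It is not necessarily the process used in the linked journal entry. It assumes GNU coreutils split (where the suffix MB means 1,000,000 bytes and KB means 1,000 bytes). The function names split_into_chunks and join_chunks, the .part_ naming scheme, and the demo files are hypothetical. The demo uses a 12,000-byte file and a 5KB cap so that it runs instantly.

```bash
# split_into_chunks FILE CHUNK_SIZE OUT_DIR
# Writes FILE as numbered pieces (NAME.part_0000, NAME.part_0001, ...) of at
# most CHUNK_SIZE each inside OUT_DIR.
split_into_chunks() {
    local src="$1" size="$2" out_dir="$3"
    mkdir -p "$out_dir"
    # -b: maximum bytes per piece; -d: numeric suffixes; -a 4: four-digit
    # suffixes, which allows up to 10,000 pieces.
    split -b "$size" -d -a 4 "$src" "$out_dir/$(basename "$src").part_"
}

# join_chunks OUT_DIR NAME RESTORED_FILE
# Concatenates the pieces in suffix order (the fixed-width suffixes make the
# shell's alphabetical glob order match the numeric order).
join_chunks() {
    cat "$1"/"$2".part_* > "$3"
}

# Demo on a small throwaway file (random bytes, not genome data).
head -c 12000 /dev/urandom > demo.bin
split_into_chunks demo.bin 5KB chunks
join_chunks chunks demo.bin demo_restored.bin
cmp demo.bin demo_restored.bin && echo "chunks reassemble to the original"
```

For the real file, the second argument would be 5MB instead of 5KB. If 15.7 gigabytes is taken to mean 15.7 billion bytes, that yields roughly 3,140 chunk files, which is why the sketch uses four-digit suffixes rather than split's default of two.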