# Log of all training runs

- Optional: Install [NVIDIA Apex](https://github.com/NVIDIA/apex); it is only needed if the `--fast` option is used during training
- Download and unpack [Camera-PrIMus](https://grfia.dlsi.ua.es/primus/) to `$gitroot/Corpus`
- Download and unpack [GrandStaff](https://sites.google.com/view/multiscore-project/datasets) to `$gitroot/grandstaff`
- Download and unpack [Documents in the Wild](https://github.com/cvlab-stonybrook/PaperEdge?tab=readme-ov-files) to `$gitroot/DIW`
- Clone [CPMS](https://github.com/itec-hust/CPMS) to `$gitroot/CPMS`
- Make sure PyTorch and CUDA are installed correctly
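
Before starting a long training run it can help to check that the folders from the list above are in place. The following is a minimal sketch, not part of the repository: the function name `missing_datasets` is hypothetical, and only the four directory names are taken from the setup steps.

```python
from pathlib import Path

# Directory names come from the setup list above; the rest is illustrative.
EXPECTED_DIRS = ("Corpus", "grandstaff", "DIW", "CPMS")


def missing_datasets(gitroot):
    """Return the expected directories that do not exist under gitroot."""
    root = Path(gitroot)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_datasets(".")` from the repository root returns an empty list once everything has been downloaded and unpacked.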

## Train log

SER values shown here should be taken with a grain of salt. The SER is calculated on data which was acquired differently than the training sets, which should make the comparison fair. On the other hand, the data is pretty wild, so you can't expect the SER to be low.
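
This log does not define SER. Assuming it means symbol error rate in the usual sense (edit distance between predicted and reference token sequences, divided by the reference length), a minimal sketch could look like this. The function names and the token spellings in the usage note are hypothetical and may differ from what the evaluation script in this repository does.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `symbol_error_rate(["clef", "C4", "D4", "barline"], ["clef", "C4", "E4"])` counts one substitution and one deletion against four reference symbols. Under this definition the value can exceed 100% when the prediction contains many extra symbols.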

## Next Runs

- Try CustomVisionTransformer
- Change alpha/beta ratio
- Only increase encoder or decoder depth
- Higher weight to naturals which cancel the key?

## Run 90 warped data sets, no neg dataset

Date: 6 May 2024
Training time: ~14h (fast option)
Commit: 8f774545179f3e7bfdbd58fe1a6c55473b8d4343
SER: 65%
Manual validation result: 14.5

## Run 86 Cleaned up data set

Date: 3 May 2024
Training time: ~14h (fast option)
Commit: b22104265be285b5a1d461c3fab2aa4589eb08cc
SER: 67%
Manual validation result: 17.9

## Run 84 More training data with naturals

Date: 1 May 2024
Training time: ~17h (fast option)
Commit: cf7313f0bcec82f4f7da738fbacabd56084f6604
SER: 70%
Manual validation result: 17.5

## Run 83 CustomVisionTransformer

Date: 30 Apr 2024
Training time: ~18h (fast option)
Commit: 80896fdba4dbe4f9b2bbba3dd66377b3b0d1faa5
SER: 66%

Enabled CustomVisionTransformer again.

## Run 82 Increased alpha to 0.2

Date: 29 Apr 2024
Training time: ~18h (fast option)
Commit: acbdf6dc235f393ef75158bdcf539e3b2e5b435e
SER: 64%
Manual validation result: 12.9

Increased alpha to 0.2.

## Run 81 Decreased depth

Date: 29 Apr 2024
Training time: ~18h (fast option)
Commit: 185c235cd0979faa2c087e59e71dbba684a68fb6
SER: 67%
Manual validation result: 13.1

Reverted 9e2c14122607a63c25253d1c5378c706859395ab, going back to a depth of 4.

## Run 80 Fixes around accidentals in the data set

Date: 28 Apr 2024
Training time: ~18h (fast option)
Commit: 840318915929e5efe780780a543ea053b479d375
SER: 76%

## Run 79 Use semantic encoding without changes to the accidentals

Date: 27 Apr 2024
Training time: ~18h (fast option)
Commit: f732c3abc10b5b0b3e8942f722d695eb725e3e53
SER: 76%
Manual validation result: 80.9 

So far we used the format which TrOMR seems to use: the semantic format, but with accidentals written only where they are visible in the image.

E.g. if the semantic format is Key D Major, Note C#, Note Cb, Note Cb,
then the TrOMR format is Key D Major, Note C, Note Cb, Note C, because the flat is the only visible accidental in the image.

In this run we try the semantic format without any changes to the accidentals.
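
The difference between the two encodings can be sketched in a few lines. This is an illustration under simplifying assumptions, not the conversion code used for training: it handles a single measure, ignores octaves, and the token spelling (including `N` for a natural sign) and the function name are hypothetical.

```python
# Hypothetical suffixes for a visible accidental: sharp, flat, natural.
ACCIDENTAL_SUFFIX = {1: "#", -1: "b", 0: "N"}


def visible_accidentals(key_signature, measure):
    """Convert sounding pitches to what is printed in the image.

    key_signature maps a note step to its alteration (+1 sharp, -1 flat).
    measure is a list of (step, alteration) pairs in the semantic sense.
    """
    active = dict(key_signature)  # alterations currently in effect
    tokens = []
    for step, alter in measure:
        if active.get(step, 0) == alter:
            tokens.append(step)  # no accidental is printed
        else:
            tokens.append(step + ACCIDENTAL_SUFFIX[alter])
            active[step] = alter  # holds for the rest of the measure
    return tokens


D_MAJOR = {"F": 1, "C": 1}
print(visible_accidentals(D_MAJOR, [("C", 1), ("C", -1), ("C", -1)]))
# prints ['C', 'Cb', 'C']
```

The printed result matches the example above: the C# is covered by the key signature, the first Cb needs a visible flat, and the second Cb inherits it.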

## Run 77 Increased depth

Date: 26 Apr 2024
Training time: ~19h (fast option)
Commit: 9e2c14122607a63c25253d1c5378c706859395ab
SER: 74%
Manual validation result: 22.3

Encoder and decoder depth was increased from 4 to 6.

## Run 76 Training data fix for accidentals

Date: 25 Apr 2024
Training time: ~16h (fast option)
Commit: 75d8688719494169f4b629fc51224d4aa846eee7
SER: 77%

Fixed the training data, which previously didn't contain any natural accidentals.

## Run 74 Backtracking

Date: 24 Apr 2024
Training time: ~24h (fast option)
Commit: b4af54249fca5bf93650c518c7220f5de98c843c
SER: 77%

After experiments with focal loss and weight decay, we are backtracking to run 63.

## Run 74 

Date: 23 Apr 2024
Training time: ~24h (fast option)
Commit: 6580500e71602d5c74decde2946498c8e883392e
SER: 77%

Adding a weight to the lift/accidental tokens.
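
How the weight was applied in this run is not recorded here. One common way to weight individual tokens is a class-weighted cross-entropy, sketched below in plain Python. The function name and the example weights are assumptions; the normalization by the sum of weights mirrors the default behaviour of PyTorch's `CrossEntropyLoss` when a `weight` tensor is passed.

```python
import math


def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted mean negative log-likelihood over a batch.

    probs[i][c] is the predicted probability of class c for sample i,
    targets[i] is the true class index, class_weights[c] its weight.
    """
    total = 0.0
    weight_sum = 0.0
    for sample_probs, target in zip(probs, targets):
        weight = class_weights[target]
        total += -weight * math.log(sample_probs[target])
        weight_sum += weight
    return total / weight_sum
```

Raising the weight of a class makes mistakes on that class count more in the averaged loss, which is the intended effect for rare tokens such as accidentals.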

## Run 71 Weight decay

Date: 22 Apr 2024
Training time: ~17h (fast option)
Commit: 3b92eee2e56647fcb538b4ef5ef3704f12bfb2d1
SER: 77%

Reduced weight decay.

## Run 70 Focal loss

Date: 21 Apr 2024
Training time: ~17h (fast option), aborted after epoch 16 of 25
Commit: a6b87b71b3b69d87d424f3c86500081f6146d436
SER: 75%

It looks like focal loss doesn't help to improve the performance of the lift detection.
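
For reference, the standard focal loss scales the cross-entropy of each sample by `(1 - p) ** gamma`, where `p` is the probability assigned to the correct class. The sketch below shows only that per-sample formula. The `gamma` default is an assumption for illustration, the optional class-balancing factor is omitted, and the settings used in this run are not recorded here.

```python
import math


def focal_loss(p_target, gamma=2.0):
    """Focal loss for one sample, given the probability of its true class."""
    return -((1.0 - p_target) ** gamma) * math.log(p_target)
```

With `gamma=0` this reduces to plain cross-entropy. Larger values shrink the contribution of samples the model already classifies confidently.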

## Run 63 Negative data set

Date: 11 Apr 2024
Training time: ~26h (fast option)
Commit: c360ab726df18879973e6829a1423c627a99afd5
SER: 74%
Manual validation result: 13.7

Increased the data set size by introducing a negative data set with no musical symbols and by using the positive data sets more often with different mask values.

## Run 57 Dropout 0.8

Date: 07 Apr 2024
Training time: ~14h (fast option)
Commit: 3fc893c0ab547fe1958adf500b0afaf0f6990f80
SER: 81%

Changes to the conversion of the grandstaff dataset haven't been applied yet.

## Run 56 Dropout 0.1

Date: 07 Apr 2024
Training time: ~14h (fast option)
Commit: 5ec6beaf461c034340ad0d2f832d842bef8bee75
SER: 72%
Manual validation result: 13.8

Changes to the conversion of the grandstaff dataset haven't been applied yet.

## Run 55 Dropout 0.2

Date: 06 Apr 2024
Training time: ~14h (fast option)
Commit: d73d5a9d342d4d934c21409632f4e2854d14d333
SER: 74%
Manual validation result: 17.0

Changes to the conversion of the grandstaff dataset haven't been applied yet.

## Run 51 Dropout 0

Start of dropout tests, number ranges for dropouts are mainly based on https://arxiv.org/pdf/2303.01500.pdf.

Date: 05 Apr 2024
Training time: ~14h (fast option)
Commit: cd445caa5337d86cf723854cb2ef9e98dd4c5b76
SER: 72%
Manual validation result: 18.4

## Run 50 InceptionResnetV2

We changed how we number runs and established a link between the run number and the git history.

Date: 05 Apr 2024
Training time: ~19h (fast option)
Commit: a57ee4c046842c0135adca84f06260cff8af732f
SER: 88%

We tried InceptionResnetV2. The training run showed overfitting and the resulting SER indicates poor results. The model is over 3 times larger than the ResNetV2 model and might require more work to prevent overfitting.

### Run 3

Date: 02 Apr 2024
Training time: ~24h (fast option)
Commit: 9ddfff8b5782473e8831ca3791d9bef99f726654
SER: 73%
Manual validation result: 23.4

We reduced the vocabulary, decreased the alpha/beta ratio in the loss function and made changes to the grandstaff dataset. While still performing worse than Run 0 in the manual validation, it gets closer now and in some specific tests performs even better than Run 0. We will have to backtrack from this point to find out which of the changes led to an improved result.

### Run 2

Date: 01 Apr 2024
Training time: ~48h
Commit: 0cb82922b4d7300d1568455277d568f85ea96fe4
SER: 79%

### Run 1

Date: 24 Mar 2024
Training time: ~24h (fast option)
Commit: 516093a3f3841235a06ca8de6759f1db6b8fc39a
SER: 82%

### Run 0

The weights from the [original paper](https://arxiv.org/abs/2308.09370).
SER: 74%
Manual validation result: 9.3