# DEep Learning for BOT detection by Mouse movements (DELBOT Mouse)
This repository is a small TypeScript library that distinguishes humans from bots based on their mouse movements,
using [TensorFlow.js](https://www.tensorflow.org/js). It originated from an internship at
[Bureau404](https://www.bureau404.fr) as part of the Master's in Mathematics:
Specialist Focus on Careers in Computer Science and Artificial Intelligence ([University of Mons](https://web.umons.ac.be/fr/)).
- Author : Christophe Grandmont.
- Internship supervisor (Bureau404) : Loïc Jean-Fulcrand ([website](https://www.malt.ch/profile/loicjeanfulcrand))
- Internship supervisor (UMONS) : Christophe Troestler ([website](https://staff.umons.ac.be/christophe.troestler/))
## About
Delbot Mouse is an open-source prototype that uses [TensorFlow.js](https://www.tensorflow.org/js) and a few
other libraries, such as RandomForest, to provide a generic way to distinguish humans from bots based on their mouse movements on
a web page. The repository contains three distinct JavaScript or TypeScript modules:
1. delbot-core: the core of the library that contains everything to load an existing model and use it.
2. delbot-training: the part of the code using delbot-core to train new models from scratch. It also contains a
Generative Adversarial Network (GAN) implementation to generate bot trajectories.
3. delbot-example: examples showing how to use delbot-core and delbot-training to train new models or use an existing one.
We also have a `python/` folder that contains the scripts used to manipulate data.
Delbot-core:
[@chrisgdt/delbot-mouse on npm](https://www.npmjs.com/package/@chrisgdt/delbot-mouse)
Delbot-training:
[@chrisgdt/delbot-training on npm](https://www.npmjs.com/package/@chrisgdt/delbot-training)
## How to install
### For Node
You can install delbot-core or delbot-training from NPM for Node with:
```
npm install @chrisgdt/delbot-mouse
```
```
npm install @chrisgdt/delbot-training
```
You will also need to install TensorFlow.js; you are free to choose the version.
```
npm install @tensorflow/tfjs
```
__Note that Delbot Mouse is primarily meant to be used in the web browser__. Using Node.js or other
JavaScript engines might lead to unexpected errors, which you can report as issues in this repository.
### For the browser
For browser usage, you can simply add some script tags. To load delbot-mouse, you need TensorFlow.js.
```html
```
To load delbot-training, you need both TensorFlow.js and delbot-mouse (delbot-training builds on delbot-core), and it is
highly recommended to also load tfjs-vis for training visualization.
```html
```
## How to use
With Node, you can import delbot with `import * as delbot from "@chrisgdt/delbot-mouse";`.
If needed, you can also import TensorFlow.js with `import * as tf from '@tensorflow/tfjs';`.
To import delbot-training, use `import * as delbotrain from "@chrisgdt/delbot-training";`.
---
In the browser, once you have loaded delbot-mouse with the script tag, everything is available through the single variable
`delbot`; likewise, delbot-training is exposed through the variable `delbotrain`. Everything is exported from `src/index.ts`; check the
[doc of delbot-core](https://chrisgdt.github.io/DELBOT-Mouse/delbot-core/docs/modules/index.html) or the
[doc of delbot-training](https://chrisgdt.github.io/DELBOT-Mouse/delbot-training/docs/modules/index.html) for more information.
The folder `trained-models/` contains some pre-trained models that you can use easily from delbot-core with, for example,
`delbot.Models.rnn3`.
You can also find a complete usage example, without the training part, in `/delbot-example/` [here](https://chrisgdt.github.io/DELBOT-Mouse/delbot-example/src/index.html).
## Basic usage - Example
```ts
import * as delbot from "@chrisgdt/delbot-mouse"; // not needed in browser

// Create an empty recorder to store and compute mouse features.
// The screen size is used to normalize (x,y) positions in [0,1]^2.
const recorder = new delbot.Recorder(window.screen.width, window.screen.height);

// Set max size to prevent high memory usage
recorder.setMaxSize(1e6);

// For each small movement, store it in the recorder
document.addEventListener("mousemove", event => {
  recorder.addRecord({
    time: event.timeStamp,
    x: event.clientX,
    y: event.clientY,
    type: "Move" // optional
  });
});

// myVerifyElement is any element of your page that triggers the check (e.g. a submit button)
myVerifyElement.addEventListener("click", event => {
  if (recorder.getRecords().length > 100) {
    // Models are obtained from delbot.Models
    const isHuman = recorder.isHuman(delbot.Models.rnn1);
    recorder.clearRecord();
    // ...
  }
});
```
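The comments in the example above mention two ideas: positions are normalized to [0,1]^2 using the screen size, and the recorder is capped to limit memory usage. The following is a minimal sketch of both ideas, not the actual delbot implementation. `Point`, `normalize` and `BoundedBuffer` are hypothetical names, and both the clamping and the "drop the oldest record" policy are assumptions.

```ts
// Hypothetical types and helpers for illustration only; not the delbot API.
interface Point {
  time: number;
  x: number;
  y: number;
}

// Map pixel coordinates to [0,1]^2 given the screen size.
// Clamping out-of-range values is an assumption of this sketch.
function normalize(p: Point, width: number, height: number): Point {
  const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));
  return { time: p.time, x: clamp01(p.x / width), y: clamp01(p.y / height) };
}

// Fixed-capacity buffer that drops the oldest item once it is full (assumed policy).
class BoundedBuffer<T> {
  private items: T[] = [];
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.maxSize) {
      this.items.shift();
    }
  }

  toArray(): T[] {
    return [...this.items];
  }
}

// Keep at most 2 normalized points on a 1920x1080 screen.
const buffer = new BoundedBuffer<Point>(2);
buffer.push(normalize({ time: 0, x: 960, y: 540 }, 1920, 1080));
buffer.push(normalize({ time: 16, x: 1920, y: 0 }, 1920, 1080));
buffer.push(normalize({ time: 32, x: 480, y: 270 }, 1920, 1080));
console.log(buffer.toArray()); // only the two most recent points remain
```

Normalizing by the screen size makes trajectories comparable across devices with different resolutions, which is presumably why the recorder asks for the width and height up front.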
## Training
Here are some examples and explanations about training new models.
### Models
Some pre-trained models are available through `delbot.Models` :
- rnn1 : a neural network with two LSTM layers, the most efficient so far.
- rnn1faster : same, but with fewer LSTM cells, slightly less efficient.
- rnn2 : same, but the second LSTM layer is replaced by a dense layer.
- rnn3 : a model between rnn1 and rnn2, two LSTM layers then one dense.
- denseMatrix : a neural network with only dense layers and that takes a movement matrix.
- convolutional : a convolutional model that takes a movement matrix.
- randomForest : a random forest machine learning model.
To train your own model with custom tfjs layers, you can look at `delbot-training/src/models.ts` and read
the documentation. You only have to extend a class and define a single method.
You also have a GAN architecture in `delbot-training/src/GANtrain.ts`.
### Dataset
To manipulate the dataset, you can use the `python/` folder. Here is a short description of each Python script.
- parseSample.py : takes folders of text files containing either bot or human trajectories and parses them into a single
sessions.json file containing all relative paths to those text files. This JSON file is the input
of a `DataTraining` instance.
- redraw.py : verifies that the text trajectories in a folder are what we expect by drawing them with matplotlib.
- botDrawing.py : starts Selenium for Python and draws shapes using arbitrary heuristics and multiple libraries.
- ganParser.py : takes GAN folders of generator outputs and parses them into something readable by `parseSample.py`.
To use the dataset for training, extract the archive `dataset.tar.gz` to obtain multiple data folders inside
`python/`, then run `parseSample.py` to get `sessions.json` (for normal training) and `sessions_human_only.json` (for GAN).
⚠️ The dataset is relatively limited: it contains only a small set of human trajectories, along with
trajectories generated by different algorithms / programs created for that purpose or found online.
### Example
The code of a basic training looks like this :
```ts
import * as delbot from "@chrisgdt/delbot-mouse"; // not needed in browser
import * as delbotrain from "@chrisgdt/delbot-training"; // not needed in browser

const data = new delbot.data.DataFeatures2({xSize: 24, numClasses: 2});
const filePath = "../../python/sessions.json"; // path from code to sessions.json
const datatraining = new delbotrain.DataTraining({filePath: filePath, data: data});

const rnn1Features2 = new delbotrain.ModelRNN({
  dataTraining: datatraining,
  nameToSave: "downloads://model-rnn1-features2", // directly use layerModel.save(nameToSave)
  epoch: 25,
  batchSize: 256,
  useTfjsVis: true,
  consoleInfo: true
});

// or simply 'rnn1Features2.run()'
document.addEventListener('DOMContentLoaded', () => {rnn1Features2.run()});
```
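The `xSize` option is not described in this section. The GAN example further down comments that an `xSize` of 35 means 35 movements per trajectory, so one plausible reading is that a recording is cut into fixed-size windows of `xSize` movements. The sketch below illustrates that reading only: `toWindows` is a hypothetical helper, and non-overlapping windows with the incomplete tail dropped are assumptions, not the documented behavior of delbot.

```ts
// Hypothetical helper for illustration only; not part of delbot.
// Cuts a sequence of records into consecutive, non-overlapping windows of xSize items.
// A trailing window with fewer than xSize items is dropped (assumption).
function toWindows<T>(records: T[], xSize: number): T[][] {
  if (!Number.isInteger(xSize) || xSize <= 0) {
    throw new RangeError("xSize must be a positive integer");
  }
  const windows: T[][] = [];
  for (let start = 0; start + xSize <= records.length; start += xSize) {
    windows.push(records.slice(start, start + xSize));
  }
  return windows;
}

// 60 fake records cut with xSize = 24 give two full windows; the last 12 records are dropped.
const records: number[] = Array.from({ length: 60 }, (_, i) => i);
const windows = toWindows(records, 24);
console.log(windows.length, windows.map(w => w.length));
```

Under this reading, a larger `xSize` gives the model more context per sample but yields fewer samples from the same recording.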
And for a GAN training :
```ts
import * as delbot from "@chrisgdt/delbot-mouse"; // not needed in browser
import * as delbotrain from "@chrisgdt/delbot-training"; // not needed in browser

// DataSimplePosition: output (x,y) for each movement
const data = new delbot.data.DataSimplePosition({xSize: 35, numClasses: 1});
const filePath = "../../python/sessions_human_only.json"; // path from code to sessions_human_only.json
const datatraining = new delbotrain.DataTraining({
  filePath: filePath,
  data: data,
  trainingRatio: .9 // use 90% of the dataset as training set and 10% as validation
});

// Delay for tfjs-vis used in constructor
document.addEventListener('DOMContentLoaded', () => {
  const gan = new delbotrain.GAN({
    dataTraining: datatraining, // xSize is 35 so the generator outputs 35 movements per trajectory
    epoch: 1000,
    batchSize: 64,
    generatorSeedSize: 100, // generator input is 100 random numbers
    useTfjsVis: true,
    consoleInfo: true,
    downloadModels: 50, // save models every 50 epochs
    downloadSample: true
  });
  gan.train(1); // training number 1 to have unique ID per GAN train
});
```
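The `trainingRatio` option above splits the dataset into a training set and a validation set. As a rough illustration of what a ratio of 0.9 means, here is a minimal sketch. `splitByRatio` is a hypothetical helper, not the delbot-training implementation, and it assumes a simple cut without shuffling; the library may well shuffle or stratify the samples first.

```ts
// Hypothetical helper for illustration only; not part of delbot-training.
// Splits samples into a training part and a validation part according to trainingRatio.
// No shuffling is done here (assumption).
function splitByRatio<T>(
  samples: T[],
  trainingRatio: number
): { training: T[]; validation: T[] } {
  if (trainingRatio < 0 || trainingRatio > 1) {
    throw new RangeError("trainingRatio must be between 0 and 1");
  }
  // Math.round avoids off-by-one results caused by floating-point products.
  const cut = Math.round(samples.length * trainingRatio);
  return { training: samples.slice(0, cut), validation: samples.slice(cut) };
}

// 20 fake trajectories with a ratio of 0.9 give 18 for training and 2 for validation.
const samples: string[] = Array.from({ length: 20 }, (_, i) => `trajectory-${i}`);
const { training, validation } = splitByRatio(samples, 0.9);
console.log(training.length, validation.length);
```

Since the dataset is small, the 10% validation share contains only a few trajectories, so validation metrics should be read with some caution.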