{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# hindinballs\n", "### Contributed by Praveen Yadav, [GitHub Repo](https://github.com/fnc11/nball4treehindi)\n", "## Resources Used:\n", "The files used for Hindi data generation are taken from this [Bitbucket repository](https://bitbucket.org/sivareddyg/python-hindi-wordnet/src/master), which draws its data mainly from [IIT Bombay](http://www.cfilt.iitb.ac.in/).\n", "Download the pre-trained Hindi word vectors (w2v) from the [fastText website](https://fasttext.cc/docs/en/crawl-vectors.html); the training command below uses `cc.hi.300.vec`.\n", "\n", "\n", "# Experiment 1: Training and evaluating nball embeddings\n", "## Experiment 1.1: Training nball embeddings\n", "* This [informative report](https://drive.google.com/file/d/1ZZXAsNJxBQygkfmtakVvHw2gkLCan0rR/view?usp=sharing) explains how the Hindi data is structured and how to process it for this experiment; alternatively, see the data_preprocessing and data_generation notebooks for Hindi." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "% first create an empty file nball.txt for the output\n", "\n", "$ python nball.py --train_nball /Users//data/nball.txt --w2v /Users//data/cc.hi.300.vec --ws_child /Users//data/wordSenseChildren.txt --ws_catcode /Users//data/glove/catCodes.txt --log log.txt\n", "% --train_nball: output file of nball embeddings\n", "% --w2v: file of pre-trained word embeddings\n", "% --ws_child: file of parent-children relations among word-senses\n", "% --ws_catcode: file of the parent location code of each word-sense in the tree structure\n", "% --log: log file; it must be located in the same directory as the file of nball embeddings\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "![exp1.1.jpeg](img/exp1.1.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The training process can take around 3 days. 
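Training aims for a state in which, as we understand the nball method, every child word-sense ball lies inside its parent ball and sibling balls do not overlap; Experiment 1.2 calls the state where all such relations hold zero energy. The Python sketch below illustrates that test. It is not code from nball.py: it assumes, for simplicity, that each ball is a plain center vector with a radius, and the names distance, contains, disjoint and failed_relations are hypothetical.

```python
import math

# Hypothetical simplified representation (not the nball.py format):
# a ball is a tuple (center, radius), where center is a list of floats.

def distance(a, b):
    # Euclidean distance between two center vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contains(parent, child):
    # child ball lies inside parent ball: |c_p - c_c| + r_c <= r_p
    (cp, rp), (cc, rc) = parent, child
    return distance(cp, cc) + rc <= rp

def disjoint(b1, b2):
    # two balls do not overlap: |c_1 - c_2| >= r_1 + r_2
    (c1, r1), (c2, r2) = b1, b2
    return distance(c1, c2) >= r1 + r2

def failed_relations(balls, children):
    # balls: word-sense -> (center, radius)
    # children: word-sense -> list of child word-senses (the kind of
    # parent-children relation stored in the --ws_child file)
    failures = []
    for parent, kids in children.items():
        # every child must be contained in its parent
        for kid in kids:
            if not contains(balls[parent], balls[kid]):
                failures.append(('not inside', kid, parent))
        # every pair of siblings must be disjoint
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if not disjoint(balls[a], balls[b]):
                    failures.append(('overlap', a, b))
    return failures

# Toy tree with hypothetical word-senses: root has children a and b
balls = {
    'root': ([0.0, 0.0], 10.0),
    'a': ([-4.0, 0.0], 3.0),
    'b': ([4.0, 0.0], 3.0),
}
children = {'root': ['a', 'b']}
print(failed_relations(balls, children))  # -> []
```

With these toy values both child balls fit inside root, and a and b are 8 apart while their radii sum to 6, so no relation fails. Moving b to [1.0, 0.0] would instead report an overlap between a and b, which mirrors how the check prints failed relations when zero energy is not reached.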
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment 1.2: Checking whether tree structures are perfectly embedded into word-embeddings\n", "* The main input is the output directory of the nballs created in Experiment 1.1.\n", "* Shell command for running the nball construction and training process:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "$ python nball.py --zero_energy --ball --ws_child /Users//data/wordSenseChildren.txt\n", "% --zero_energy: output path of the nballs of Experiment 1.1, e.g. /Users//data/data_out\n", "% --ball: name of the output nball-embedding file\n", "% --ws_child: file of parent-children relations among word-senses\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The check can take a long time, around 3 to 4 hours.\n", "* Result:\n", "\n", "If zero energy is achieved, a large nball-embedding file is created at ```/```;\n", "otherwise, the failed relations and word-senses are printed.\n", "\n", "* Test result on Ubuntu:\n", "![exp1.2.jpeg](img/exp1.2.jpeg)\n", " \n", "- [nball embeddings with 67152 balls](https://drive.google.com/open?id=1d-D7AF9rl2g_QFAGLD-m3N0DT_5-uZLS)\n", "- [nball.txt file](https://drive.google.com/open?id=1JWNuc2eBTWDrbG1MCdHlWtxenGVKX8to)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
 \n", "© T. Dong, C. Bauckhage\n", "\n", "Licensed under a CC BY-NC 4.0 license.\n", "\n", "Acknowledgments: This material was prepared within the project P3ML, which is funded by the Ministry of Education and Research of Germany (BMBF) under grant number 01/S17064. The authors gratefully acknowledge this support.\n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }