## Using SOMEF

The SOftware MEtadata Extraction Framework (SOMEF) extracts metadata from a software repository and its documentation. This notebook walks through a few examples of how to configure and run the tool.

### 1. Tool options
By executing the help command, you can see the different options for running SOMEF:

```
%%bash
somef --help
```

```
Usage: somef [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  configure  Configure GitHub credentials and classifiers file path
  describe   Running SOMEF Command Line Interface
  version    Show SOMEF version.
```
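Because `version` is one of the listed commands, you can confirm that SOMEF is installed before configuring it with a small check from Python. This is a sketch: the helper name `somef_version` is our own, and it only calls the `somef version` command shown in the help output above.

```python
import shutil
import subprocess


def somef_version():
    """Return the text printed by `somef version`, or None if somef is not on PATH."""
    if shutil.which("somef") is None:
        return None  # SOMEF is not installed in this environment
    result = subprocess.run(["somef", "version"], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


print(somef_version() or "somef not found on PATH")
```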


### 2. Setting up SOMEF
Before you run SOMEF for the first time, you have to configure it. This only needs to be done **once**. 

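Running `somef configure` with the `-a` option applies the default settings but does not set a GitHub API token, so SOMEF's calls to the GitHub API are subject to GitHub's rate limits for unauthenticated access. Don't worry: you can add a token afterwards by editing the SOMEF configuration file, or by running the interactive configuration (`somef configure` without `-a`).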

```
%%bash
somef configure -a
```

```
SOftware Metadata Extraction Framework (SOMEF) Command Line Interface
Configuring SOMEF automatically. To assign credentials edit the configuration file or run the intearctive mode
Success

[nltk_data] Downloading package wordnet to /home/dgarijo/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
```
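If you later want to add your GitHub token by editing the configuration file, a small helper like the one below can update a single setting without touching the others. It is a hypothetical sketch that assumes the configuration file is a JSON object; this notebook does not show the file's location or the exact key under which SOMEF stores the token, so open your configuration file first and pass that key explicitly.

```python
import json
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set one key in a JSON configuration file, keeping the other settings.

    Hypothetical helper: assumes the file holds a JSON object. Check your
    SOMEF configuration file for the exact key used for the GitHub token.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())  # load existing settings
    config[key] = value                    # overwrite or add just this key
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, `set_config_value("<path to your config file>", "<token key>", "<your token>")`, where all three arguments are placeholders you fill in from your own setup.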


### 3. Running SOMEF
Now you are set up to run SOMEF. Let's analyze the KGTK repository (https://github.com/usc-isi-i2/kgtk), which hosts a Knowledge Graph Toolkit. To analyze a different repository, pass its URL to the `-r` option instead; the `-o` option sets the JSON file where the results are written. If you only want high-confidence results, you can increase the confidence threshold used by the supervised classifiers (default: 0.8) with the `-t` flag. See `somef describe --help` for more information.

```
%%bash
somef describe -r https://github.com/usc-isi-i2/kgtk -o test.json
```

```
SOftware Metadata Extraction Framework (SOMEF) Command Line Interface
Loading Repository https://github.com/usc-isi-i2/kgtk Information....
https://api.github.com/repos/usc-isi-i2/kgtk
Downloading https://github.com/usc-isi-i2/kgtk/archive/master.zip
['https://github.com/usc-isi-i2/kgtk/tree/master/docs']
['https://github.com/usc-isi-i2/kgtk/tree/master/docs', 'https://github.com/usc-isi-i2/kgtk/tree/master/examples/docs']
NOTEBOOKS:
['use-cases/Knowledge-Graph-Profiler.ipynb', 'use-cases/Generate-Triples-And-Load-Blazegraph.ipynb', 'use-cases/Wikidata Subsets.ipynb', 'use-cases/Wikidata Useful Files.ipynb', 'tutorial/3 Enhance KG.ipynb', 'tutorial/Knowledge-Graph-Profiler.out.ipynb', 'tutorial/5 Embeddings.ipynb', 'tutorial/4 Generate Triples.ipynb', 'tutorial/2 Construct KG.ipynb', 'tutorial/1 Introduction.ipynb', 'examples/Example2 - Curation and Statistics.ipynb', 'examples/CSKG Use Case.ipynb', 'examples/abbreviate_human_labels.ipynb', 'examples/Example12 - CSKG Analysis.ipynb', '
```
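The same run can be scripted from Python, which is handy when analyzing several repositories or trying different thresholds. The sketch below only assembles the flags mentioned in this notebook: `-r`, `-o`, `-t`, and the RDF export options `-g`/`-f` discussed in section 4. The helper names are our own, and the assumption that `-g` takes an output file path and `-f` a format name should be checked against `somef describe --help`.

```python
import shlex
import subprocess


def build_describe_command(repo_url, output_path, threshold=None,
                           graph_out=None, graph_format=None):
    """Assemble the argument list for a `somef describe` run.

    -r: repository URL, -o: JSON output file, -t: classifier confidence
    threshold (SOMEF's default is 0.8). -g/-f: RDF export options; we
    assume -g takes an output path and -f a format name.
    """
    cmd = ["somef", "describe", "-r", repo_url, "-o", output_path]
    if threshold is not None:
        cmd += ["-t", str(threshold)]
    if graph_out is not None:
        cmd += ["-g", graph_out]
    if graph_format is not None:
        cmd += ["-f", graph_format]
    return cmd


def run_somef(cmd):
    """Run SOMEF; raises CalledProcessError if it exits with an error."""
    return subprocess.run(cmd, check=True)


cmd = build_describe_command("https://github.com/usc-isi-i2/kgtk", "test.json",
                             threshold=0.9)
print(shlex.join(cmd))
# run_somef(cmd)  # needs SOMEF installed and configured
```

Building the argument list separately from running it keeps the command easy to inspect or log before launching a potentially long analysis.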



### 4. Browsing the results
Now let's look at the result file. It groups the extracted metadata by category (for example, `description`), and each category holds a list of entries. For each entry, SOMEF reports the extracted excerpt, the technique used to extract it, and the confidence associated with that technique. For example, when a supervised classifier is used, SOMEF returns a score for each sentence in the excerpt. To export the results as RDF, use the `-g` and `-f` options of `somef describe`.

```python
import json

# Load the metadata that SOMEF wrote with `-o test.json`
with open('test.json') as f:
    results = json.load(f)
results
```

```
{'description': [{'excerpt': 'KGTK is a Python toolkit for building applications using knowledge graphs (KG). KGTK is designed for ease of use, scalability and speed. It represents KGs as simple TSV files with four columns to represent the head, relation and tail of a triple, as well as an identifier for each triple. This simple model allows KGTK to operate on property graphs and on RDF graphs. KGTK offers a comprehensive collection of 20+ commands to import, transform, query and analyze KGs, including wrappers for state of the art graph analytics and deep learning libraries. KGTK is optimized for batch processing, making it easy to write KG pipelines that process large KGs such as Wikidata on a laptop to produce datasets for use in downstream applications. KGTK is open-source software released under the MIT license. \n',
 'confidence': [0.9100667347076543],
 'technique': 'Supervised classification'},
 {'excerpt': 'Knowledge Graph Toolkit ',
 'confidence': [1.0],
 'technique': 'GitHub 
```
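If you only care about high-confidence metadata, you can also filter the loaded results after the fact instead of re-running SOMEF with a different `-t`. The sketch below relies only on the layout visible in the output above (categories mapping to lists of entries with an `excerpt`, a `technique`, and a list of `confidence` scores); the helper name is hypothetical, and taking the maximum score per entry is our own choice.

```python
def high_confidence_entries(results, threshold=0.9):
    """Yield (category, entry) pairs whose best confidence score reaches threshold.

    Assumes the layout shown above: each category maps to a list of dict
    entries, each with an 'excerpt', a 'technique' and a 'confidence' list.
    """
    for category, entries in results.items():
        if not isinstance(entries, list):
            continue  # skip values that do not follow this layout
        for entry in entries:
            if not isinstance(entry, dict):
                continue
            scores = entry.get("confidence") or []
            if scores and max(scores) >= threshold:
                yield category, entry
```

With `results` loaded by the previous cell, `for category, entry in high_confidence_entries(results, 0.95): print(category, entry['technique'], entry['excerpt'][:60])` lists the surviving entries. Using `max` keeps an entry when at least one sentence scores above the threshold; swap in `min` to require every sentence to do so.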