{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "In an upcoming analysis, we want to calculate the structural similarity between test cases. For this, we need the information which test methods call which code in the application (the \"production code\"). \n",
    "\n",
    "In this blog post, I'll show how you can get this information by using [jQAssistant](http://buschmais.github.io/jqassistant/doc/1.3.0/) for a Java application. With jQAssistant, you can scan the structural information of your software. I'll also explain the relevant database query that delivers the information we need later on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dataset\n",
    "\n",
    "I've scanned a small pet project of mine called \"DropOver\" that was originally developed as a web application for organizing parties or bar-hoppings. I've just added jQAssistant as a Maven plugin to my project's Maven build ([see here for a mini tutorial](https://github.com/JavaOnAutobahn/spring-petclinic/blob/master/readme.md)). The structures of this application are stored by jQAssistant in a property graph within the graph database [Neo4j](https://neo4j.com/). A subgraph with the structural information that's relevant for our purposes looks like this:\n",
    "\n",
    "![](../notebooks/resources/test_refactoring.png)\n",
    " \n",
    "We can see the scanned software entities like Java types (red) or methods (blue) as well their relationships with each other. We can now explore the database's content with the included Neo4j browser frontend or access the data with a programming language. I use Python (the programming language we'll write our analysis later on) with the `py2neo` module (the bridge between Python and Neo4j). The information we need can be retrieved by creating and executing a Cypher query (explained in the following) &ndash; Neo4j's language for accessing information in the property graph.\n",
    "\n",
    "Last, we store the results in a Pandas `DataFrame` named `invocations` for a nice tabular representation of the outputs and for further analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>test_type</th>\n",
       "      <th>test_method</th>\n",
       "      <th>prod_type</th>\n",
       "      <th>prod_method</th>\n",
       "      <th>invocations</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>AddCommentTest</td>\n",
       "      <td>void blankSiteContainsRightComment()</td>\n",
       "      <td>AddComment</td>\n",
       "      <td>at.dropover.comment.boundary.GetCommentRespons...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AddCommentTest</td>\n",
       "      <td>void blankSiteContainsRightCreationTime()</td>\n",
       "      <td>AddComment</td>\n",
       "      <td>at.dropover.comment.boundary.GetCommentRespons...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>AddCommentTest</td>\n",
       "      <td>void blankSiteContainsRightUser()</td>\n",
       "      <td>AddComment</td>\n",
       "      <td>at.dropover.comment.boundary.GetCommentRespons...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>AddCommentTest</td>\n",
       "      <td>void failsAtCommentNull()</td>\n",
       "      <td>AddComment</td>\n",
       "      <td>at.dropover.comment.boundary.GetCommentRespons...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>AddCommentTest</td>\n",
       "      <td>void failsAtCreatorNull()</td>\n",
       "      <td>AddComment</td>\n",
       "      <td>at.dropover.comment.boundary.GetCommentRespons...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        test_type                                test_method   prod_type  \\\n",
       "0  AddCommentTest       void blankSiteContainsRightComment()  AddComment   \n",
       "1  AddCommentTest  void blankSiteContainsRightCreationTime()  AddComment   \n",
       "2  AddCommentTest          void blankSiteContainsRightUser()  AddComment   \n",
       "3  AddCommentTest                  void failsAtCommentNull()  AddComment   \n",
       "4  AddCommentTest                  void failsAtCreatorNull()  AddComment   \n",
       "\n",
       "                                         prod_method  invocations  \n",
       "0  at.dropover.comment.boundary.GetCommentRespons...            1  \n",
       "1  at.dropover.comment.boundary.GetCommentRespons...            1  \n",
       "2  at.dropover.comment.boundary.GetCommentRespons...            1  \n",
       "3  at.dropover.comment.boundary.GetCommentRespons...            1  \n",
       "4  at.dropover.comment.boundary.GetCommentRespons...            1  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import py2neo\n",
    "import pandas as pd\n",
    "\n",
    "graph = py2neo.Graph()\n",
    "\n",
    "query = \"\"\"\n",
    "MATCH \n",
    "  (testMethod:Method)\n",
    "    -[:ANNOTATED_BY]->()-[:OF_TYPE]->\n",
    "      (:Type {fqn:\"org.junit.Test\"}),\n",
    "  (testType:Type)-[:DECLARES]->(testMethod),\n",
    "  (type)-[:DECLARES]->(method:Method),\n",
    "  (testMethod)-[i:INVOKES]->(method)\n",
    "WHERE\n",
    "  NOT type.name ENDS WITH \"Test\" \n",
    "  AND type.fqn STARTS WITH \"at.dropover\"\n",
    "  AND NOT method.signature CONTAINS \"<init>\"\n",
    "RETURN \n",
    "  testType.name as test_type,\n",
    "  testMethod.signature as test_method,\n",
    "  type.name as prod_type,\n",
    "  method.signature as prod_method,\n",
    "  COUNT(DISTINCT i) as invocations\n",
    "ORDER BY \n",
    "  test_type, test_method, prod_type, prod_method\n",
    "\"\"\"\n",
    "\n",
    "invocations = pd.DataFrame(graph.data(query))\n",
    "# reverse sort columns for better representation\n",
    "invocations = invocations[invocations.columns[::-1]]\n",
    "invocations.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cypher query explained\n",
    "Let's go through that query from above step by step. The Cypher query that finds all test methods that call methods of our production types works as follows:\n",
    "\n",
    "In the `MATCH` clause, we start our search for particular structural information. We first identify all test methods. These are methods that are annotated by `@Test`, which is an annotation that the JUnit4 framework provides.\n",
    "```cypher\n",
    "MATCH\n",
    "  (testMethod:Method)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(:Type {fqn:\"org.junit.Test\"})\n",
    "```\n",
    "Next, we find all the test classes that declare (via the `DECLARES` relationship type) all test methods from above.\n",
    "```cypher\n",
    "  (testType:Type)-[:DECLARES]->(testMethod)\n",
    "```\n",
    "With the same approach, we first identify all the Java types and methods (at first regardless of their meaning. Later, we'll define them as production types and methods). \n",
    "```cypher\n",
    "  (type)-[:DECLARES]->(method:Method)\n",
    "```\n",
    "Last, we find test methods that call methods of the other methods by querying the appropriate `INVOKES` relationship.\n",
    "```cypher\n",
    "  (testMethod)-[i:INVOKES]->(method)\n",
    "```\n",
    "\n",
    "In the `WHERE` clause, we define what we see as production type (and thus implicitly production method). We achieve this by saying that a production type is not a test and that the types must be within our application. These are all types that start with the `fqn` (full qualified name) `at.dropover`. We also filter out any calls to constructors, because those are irrelevant for our analysis.\n",
    "```cypher\n",
    "WHERE\n",
    "  NOT type.name ENDS WITH \"Test\" \n",
    "  AND type.fqn STARTS WITH \"at.dropover\"\n",
    "  AND NOT method.signature CONTAINS \"<init>\"\n",
    "```\n",
    "\n",
    "In the `RETURN` clause, we just return the information needed for further analysis. These are all names of our test and production types as well as the signatures of the test methods and production methods. We also count the number of calls from the test methods to the production methods. This is a nice indicator for the cohesion of a test method to a production method.\n",
    "```cypher\n",
    "RETURN\n",
    "  testType.name as test_type,\n",
    "  testMethod.signature as test_method,\n",
    "  type.name as prod_type,\n",
    "  method.signature as prod_method,\n",
    "  COUNT(DISTINCT i) as invocations\n",
    "```\n",
    "In the `ORDER BY` clause, we simply order the results in a useful way (and for reproducible results):\n",
    "```cypher\n",
    "ORDER BY\n",
    "  test_type, test_method, prod_type, prod_method\n",
    "```\n",
    "A long explanation, but if you are familiar with Cypher and the underlying schema of your graph, you write those queries within half a minute."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data export\n",
    "Because we need that data in a follow-up analysis, we store the information in a semicolon-separated file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "invocations.to_csv(\"datasets/test_code_invocations.csv\", sep=\";\", index=False)"
   ]
  },
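  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A follow-up analysis has to read the file with the same separator. The following sketch shows such a round trip on invented sample rows; it writes to a temporary file (a hypothetical path) rather than to `datasets/`, so it doesn't touch the real export.\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Invented sample row with the same columns as the exported DataFrame\n",
    "sample = pd.DataFrame({\n",
    "    'test_type': ['FooTest'],\n",
    "    'test_method': ['void checksFoo()'],\n",
    "    'prod_type': ['Foo'],\n",
    "    'prod_method': ['void bar(java.lang.String,int)'],\n",
    "    'invocations': [1],\n",
    "})\n",
    "\n",
    "# Write to a temporary location instead of the real dataset directory\n",
    "path = os.path.join(tempfile.mkdtemp(), 'test_code_invocations.csv')\n",
    "sample.to_csv(path, sep=';', index=False)\n",
    "\n",
    "# Read the file back with the same separator\n",
    "restored = pd.read_csv(path, sep=';')\n",
    "print(restored)\n",
    "```\n",
    "Passing `sep=';'` to `read_csv` is the important part: without it, pandas would assume a comma as the separator and not split the file into the expected columns."
   ]
  },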
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "This post was just the prelude for more in-depth analysis for structural test case similarity. We quickly got the information about which test method calls which production method. Albeit its a pure static (or structural) view of our code, it delivers valuable insights in further analysis.\n",
    "\n",
    "Stay tuned!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}