{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"In this notebook, I'll show you a way how to you can connect JavaDoc comments with the Java nodes of jQAssistant's scan result. I'll elaborate the way to the solutions, because I hope that you can do a similar problem solving analysis (aka XML importing and wrangling) on your own."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Context\n",
"In a [blog post](https://www.feststelltaste.de/my-experiences-with-jqassistant-so-far/), [Yoann Buch got me to thinking about](https://www.feststelltaste.de/my-experiences-with-jqassistant-so-far/#comment-6) a how to add comments to the already existing class nodes in Neo4j scanned by jQAssistant). I meant it would be possible to do it with the Python library Pygments. I experimented with it a little bit, but it seems that it isn't going well. The main problem is the lack of structural information. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Idea\n",
"I thought \"wait a minute, what about the JavaDoc that's generated in HTML\" and I thought one step further: \"If there are generators for HTML, is there a generator for XML, too?\". I just googled \"javadoc xml\" and found it: https://github.com/MarkusBernhardt/xml-doclet, \"A doclet to output javadoc as XML\" as Maven plugin. \n",
"\n",
"So the journey began..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](resources/jqassistant_javadoc.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Implementation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the code\n",
"In this prototype, I work with an old jQAssistant version 1.1.3 (because I haven't figured out how to read XML like described in [Dirk's answer at StackOverflow](http://www.stackoverflow.com/questions/31425610/error-when-scanning-xml-file-with-jqassistant) with the new version). \n",
"\n",
"I use an corresponding, old version of jQAssistant's [Spring Petclinic demo repo](https://github.com/buschmais/spring-petclinic/): \n",
" \n",
"git checkout f5811bf2ed9c5369a749cb90ef9e7a261de03760 .\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting JavaDoc as XML\n",
"For getting all the JavaDoc from the source code, I just add the Maven plugin mentioned above additionally to the already existing ones:\n",
"```xml\n",
"\n",
" org.apache.maven.plugins\n",
" maven-javadoc-plugin\n",
" 2.10.4\n",
" \n",
" \n",
" xml-doclet\n",
" process-resources\n",
" \n",
" javadoc\n",
" \n",
" \n",
" com.github.markusbernhardt.xmldoclet.XmlDoclet\n",
" -d ${project.build.directory} -filename javadoc.xml\n",
" false\n",
" \n",
" com.github.markusbernhardt\n",
" xml-doclet\n",
" 1.0.5\n",
" \n",
" \n",
" \n",
" \n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Scanning the JavaDoc XML\n",
"In theory, I now have just to add this file to jQAssistant's scan configuration as <scanInclude>:\n",
"```xml\n",
"\n",
" ${project.build.directory}/javadoc.xml\n",
" xml:document\n",
"\n",
"```\n",
"I've added the <scope> to adivse jQAssistant to scan the whole XML content (according to this [StackOverflow answer](http://stackoverflow.com/questions/31425610/error-when-scanning-xml-file-with-jqassistant)).\n",
"\n",
"\n",
"Note: Unfortunately I haven't got this working as of today, i.e. the javadoc.xml isn't appearing in the database. So I've scanned the XML file in the target folder with an \n",
" \n",
"jqassistant.sh scan -f xml:document::javadoc.xml \n",
" \n",
"manually after scanning the project. This approach won't work with jQAssistant version 1.1.4+, so that's why I'm using an old version. I'll update this notebook when I've got this working with the newest version. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## First build\n",
"I just built the complete project: \n",
" \n",
"mvn clean install \n",
"\n",
"jQAssistant places some nice graphs into the Neo4j database:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"application/javascript": [
"$(\"head\").append($(\"\").attr({\n",
" rel: \"stylesheet\",\n",
" type: \"text/css\",\n",
" href: \"https://cdnjs.cloudflare.com/ajax/libs/vis/4.8.2/vis.css\"\n",
"}));\n",
"require.config({ paths: { vis: 'https://cdnjs.cloudflare.com/ajax/libs/vis/4.8.2/vis.min' } }); require(['vis'], function(vis) { window.vis = vis; window.vis = vis; }); "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import lib.neo4jupyter_mod as n4j\n",
"import py2neo\n",
"graph = py2neo.Graph()\n",
"\n",
"n4j.init_notebook_mode()\n",
"n4j.draw(graph, \n",
" n='n:Class { name: \"Pet\"}',\n",
" r=\"r:DEPENDS_ON\", \n",
" m=\"m:Class\", \n",
" options={\"Class\": \"name\"}, \n",
" limit=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Additionaly, the build outputs a nice XML file named javadoc.xml with all the existing comments into the target folder\n",
"```xml\n",
"\n",
"\n",
" \n",
" \n",
" In Servlet 3.0+ [..]\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"...\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I scanned this file already, i. e. it's also contained in the database. Let's have a look at it! \n",
" \n",
"![](resources/neo4j_comments.png)\n",
"\n",
"The corresponding texts for the comments are in the children of the <comment> elements:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n4j.draw(graph, \n",
" n='n:Element { name: \"comment\"}',\n",
" m='m:Text',\n",
" options={\"Element\": \"name\", \"Text\": \"value\"}, \n",
" limit=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But there is one problem left: The XML scanner scans **all** elements and places each element into a separate node. So if we have some HTML formatting in the JavaDoc like:\n",
"\n",
"```xml\n",
"\n",
" <code>Validator</code> for <code>Pet</code> forms.\n",
" <p>\n",
" We're not using Bean Validation annotations here because it is easier to define such validation rule in Java.\n",
" </p>\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This leads to an \"interesting\" looking graph:\n",
"\n",
"![](resources/neo4j_interesting_comments_graph.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's an easy solution for this, but let's first start with graph database action and solve that problem later."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Data Wrangling with Neo4j"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'\\nMATCH \\n(element:Element)-[:HAS_ELEMENT]->(Element { name : \"comment\"})-[:HAS_TEXT]->(doc_text:Text),\\n(element)-[:HAS_ATTRIBUTE]->(building_block:Attribute{name: \"qualified\"})\\nOPTIONAL MATCH\\n(element)-[:HAS_ATTRIBUTE]->(signature:Attribute{name: \"signature\"}),\\n(element)-[:HAS_ELEMENT]->(t:Element{name: \"return\"})-[:HAS_ATTRIBUTE]->(return_type:Attribute{name: \"qualified\"})\\nWHERE element.name =~ \"(method|class|interface|constructor)\"\\nWITH\\n id(element) as id,\\n // class and method\\n CASE element.name\\n WHEN \"method\"\\n THEN return_type.value + \" \" + SPLIT(building_block.value, \".\")[-1] + signature.value\\n WHEN \"constructor\"\\n THEN \"void ()\"\\n END as signature,\\n\\n CASE element.name\\n WHEN \"method\"\\n THEN SUBSTRING(building_block.value, 0, $subLength)SPLIT(building_block.value, \".\")\\n ELSE building_block.value\\n END as type_fqn,\\n\\n return_type.value as r,\\n reduce(s = \"\", x IN collect(doc_text) | s + x.value) as comment\\n\\nRETURN id, type_fqn, signature, comment\\n'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"\"\"\n",
"MATCH \n",
"(element:Element)-[:HAS_ELEMENT]->(Element { name : \"comment\"})-[:HAS_TEXT]->(doc_text:Text),\n",
"(element)-[:HAS_ATTRIBUTE]->(building_block:Attribute{name: \"qualified\"})\n",
"OPTIONAL MATCH\n",
"(element)-[:HAS_ATTRIBUTE]->(signature:Attribute{name: \"signature\"}),\n",
"(element)-[:HAS_ELEMENT]->(t:Element{name: \"return\"})-[:HAS_ATTRIBUTE]->(return_type:Attribute{name: \"qualified\"})\n",
"WHERE element.name =~ \"(method|class|interface|constructor)\"\n",
"WITH\n",
" id(element) as id,\n",
" // class and method\n",
" CASE element.name\n",
" WHEN \"method\"\n",
" THEN return_type.value + \" \" + SPLIT(building_block.value, \".\")[-1] + signature.value\n",
" WHEN \"constructor\"\n",
" THEN \"void ()\"\n",
" END as signature,\n",
"\n",
" CASE element.name\n",
" WHEN \"method\"\n",
" THEN SUBSTRING(building_block.value, 0, $subLength)SPLIT(building_block.value, \".\")\n",
" ELSE building_block.value\n",
" END as type_fqn,\n",
"\n",
" return_type.value as r,\n",
" reduce(s = \"\", x IN collect(doc_text) | s + x.value) as comment\n",
"\n",
"RETURN id, type_fqn, signature, comment\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"create_comment_nodes_for_types=\"\"\"\n",
"MATCH \n",
"(element)-[:HAS_ELEMENT]->(class_comment:Element { name : \"comment\"})-[:HAS_TEXT]->(doc_text:Text),\n",
"(element:Element)-[:HAS_ATTRIBUTE]->(qualified:Attribute{name: \"qualified\"})\n",
"OPTIONAL MATCH\n",
"(element)-[:HAS_ATTRIBUTE]->(signature:Attribute{name: \"signature\"})\n",
"WHERE element.name =~ \"(method|class|interface|constructor)\"\n",
"WITH\n",
" id(element) as id,\n",
" element.name as type,\n",
" // class and method\n",
" CASE WHEN signature.value IS NULL THEN qualified.value ELSE qualified.value+signature.value END as key,\n",
" reduce(s = \"\", x IN collect(doc_text) | s + x.value) as text\n",
"\n",
"RETURN id, type, key, text\n",
"\"\"\"\n",
"graph.data(create_comment_nodes_for_types)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[{'COUNT(javadoc)': 0}]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"create_comment_nodes_for_classes=\"\"\"\n",
"MATCH \n",
"(package:Element { name : \"package\"})\n",
"-[:HAS_ELEMENT]->\n",
"(class:Element {name : \"class\"})\n",
"-[:HAS_ATTRIBUTE]->\n",
"(class_fqn:Attribute{name: \"qualified\"}),\n",
"(class_comment:Element { name : \"comment\"})\n",
"-[:HAS_TEXT]->(text:Text),\n",
"class-[:HAS_ELEMENT]->(class_comment)\n",
"\n",
"WITH DISTINCT \n",
" class.name as type_value, \n",
" class_fqn.value as fqn_value, \n",
" reduce(s = \"\", x IN collect(text) | s + x.value) as text_value\n",
" \n",
"CREATE (javadoc:JavaDoc { comment: text_value, type: type_value, fqn: fqn_value })\n",
"\n",
"RETURN COUNT(javadoc)\n",
"\"\"\"\n",
"graph.data(create_comment_nodes_for_classes)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[{'rels': 0}]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"create_relationship_query=\"\"\"\n",
"MATCH (type:Type), (javadoc:JavaDoc)\n",
"WHERE type.fqn = javadoc.fqn\n",
"MERGE (javadoc)-[r:COMMENTS]->(type)\n",
"RETURN COUNT(r) as rels\n",
"\"\"\"\n",
"graph.data(create_relationship_query)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[{'COUNT(javadoc)': 0, 'COUNT(r)': 0}]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"delete_comments_query=\"\"\"\n",
"MATCH (javadoc:JavaDoc)-[r:COMMENTS]->()\n",
"DELETE r, javadoc\n",
"RETURN COUNT(r), COUNT(javadoc)\n",
"\"\"\"\n",
"graph.data(delete_comments_query)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [Root]",
"language": "python",
"name": "Python [Root]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}