{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "In this blog post, I want to show how you can get a first impression on how you can cut a monolithic application into separated components that make sense from a business' perspective. This method can help you to identify meaningful Bounded Contexts (source code that has a high conceptual cohesion in means of business terminology) for your application that should follow a [Domain-Driven Design](https://en.wikipedia.org/wiki/Domain-driven_design).\n", "\n", "We achieve this with a nice [D3 chord visualization (created by Pasha, licensed under GPL 3.0)](http://bl.ocks.org/NPashaP/ba4c802d5ef68f70c019a9706f77ebf1) combined with [jQAssistant](http://buschmais.github.io/jqassistant/doc/1.3.0/#_overview) / [Neo4j](https://neo4j.com/) graph analysis. jQAssistant can scan your Java application and store a graph of your software in a Neo4j graph database. This graph can then be enriched with [Cypher](https://neo4j.com/docs/developer-manual/current/cypher) to get additional nodes and relationships as well as queried as you need it – for your specific use cases.\n", "\n", "To quickly get started, there is already a nice showcase project on GitHub that we can use to start immediately – the [\"jQAssistanterized\" version of Spring PetClinic](https://github.com/buschmais/spring-petclinic). Just clone this repository and build it with the [Maven](http://maven.apache.org) command `mvn clean install -DskipTests`. I also wrote about how one can [build a higher level view of the source code](https://www.feststelltaste.de/building-higher-level-abstractions-of-source-code/) as well as [analyzing dependencies between Java source code](https://www.feststelltaste.de/analyze-dependencies-between-business-subdomains/).\n", "\n", "It's time to combine both approaches!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Marking components in code\n", "In this section, we want to find out components in our application that would probably be suited for Bounded Contexts. The Spring PetClinic project already comes with some user-defined queries for subdomains (the \"concept\" `business:Subdomain` that's defined in the [self-validating, living architecture documentation](https://raw.githubusercontent.com/buschmais/spring-petclinic/master/jqassistant/business.adoc)) that are derived from naming conventions of some Java types. That means that there is already a relationship between source code and the business' concepts existing – this will fit for our purpose.\n", "\n", "We can query that graph directly via Python and [py2neo](http://py2neo.org/v3/) to get the data in a JSON format from Neo4j. For effectiveness reasons (and a nice inline, tabular visualization as well), we wrap that result in a [Pandas](https://pandas.pydata.org/) DataFrame for further processing." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameTypes
0Visit11
1Person2
2Specialty2
3Clinic7
4Vet11
\n", "
" ], "text/plain": [ " Name Types\n", "0 Visit 11\n", "1 Person 2\n", "2 Specialty 2\n", "3 Clinic 7\n", "4 Vet 11" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import py2neo\n", "import pandas as pd\n", "\n", "graph = py2neo.Graph()\n", "subdomains_query=\"\"\"\n", "MATCH (t:Type)-[:BELONGS_TO]->(s:Subdomain)\n", "RETURN s.name as Name, COUNT(t) as Types\n", "\"\"\"\n", "result = graph.run(subdomains_query).data()\n", "df = pd.DataFrame(result)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our analysis, we additionally need to mark all Java types that don't belong to an existing business subdomain. To keep it simple, we just add a new `Subdomain` node with the name \"Framework\" to the graph and connect all the remaining types without a connection to an existing `Subdomain` node." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameTypes
0Framework42
\n", "
" ], "text/plain": [ " Name Types\n", "0 Framework 42" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "framework_query=\"\"\"\n", "CREATE (s:Subdomain { name: \"Framework\" })\n", "WITH s\n", " MATCH (:Project)-[:CONTAINS*]->(t:Type)\n", " WHERE NOT (t)-[:BELONGS_TO]->(:Subdomain)\n", " MERGE (t)-[:BELONGS_TO]->(s)\n", "RETURN s.name as Name, COUNT(t) as Types\n", "\"\"\"\n", "result = graph.run(framework_query).data()\n", "df = pd.DataFrame(result)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's all what it takes now for marking software elements and their relationships to business subdomains aka Bounded Countexts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dependencies between Subdomains\n", "Here comes the actual analysis part: We search all dependencies (with the `:DEPENDS_ON` relationship) between `Type`s and retrieve the `Subdomain` node / information for each of them.\n", "\n", "The Cypher query for Neo4j is straightforward:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fromtox_number
0VisitFramework1
1ClinicVisit4
2OwnerFramework1
3VisitPet9
4VetSpecialty3
\n", "
" ], "text/plain": [ " from to x_number\n", "0 Visit Framework 1\n", "1 Clinic Visit 4\n", "2 Owner Framework 1\n", "3 Visit Pet 9\n", "4 Vet Specialty 3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query=\"\"\"\n", "MATCH\n", " (s1:Subdomain)<-[:BELONGS_TO]-\n", " (type:Type)-[r:DEPENDS_ON*0..1]->\n", " (dependency:Type)-[:BELONGS_TO]->(s2:Subdomain)\n", "RETURN s1.name as from, s2.name as to, COUNT(r) as x_number\n", "\"\"\"\n", "result = graph.run(query).data()\n", "df = pd.DataFrame(result)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This query returns all `Subdomain`s with the number of relationships between all the types of the same `Subdomain` as well as other `Subdomain`s.\n", "\n", "For example, `from` the \"Visit\" subdomain, there is one relationship (`x_number`, the \"x\" is just there for the right ordering of the DataFrame columns) `to` the \"Framework\" subdomain.\n", "\n", "Because subdomain can be seen as Bounded Contexts, we get a nice overview between existing dependencies between them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Export data for visualization\n", "The interactive chord visualization needs the result in the JSON format as list of dictionary entries in the format \"from, to, number\". So we export it accordingly." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['Visit', 'Framework', 1],\n", " ['Clinic', 'Visit', 4],\n", " ['Owner', 'Framework', 1],\n", " ['Visit', 'Pet', 9],\n", " ['Vet', 'Specialty', 3]]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "json_data = df.to_dict(orient='split')['data']\n", "with open ( \"vis/chord_data.json\", mode='w') as json_file:\n", " json_file.write(json.dumps(json_data, indent=3))\n", "json_data[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interactive Analysis\n", "After this, we can open the [HTML page where D3 does it's magic](https://feststelltaste.github.io/software-analytics/notebooks/vis/graphical_approach_towards_bounded_contexts/chord.html).\n", "\n", "In a [chord diagram](https://en.wikipedia.org/wiki/Chord_diagram), the data / subdomains are ordered around a circle. The numbers in the parenthesis are the number of inter-relationships between data / types.\n", "\n", "![](https://raw.githubusercontent.com/feststelltaste/software-analytics/master/notebooks/resources/gbc_0.png)\n", "\n", "When hovering over a subdomain's text or the corresponding arc, the chord diagram will show you all dependencies that go from this subdomain to other subdomains (different color than the current subdomain) and the dependencies that refer to types in the same subdomain (same color). This can show you which subdomains already are on a high degree self-contained. this information can guide you for first separations of your application into Bounded Contexts.\n", "\n", "Additionally, the number of dependencies into the other subdomains is updated as well.\n", "\n", "![](https://raw.githubusercontent.com/feststelltaste/software-analytics/master/notebooks/resources/gbc_1.png)\n", "\n", "With this, we can identify subdomains where it could be hard to create a Bounded Context from. For example, the Clinic subdomain has dependencies in almost all other subdomains. Looking into the code, we can see that the Clinic subdomain is kind of orchestration layer or facade for the accessing all other subdomains. This is an important information because we have to treat this component different than others. \n", "\n", "![](https://raw.githubusercontent.com/feststelltaste/software-analytics/master/notebooks/resources/gbc_2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary\n", "Thanks to jQAssistant and Neo4j, the necessary data for this visualization is easy to retrieve. The interactive chord diagram comes quite useful if you want to cut some intertwined components into separate ones e. g. for creating microservices (there, finally I said it!).\n", "\n", "But in huge systems, this could be very cumbersome. We could do that also more on an analytical basis based on the data. But this is a topic for another blog post :-)\n", "\n", "\n", "This blog post is also avaiable as [interactive notebook on GitHub](https://github.com/feststelltaste/software-analytics/blob/master/notebooks/A%20graph(ical)%20approach%20towards%20Bounded%20Contexts.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }