{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "_This blog is a three-part series. See [part 1 for retrieving the dataset](https://www.feststelltaste.de/finding-tested-code-with-jqassistant/) and part 3 (upcoming) for visualization._\n", "\n", "In big and old legacy systems, tests are often a mess. Especially end-to-end-tests with UI testing frameworks like Selenium quickly become a PITA aka unmaintainable. They are running slow and you quickly get overwhelmed by plenty of tests that do partly the same, too.\n", "\n", "In this data analysis, I want to illustrate a way that can take us out of this misery. We want to spot test cases that are structurally very similar and thus can be seen as duplicate. We'll calculate the similarity between tests based on their invocations of production code. We can achieve this by treating our software data as observations of linear features. This opens up ways for us to leverage existing mathematically techniques like vector distance calculation (as we'll see in this post) as well as machine learning techniques like multidimensional scaling or clustering (in a follow-up post).\n", "\n", "As software data under analysis, we'll use the JUnit tests of a Java application for demonstrating the approach. We want to figure out, if there are any test cases that test production code where other, more dedicated tests are already testing as well. With the result, we could be able to delete some superfluous test cases (and always remember: less code is good code, no code is best :-)).\n", "\n", "**Reality check** \n", "The real use case originates from a software system with a massive amount of Selenium end-to-end-tests that uses the [Page Object pattern](https://martinfowler.com/bliki/PageObject.html). Each page object represents one HTML site of a web application. Technically, a page object exposes methods in the programming language you use that enables the interaction with websites programmatically. In such a scenario, you can infer which tests are calling the same websites and are triggering the same set of UI components (like buttons). This is a good estimator for test cases that test the same use cases in the application. We can use the results of such an analysis to find repeating test scenarios." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dataset\n", "\n", "I'm using a dataset that I've created in a [previous blog post](https://www.feststelltaste.de/finding-tested-code-with-jqassistant/) with [jQAssistant](http://buschmais.github.io/jqassistant/doc/1.3.0/). It shows which test methods call which code in the application (the \"production code\"). It's a pure static and structural view of our code, but can be very helpful as we'll see shortly. \n", "\n", "_Note: There are also other ways to get these kinds of information e. g. by mining the log file of a test execution (this would even add real runtime information as well). But for the demonstration of the general approach, the pure static and structural information between the test code and our production code is sufficient._\n", "\n", "First, we read in the data with Pandas – my favorite data analysis framework for getting things easily done." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
0AddCommentTestvoid blankSiteContainsRightComment()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
1AddCommentTestvoid blankSiteContainsRightCreationTime()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
2AddCommentTestvoid blankSiteContainsRightUser()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
3AddCommentTestvoid failsAtCommentNull()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
4AddCommentTestvoid failsAtCreatorNull()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
\n", "
" ], "text/plain": [ " test_type test_method prod_type \\\n", "0 AddCommentTest void blankSiteContainsRightComment() AddComment \n", "1 AddCommentTest void blankSiteContainsRightCreationTime() AddComment \n", "2 AddCommentTest void blankSiteContainsRightUser() AddComment \n", "3 AddCommentTest void failsAtCommentNull() AddComment \n", "4 AddCommentTest void failsAtCreatorNull() AddComment \n", "\n", " prod_method invocations \n", "0 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "1 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "2 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "3 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "4 at.dropover.comment.boundary.GetCommentRespons... 1 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "invocations = pd.read_csv(\"datasets/test_code_invocations.csv\", sep=\";\")\n", "invocations.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we've got here are \n", "* all names of our test types (`test_type`) and production types (`prod_type`) \n", "* the signatures of the test methods (`test_method`) and production methods (`prod_method`)\n", "* the number of calls from the test methods to the production methods (`invocations`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis\n", "OK, let's do some actual work! We want \n", "* to calculate the structural similarity of test cases \n", "* to spot possible duplications of tests \n", "\n", "to figure out which test cases are superfluous (and can be deleted).\n", "\n", "What we have are all tests cases (aka test methods) and their calls to the production code base (= the production methods). We can transform this data to a matrix representation that shows which test method triggers which production method by using Pandas' `pivot_table` function on our `invocations` `DataFrame`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
prod_typeAddCommentAddScheduling
prod_methodat.dropover.comment.boundary.GetCommentResponseModel doSync(at.dropover.comment.boundary.AddCommentRequestModel)at.dropover.scheduling.boundary.AddSchedulingResponseModel doSync(at.dropover.scheduling.boundary.AddSchedulingRequestModel)
test_typetest_method
AddCommentTestvoid failsAtCreatorNull()10
void worksAtMinimalRequest()10
AddSchedulingDateTestvoid addDateToScheduling()00
void addTwoDatesToScheduling()00
\n", "
" ], "text/plain": [ "prod_type AddComment \\\n", "prod_method at.dropover.comment.boundary.GetCommentResponseModel doSync(at.dropover.comment.boundary.AddCommentRequestModel) \n", "test_type test_method \n", "AddCommentTest void failsAtCreatorNull() 1 \n", " void worksAtMinimalRequest() 1 \n", "AddSchedulingDateTest void addDateToScheduling() 0 \n", " void addTwoDatesToScheduling() 0 \n", "\n", "prod_type AddScheduling \n", "prod_method at.dropover.scheduling.boundary.AddSchedulingResponseModel doSync(at.dropover.scheduling.boundary.AddSchedulingRequestModel) \n", "test_type test_method \n", "AddCommentTest void failsAtCreatorNull() 0 \n", " void worksAtMinimalRequest() 0 \n", "AddSchedulingDateTest void addDateToScheduling() 0 \n", " void addTwoDatesToScheduling() 0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocation_matrix = invocations.pivot_table(\n", " index=['test_type', 'test_method'],\n", " columns=['prod_type', 'prod_method'],\n", " values='invocations', \n", " fill_value=0\n", ")\n", "# show interesting parts of results\n", "invocation_matrix.iloc[4:8,4:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we've got now is the information for each invocation (or non-invocation) of test methods to production methods. In mathematical words, we've now got an __n-dimensional vector for each test method__ where n is the number of tested production methods in our code base. That means we've just transformed our software data to a representation so that we can work with standard Data Science tooling :-D! That means all further problem-solving techniques in this area can be reused by us.\n", "\n", "And this is exactly what we do now in our further analysis: We've reduced our problem to a distance calculation between vectors (we use distance instead of similarity because later used visualization techniques work with distances). For this, we can use the `cosine_distances` function (see [this article for the mathematical background](https://en.wikipedia.org/wiki/Cosine_similarity)) of the machine learning library [scikit-learn](http://scikit-learn.org) to calculate a pair-wise distance matrix between the test methods aka linear features." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.10557281, 0.2 ],\n", " [ 0.10557281, 0.2 ],\n", " [ 0.80388386, 0.8245884 ],\n", " [ 1. , 1. ]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics.pairwise import cosine_distances\n", "\n", "distance_matrix = cosine_distances(invocation_matrix)\n", "# show some interesting parts of results\n", "distance_matrix[81:85,60:62]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this result, we create a `DataFrame` to get a better visual representation of the data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typeCommentGatewayTest
test_methodvoid readRoundtripWorksWithFullData()void readRoundtripWorksWithMandatoryData()
test_typetest_method
CommentsResourceTestvoid postCommentActuallyCreatesComment()0.1055730.200000
void postCommentActuallyCreatesCommentJSON()0.1055730.200000
void postTwiceCreatesTwoElements()0.8038840.824588
ConfigurationFileTestvoid keyWorks()1.0000001.000000
\n", "
" ], "text/plain": [ "test_type CommentGatewayTest \\\n", "test_method void readRoundtripWorksWithFullData() \n", "test_type test_method \n", "CommentsResourceTest void postCommentActuallyCreatesComment() 0.105573 \n", " void postCommentActuallyCreatesCommentJSON() 0.105573 \n", " void postTwiceCreatesTwoElements() 0.803884 \n", "ConfigurationFileTest void keyWorks() 1.000000 \n", "\n", "test_type \n", "test_method void readRoundtripWorksWithMandatoryData() \n", "test_type test_method \n", "CommentsResourceTest void postCommentActuallyCreatesComment() 0.200000 \n", " void postCommentActuallyCreatesCommentJSON() 0.200000 \n", " void postTwiceCreatesTwoElements() 0.824588 \n", "ConfigurationFileTest void keyWorks() 1.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distance_df = pd.DataFrame(distance_matrix, index=invocation_matrix.index, columns=invocation_matrix.index)\n", "# show some interesting parts of results\n", "distance_df.iloc[81:85,60:62]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You find the complete `DataFrame` [as Excel file as well (~0.5 MB)](https://github.com/feststelltaste/software-analytics/raw/master/notebooks/datasets/test_distance_matrix.xlsx). It shows all dissimilarities between test cases based on the static calls to production code and looks something like this:\n", "\n", "![](resources/test_refactoring_excel_sheet.png)\n", " \n", "Can you already spot some clusters? We'll have a detailed look at that in the next blog post ;-)!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Discussion\n", "Let's have a look at what we've achieved by discussing some of the results. We compare the actual source code of the test method `readRoundtripWorksWithFullData()` from the test class `CommentGatewayTest`\n", "\n", "```java\n", " @Test\n", " public void readRoundtripWorksWithFullData() {\n", " createDefaultComment();\n", " assertEquals(1, commentGateway.read(SITE_NAME).size());\n", " checkDefaultComment(commentGateway.read(SITE_NAME).get(0));\n", " }\n", "```\n", "\n", "with the test method `postCommentActuallyCreatesComment()` of the another test class `CommentsResourceTest`\n", "\n", "```java\n", " @Test\n", " public void postCommentActuallyCreatesComment() {\n", " this.client.path(\"sites/sitewith3comments/comments\").accept(...\n", " Assert.assertEquals(4L, (long)this.commentGateway.read(\"sitewith3comments\").size());\n", " Assert.assertEquals(\"comment3\", ((Comment)this.commentGateway.read(\"sitewith3comments\").get(3)).getContent());\n", " }\n", "```\n", "\n", "Albeit both classes represent different test levels (unit vs. integration test), they share some similarities (with ~0.1 dissimilarity aka ~90% similar calls to production methods). We can see exactly which invoked production methods are part of both test cases by filtering out the methods in the original `invocations` `DataFrame`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
112CommentGatewayTestvoid readRoundtripWorksWithFullData()CommentGatewayjava.util.List read(java.lang.String)2
147CommentsResourceTestvoid postCommentActuallyCreatesComment()Commentjava.lang.String getContent()1
148CommentsResourceTestvoid postCommentActuallyCreatesComment()CommentGatewayjava.util.List read(java.lang.String)2
\n", "
" ], "text/plain": [ " test_type test_method \\\n", "112 CommentGatewayTest void readRoundtripWorksWithFullData() \n", "147 CommentsResourceTest void postCommentActuallyCreatesComment() \n", "148 CommentsResourceTest void postCommentActuallyCreatesComment() \n", "\n", " prod_type prod_method invocations \n", "112 CommentGateway java.util.List read(java.lang.String) 2 \n", "147 Comment java.lang.String getContent() 1 \n", "148 CommentGateway java.util.List read(java.lang.String) 2 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocations[\n", " (invocations.test_method == \"void readRoundtripWorksWithFullData()\") |\n", " (invocations.test_method == \"void postCommentActuallyCreatesComment()\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that both test methods share calls to the production method `read(...)`, but differ in the call of the method with the name `getContent()` in the class `Comment`, because only the test method `postCommentActuallyCreatesComment()` of `CommentsResourceTest` invokes it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can repeat this discussion for another method named `postTwiceCreatesTwoElements()` in the test class `CommentsResourceTest`:\n", "\n", "```java\n", " public void postTwiceCreatesTwoElements() {\n", " this.client.path(\"sites/sitewith3comments/comments\").accept(...\n", " this.client.path(\"sites/sitewith3comments/comments\").accept(...\n", " Assert.assertEquals(5L, (long)comments.size());\n", " Assert.assertEquals(\"comment1\", ((Comment)comments.get(0)).getContent());\n", " Assert.assertEquals(\"comment2\", ((Comment)comments.get(1)).getContent());\n", " Assert.assertEquals(\"comment3\", ((Comment)comments.get(2)).getContent());\n", " Assert.assertEquals(\"comment4\", ((Comment)comments.get(3)).getContent());\n", " Assert.assertEquals(\"comment5\", ((Comment)comments.get(4)).getContent());\n", "```\n", "Albeit the test method is a little bit awkward (with all those subsequent `getContent()` calls), we can see a slight slimilarity of ~20%. Here are details on the production method calls as well:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
112CommentGatewayTestvoid readRoundtripWorksWithFullData()CommentGatewayjava.util.List read(java.lang.String)2
151CommentsResourceTestvoid postTwiceCreatesTwoElements()Commentjava.lang.String getContent()5
152CommentsResourceTestvoid postTwiceCreatesTwoElements()CommentGatewayjava.util.List read(java.lang.String)1
\n", "
" ], "text/plain": [ " test_type test_method \\\n", "112 CommentGatewayTest void readRoundtripWorksWithFullData() \n", "151 CommentsResourceTest void postTwiceCreatesTwoElements() \n", "152 CommentsResourceTest void postTwiceCreatesTwoElements() \n", "\n", " prod_type prod_method invocations \n", "112 CommentGateway java.util.List read(java.lang.String) 2 \n", "151 Comment java.lang.String getContent() 5 \n", "152 CommentGateway java.util.List read(java.lang.String) 1 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocations[\n", " (invocations.test_method == \"void readRoundtripWorksWithFullData()\") |\n", " (invocations.test_method == \"void postTwiceCreatesTwoElements()\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both test classes invoke the `read(...)` method, but only `postTwiceCreatesTwoElements()` calls `getContent()` – and this for five times. This explains the dissimilarity between both test methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In contrast, we can have a look at the method `void keyWorks()` from the test class `ConfigurationFileTest`, which has absolutely nothing to do (= dissimilarity 1.0) with the method `readRoundtripWorksWithFullData()` nor the underlying calls to the production code.\n", "\n", "```java\n", " @Test\n", " public void keyWorks() {\n", " assertEquals(\"InMemory\", config.get(\"gateway.type\"));\n", " }\n", "```\n", "Looking at the corresponding invocation data, we see, that there are no common uses of production methods." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
112CommentGatewayTestvoid readRoundtripWorksWithFullData()CommentGatewayjava.util.List read(java.lang.String)2
153ConfigurationFileTestvoid keyWorks()ConfigurationFilejava.lang.String get(java.lang.String)1
\n", "
" ], "text/plain": [ " test_type test_method \\\n", "112 CommentGatewayTest void readRoundtripWorksWithFullData() \n", "153 ConfigurationFileTest void keyWorks() \n", "\n", " prod_type prod_method invocations \n", "112 CommentGateway java.util.List read(java.lang.String) 2 \n", "153 ConfigurationFile java.lang.String get(java.lang.String) 1 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocations[\n", " (invocations.test_method == \"void readRoundtripWorksWithFullData()\") |\n", " (invocations.test_method == \"void keyWorks()\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "We've calculated the (structural) distances between test cases depending on the invocations to production methods. We've seen that we can simplify a question about our complex software data to a question that can be answered by standard Data Science techniques.\n", "\n", "In the next blog post, we'll have a deeper look into how we can get some insights into the cohesion of all test classes. We'll use our distance matrix to visualize and cluster the data by using some simple machine learning techniques.\n", "\n", "I hope I could illustrate how the (dis-)similarity calculation of test cases works behind the scenes. If there are any questions or shortcomings I made in my analysis: Please let me know!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }