{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "In big and old legacy systems, tests are often a mess. End-to-end tests written with UI testing frameworks like Selenium in particular quickly become a PITA (aka unmaintainable). They run slowly, and you are confronted with plenty of tests that partly do the same thing.\n", "\n", "In this data analysis, I want to illustrate a way that can take us out of this misery. We want to spot test cases that are structurally very similar and thus can be seen as duplicates. We'll calculate the similarity between tests based on their invocations of production code. We can achieve this by treating our software data as observations of linear features. This opens up ways for us to leverage existing machine learning techniques like multidimensional scaling or clustering.\n", "\n", "As software data under analysis, we'll use the JUnit tests of a Java application to demonstrate the approach. \n", "\n", "_Note: The real use case originates from a software system with a massive number of Selenium tests that use the [Page Object pattern](https://martinfowler.com/bliki/PageObject.html). Each page object represents one HTML page of your web application. A page object thus exposes methods in the programming language you use that let you interact with a web page programmatically. In such a scenario, you can infer which tests trigger the same set of UI components (like buttons). This is a good indicator for test cases that test the same use cases in the application. We can use the results of such an analysis to find repeating test scenarios as well as tests that differ only in a minor nuance of an otherwise similar use case (which could probably be tested by other means, such as integration tests or pure UI tests with a mocked backend)._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dataset\n", "\n", "I'm using a dataset that I've created in a previous blog post. 
It shows which test methods call which code in the main line (\"production\"). \n", "\n", "_Note: There are also other ways to get this structural information, e.g. by mining the log file of a test execution (this would even add real runtime information). But for our demo purposes, the pure structural information between the test code and our production code is sufficient._\n", "\n", "First, we read in the data with Pandas." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
0AddCommentTestvoid blankSiteContainsRightComment()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
1AddCommentTestvoid blankSiteContainsRightCreationTime()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
2AddCommentTestvoid blankSiteContainsRightUser()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
3AddCommentTestvoid failsAtCommentNull()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
4AddCommentTestvoid failsAtCreatorNull()AddCommentat.dropover.comment.boundary.GetCommentRespons...1
\n", "
" ], "text/plain": [ " test_type test_method prod_type \\\n", "0 AddCommentTest void blankSiteContainsRightComment() AddComment \n", "1 AddCommentTest void blankSiteContainsRightCreationTime() AddComment \n", "2 AddCommentTest void blankSiteContainsRightUser() AddComment \n", "3 AddCommentTest void failsAtCommentNull() AddComment \n", "4 AddCommentTest void failsAtCreatorNull() AddComment \n", "\n", " prod_method invocations \n", "0 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "1 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "2 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "3 at.dropover.comment.boundary.GetCommentRespons... 1 \n", "4 at.dropover.comment.boundary.GetCommentRespons... 1 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# read the semicolon-separated invocation data\n", "invocations = pd.read_csv(\"datasets/test_code_invocations.csv\", sep=\";\")\n", "invocations.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we've got here are the names of our test types (`test_type`) and production types (`prod_type`) as well as the signatures of the test methods (`test_method`) and production methods (`prod_method`). We also have the number of calls from the test methods to the production methods (`invocations`)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis\n", "OK, let's do some actual work! We want to calculate the structural similarity of test cases to spot possible duplications of tests.\n", "\n", "What we have are all test cases (aka test methods) and their calls to the production code base (= the production methods). We can transform this data to get a matrix representation that shows which test method triggers which production method by using Pandas' `pivot_table` function on our `invocations` `DataFrame`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
prod_typeAddCommentAddScheduling
prod_methodat.dropover.comment.boundary.GetCommentResponseModel doSync(at.dropover.comment.boundary.AddCommentRequestModel)at.dropover.scheduling.boundary.AddSchedulingResponseModel doSync(at.dropover.scheduling.boundary.AddSchedulingRequestModel)
test_typetest_method
AddCommentTestvoid failsAtCreatorNull()10
void worksAtMinimalRequest()10
AddSchedulingDateTestvoid addDateToScheduling()00
void addTwoDatesToScheduling()00
\n", "
" ], "text/plain": [ "prod_type AddComment \\\n", "prod_method at.dropover.comment.boundary.GetCommentResponseModel doSync(at.dropover.comment.boundary.AddCommentRequestModel) \n", "test_type test_method \n", "AddCommentTest void failsAtCreatorNull() 1 \n", " void worksAtMinimalRequest() 1 \n", "AddSchedulingDateTest void addDateToScheduling() 0 \n", " void addTwoDatesToScheduling() 0 \n", "\n", "prod_type AddScheduling \n", "prod_method at.dropover.scheduling.boundary.AddSchedulingResponseModel doSync(at.dropover.scheduling.boundary.AddSchedulingRequestModel) \n", "test_type test_method \n", "AddCommentTest void failsAtCreatorNull() 0 \n", " void worksAtMinimalRequest() 0 \n", "AddSchedulingDateTest void addDateToScheduling() 0 \n", " void addTwoDatesToScheduling() 0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# one row per test method, one column per production method; 0 means \"not invoked\"\n", "invocation_matrix = invocations.pivot_table(\n", " index=['test_type', 'test_method'],\n", " columns=['prod_type', 'prod_method'],\n", " values='invocations', \n", " fill_value=0\n", ")\n", "# show interesting parts of results\n", "invocation_matrix.iloc[4:8,4:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we've got now is the information for each invocation (or non-invocation) of production methods by test methods. In mathematical terms, we now have an __n-dimensional vector for each test method__, where n is the number of tested production methods in our code base! That means we've just transformed our software data into a representation that we can work on with standard Data Science tools :-D! All the problem-solving techniques from this area are now available to us. \n", "\n", "This is exactly what we make use of in our further analysis. We have reduced our problem to a distance calculation between vectors (we use distance instead of similarity because the visualization techniques we use later work with distances). 
For this, we can use the `cosine_distances` function of the machine learning library [scikit-learn](http://scikit-learn.org) to calculate a pair-wise distance matrix between the test methods." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.10557281, 0.2 ],\n", " [ 0.10557281, 0.2 ],\n", " [ 0.80388386, 0.8245884 ],\n", " [ 1. , 1. ]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics.pairwise import cosine_distances\n", "\n", "# pair-wise cosine distance between the test method vectors (rows)\n", "distance_matrix = cosine_distances(invocation_matrix)\n", "# show some interesting parts of results\n", "distance_matrix[81:85,60:62]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this data, we create a `DataFrame` to get a better representation. You can find the complete `DataFrame` here as an Excel file as well." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typeCommentGatewayTest
test_methodvoid readRoundtripWorksWithFullData()void readRoundtripWorksWithMandatoryData()
test_typetest_method
CommentsResourceTestvoid postCommentActuallyCreatesComment()0.1055730.200000
void postCommentActuallyCreatesCommentJSON()0.1055730.200000
void postTwiceCreatesTwoElements()0.8038840.824588
ConfigurationFileTestvoid keyWorks()1.0000001.000000
\n", "
" ], "text/plain": [ "test_type CommentGatewayTest \\\n", "test_method void readRoundtripWorksWithFullData() \n", "test_type test_method \n", "CommentsResourceTest void postCommentActuallyCreatesComment() 0.105573 \n", " void postCommentActuallyCreatesCommentJSON() 0.105573 \n", " void postTwiceCreatesTwoElements() 0.803884 \n", "ConfigurationFileTest void keyWorks() 1.000000 \n", "\n", "test_type \n", "test_method void readRoundtripWorksWithMandatoryData() \n", "test_type test_method \n", "CommentsResourceTest void postCommentActuallyCreatesComment() 0.200000 \n", " void postCommentActuallyCreatesCommentJSON() 0.200000 \n", " void postTwiceCreatesTwoElements() 0.824588 \n", "ConfigurationFileTest void keyWorks() 1.000000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distance_df = pd.DataFrame(distance_matrix, index=invocation_matrix.index, columns=invocation_matrix.index)\n", "# show some interesting parts of results\n", "distance_df.iloc[81:85,60:62]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
112CommentGatewayTestvoid readRoundtripWorksWithFullData()CommentGatewayjava.util.List read(java.lang.String)2
147CommentsResourceTestvoid postCommentActuallyCreatesComment()Commentjava.lang.String getContent()1
148CommentsResourceTestvoid postCommentActuallyCreatesComment()CommentGatewayjava.util.List read(java.lang.String)2
\n", "
" ], "text/plain": [ " test_type test_method \\\n", "112 CommentGatewayTest void readRoundtripWorksWithFullData() \n", "147 CommentsResourceTest void postCommentActuallyCreatesComment() \n", "148 CommentsResourceTest void postCommentActuallyCreatesComment() \n", "\n", " prod_type prod_method invocations \n", "112 CommentGateway java.util.List read(java.lang.String) 2 \n", "147 Comment java.lang.String getContent() 1 \n", "148 CommentGateway java.util.List read(java.lang.String) 2 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocations[\n", " (invocations.test_method == \"void readRoundtripWorksWithFullData()\") |\n", " (invocations.test_method == \"void postCommentActuallyCreatesComment()\")]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
test_typetest_methodprod_typeprod_methodinvocations
112CommentGatewayTestvoid readRoundtripWorksWithFullData()CommentGatewayjava.util.List read(java.lang.String)2
151CommentsResourceTestvoid postTwiceCreatesTwoElements()Commentjava.lang.String getContent()5
152CommentsResourceTestvoid postTwiceCreatesTwoElements()CommentGatewayjava.util.List read(java.lang.String)1
\n", "
" ], "text/plain": [ " test_type test_method \\\n", "112 CommentGatewayTest void readRoundtripWorksWithFullData() \n", "151 CommentsResourceTest void postTwiceCreatesTwoElements() \n", "152 CommentsResourceTest void postTwiceCreatesTwoElements() \n", "\n", " prod_type prod_method invocations \n", "112 CommentGateway java.util.List read(java.lang.String) 2 \n", "151 Comment java.lang.String getContent() 5 \n", "152 CommentGateway java.util.List read(java.lang.String) 1 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invocations[\n", " (invocations.test_method == \"void readRoundtripWorksWithFullData()\") |\n", " (invocations.test_method == \"void postTwiceCreatesTwoElements()\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualization\n", "Our 422x422 distance matrix `distance_df` is far too big to spot similarities just by looking at it. Let's break down the result into two dimensions using multidimensional scaling (`MDS`) from scikit-learn and plot the results with the plotting library `matplotlib`.\n", "\n", "MDS tries to find a representation of our 422-dimensional data set in two-dimensional space while retaining the distance information between all data points (= test methods) as well as possible. We use the `MDS` technique with our precomputed dissimilarity matrix `distance_df`." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.02495802, 0.10768622],\n", " [ 0.34902428, 0.58676902],\n", " [-0.0249776 , 0.10768132],\n", " [-0.26850959, 0.32472212],\n", " [-0.02497707, 0.10768145]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.manifold import MDS\n", "\n", "model = MDS(dissimilarity='precomputed', random_state=10)\n", "distance_df_2d = model.fit_transform(distance_df)\n", "distance_df_2d[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we plot the now two-dimensional matrix with `matplotlib`. We color all data points according to the name of their test type. We can achieve this by assigning each type a number between 0 and 1 (`relative_index`) and drawing a color from a predefined color spectrum (`cm.hsv`) for each type. With this, each test class gets its own color. This enables us to quickly reason about test classes that belong together." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "from matplotlib import cm\n", "import matplotlib.pyplot as plt\n", "\n", "# number the test classes 0..k-1 (avoids MultiIndex.labels, which newer pandas versions no longer provide)\n", "type_codes = pd.factorize(distance_df.index.get_level_values('test_type'))[0]\n", "relative_index = type_codes / type_codes.max()\n", "colors = [x for x in cm.hsv(relative_index)]\n", "plt.figure(figsize=(8,8))\n", "x = distance_df_2d[:,0]\n", "y = distance_df_2d[:,1]\n", "plt.scatter(x, y, c=colors)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have the visual information about which test methods call similar production code! 
Let's discuss this plot:\n", "* Groups of data points (aka clusters) of the same color are the good ones (like the blue-colored ones in the lower middle). They show a high cohesion: the test methods of one test class all test the corresponding production code.\n", "* Clusters with mixed-colored data points (like in the upper middle) require further analysis. Here, different test classes test similar production code.\n", "\n", "Let's quickly find those spots programmatically by using another machine learning technique: density-based clustering! Here, we use `DBSCAN` to find data points that are close together. We draw this information into the plot above to visualize dense groups of data." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.cluster import DBSCAN\n", "\n", "dbscan = DBSCAN(eps=0.08, min_samples=10)\n", "clustering_results = dbscan.fit(distance_df_2d)\n", "plt.figure(figsize=(8,8))\n", "# components_ holds the core samples of the clusters found by DBSCAN\n", "cluster_members = clustering_results.components_\n", "\n", "# plot all data points\n", "plt.scatter(x, y, c='k', alpha=0.2)\n", "\n", "# highlight the core samples of the clusters\n", "plt.scatter(\n", " cluster_members[:,0],\n", " cluster_members[:,1],\n", " c='r', s=100, alpha=0.1)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cluster
test_typetest_method
AddSchedulingTestvoid add2EmptySchedulingWidgetsToSite()0
void addEmptySchedulingWidgetToSite()0
void failsIfPositionIsNull()0
AddTodoTestvoid addedTodoIsPersisted()4
CommentGatewayTestvoid readFailsOnNullSite()1
\n", "
" ], "text/plain": [ " cluster\n", "test_type test_method \n", "AddSchedulingTest void add2EmptySchedulingWidgetsToSite() 0\n", " void addEmptySchedulingWidgetToSite() 0\n", " void failsIfPositionIsNull() 0\n", "AddTodoTest void addedTodoIsPersisted() 4\n", "CommentGatewayTest void readFailsOnNullSite() 1" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tests = pd.DataFrame(index=distance_df.index)\n", "tests['cluster'] = clustering_results.labels_\n", "cohesive_tests = tests[tests.cluster != -1]\n", "cohesive_tests.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nuniquecount
cluster
0710
1316
2119
3131
4717
5112
6632
\n", "
" ], "text/plain": [ " nunique count\n", "cluster \n", "0 7 10\n", "1 3 16\n", "2 1 19\n", "3 1 31\n", "4 7 17\n", "5 1 12\n", "6 6 32" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# per cluster: number of distinct test classes and number of test methods\n", "# (a list instead of a set keeps the column order deterministic)\n", "test_measures = cohesive_tests.reset_index().groupby(\"cluster\").test_type.agg([\"nunique\", \"count\"])\n", "test_measures" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cluster\n", "0 {AddSchedulingTest, GetSchedulingsTest, Delete...\n", "1 {CommentResourceTest, CommentGatewayTest, Comm...\n", "2 {CreateSiteTest}\n", "3 {CreateTimeDiffTest}\n", "4 {TodoResourceTest, GetTodoListTest, TodoGatewa...\n", "5 {GetSiteTest}\n", "6 {SchedulingDateResourceTest, SchedulingDatesRe...\n", "Name: test_type, dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_list = cohesive_tests.reset_index().groupby(\"cluster\").test_type.apply(set)\n", "test_list" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nuniquecounttest_type
cluster
0710{AddSchedulingTest, GetSchedulingsTest, Delete...
1316{CommentResourceTest, CommentGatewayTest, Comm...
2119{CreateSiteTest}
3131{CreateTimeDiffTest}
4717{TodoResourceTest, GetTodoListTest, TodoGatewa...
5112{GetSiteTest}
6632{SchedulingDateResourceTest, SchedulingDatesRe...
\n", "
" ], "text/plain": [ " nunique count test_type\n", "cluster \n", "0 7 10 {AddSchedulingTest, GetSchedulingsTest, Delete...\n", "1 3 16 {CommentResourceTest, CommentGatewayTest, Comm...\n", "2 1 19 {CreateSiteTest}\n", "3 1 31 {CreateTimeDiffTest}\n", "4 7 17 {TodoResourceTest, GetTodoListTest, TodoGatewa...\n", "5 1 12 {GetSiteTest}\n", "6 6 32 {SchedulingDateResourceTest, SchedulingDatesRe..." ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_analysis_result = test_measures.join(test_list)\n", "test_analysis_result" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'AddSchedulingTest',\n", " 'DeleteTodoListTest',\n", " 'DeleteTodoTest',\n", " 'DownloadFileTest',\n", " 'GetSchedulingTest',\n", " 'GetSchedulingsTest',\n", " 'ReportAbuseTest'}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_analysis_result.iloc[0].test_type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "What a trip! We started from a data set that showed us the invocations of production methods by test methods. We then worked our way through three mathematical / machine learning techniques: `cosine_distances`, `MDS` and `DBSCAN`. Finally, we found out which different test classes test the same production code. The result is a helpful starting point for reorganizing your tests.\n", "\n", "In general, we saw how we can transform software-specific problems into questions that can be answered with standard Data Science tooling." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }