{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DataVec\n", "\n", "The [DataVec](https://deeplearning4j.org/datavec) library from DL4J is easy to add to the BeakerX kernel, and its tables can be displayed with the BeakerX interactive table widget. DataVec is an ETL library for machine learning that covers data pipelines, data munging, and wrangling." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%classpath add mvn\n", "org.datavec datavec-api 0.9.1\n", "org.datavec datavec-local 0.9.1\n", "org.datavec datavec-dataframe 0.9.1\n", "org.deeplearning4j deeplearning4j-core 0.9.1\n", "org.nd4j nd4j-native-platform 0.9.1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%import org.nd4j.linalg.api.ndarray.INDArray\n", "%import org.datavec.api.split.FileSplit\n", "%import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator\n", "%import java.nio.file.Paths\n", "%import org.nd4j.linalg.factory.Nd4j\n", "%import org.datavec.api.transform.TransformProcess\n", "%import org.datavec.api.records.reader.impl.csv.CSVRecordReader" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import org.datavec.api.transform.schema.Schema\n", "\n", "inputDataSchema = new Schema.Builder()\n", " //For convenience, we can define multiple columns of the same type in one call\n", " .addColumnsString(\"DateString\", \"TimeString\")\n", " //We can define different column types for different types of data:\n", " .addColumnCategorical(\"State\", Arrays.asList(\"GA\",\"VA\",\"IL\",\"MO\",\"IN\",\"KY\",\"MS\",\"LA\",\"AL\",\"TN\",\"OH\",\"NC\",\"MD\",\"CA\",\"AZ\",\"FL\",\"IA\",\"MN\",\"KS\",\"TX\",\"OK\",\"AR\",\"NE\",\"WA\",\"WY\",\"CO\",\"ID\",\"SD\",\"PA\",\"MT\",\"NV\",\"NY\",\"DE\",\"NM\",\"ME\",\"ND\",\"SC\",\"WV\",\"MI\",\"WI\",\"NH\",\"CT\",\"MA\"))\n", " .addColumnsInteger(\"State No\", \"Scale\", \"Injuries\", 
\"Fatalities\")\n", " //Columns can also carry restrictions on the values we consider valid; none are set here:\n", " .addColumnsDouble(\"Start Lat\", \"Start Lon\", \"Length\", \"Width\")\n", " .build();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import org.datavec.api.transform.condition.ConditionOp\n", "import org.datavec.api.transform.condition.column.CategoricalColumnCondition\n", "import org.datavec.api.transform.filter.ConditionFilter\n", "\n", "transformProcess = new TransformProcess.Builder(inputDataSchema)\n", " //Let's remove some columns we don't need\n", " .removeColumns(\"DateString\", \"TimeString\", \"State No\")\n", " //Now, suppose we only want to analyze tornadoes involving NY and WA. Let's filter out\n", " // everything except for those states.\n", " //Here, we are applying a conditional filter. We remove all of the examples that match the condition.\n", " // The condition is: \"State\" isn't one of {\"NY\", \"WA\"}\n", " .filter(new ConditionFilter(\n", " new CategoricalColumnCondition(\"State\", ConditionOp.NotInSet, new HashSet<>(Arrays.asList(\"NY\", \"WA\")))))\n", " .build();" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import org.datavec.local.transforms.TableRecords\n", "import jupyter.Displayer;\n", "import jupyter.Displayers;\n", "\n", "//JVM Repr: display the table with our widget instead of as a raw string table\n", "Displayers.register(org.datavec.dataframe.api.Table.class, new Displayer() {\n", " @Override\n", " public Map display(org.datavec.dataframe.api.Table table) {\n", " return new HashMap() {{\n", " put(MIMEContainer.MIME.HIDDEN, \"\");\n", " List> values = new ArrayList<>();\n", " for (int row=0; row rowValues = new ArrayList<>();\n", " for (int column=0; column \n", " if (v==0) {\n", " plot << new Points(x: [k[0]], y: [k[1]], color: Color.orange)\n", " } else {\n", " plot << new Points(x: [k[0]], 
y: [k[1]], color: Color.red)\n", " }\n", "}\n", "\n", "plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot = new Plot(title: \"Predicted data Plot\")\n", "plot.setXBound([0.0, 1.0])\n", "plot.setYBound([-0.2, 0.8])\n", "\n", "rawPredictedData.each{k, v -> \n", " if (v==0) {\n", " plot << new Points(x: [k[0]], y: [k[1]], color: Color.orange)\n", " } else {\n", " plot << new Points(x: [k[0]], y: [k[1]], color: Color.red)\n", " }\n", "}\n", "\n", "plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Groovy", "language": "groovy", "name": "groovy" }, "language_info": { "codemirror_mode": "groovy", "file_extension": ".groovy", "mimetype": "", "name": "Groovy", "nbconverter_exporter": "", "version": "2.4.3" } }, "nbformat": 4, "nbformat_minor": 2 }