{ "cells": [ { "cell_type": "markdown", "id": "a4cdce42", "metadata": {}, "source": [ "# NAME\n", "\n", "InferenceUsingTFHubCenterNetObjDetect - Using TensorFlow to do object detection using a pre-trained model" ] }, { "cell_type": "markdown", "id": "d74d6520", "metadata": {}, "source": [ "# SYNOPSIS\n", "\n", "The following tutorial is based on the [TensorFlow Hub Object Detection Colab notebook](https://www.tensorflow.org/hub/tutorials/tf2_object_detection). It uses a pre-trained model based on the *CenterNet* architecture trained on the *COCO 2017* dataset. Running the code requires an Internet connection to download the model (from Google servers) and testing data (from GitHub servers).\n", "\n", "Some of this code is identical to that of `InferenceUsingTFHubMobileNetV2Model` notebook. Please look there for an explanation for that code. As stated there, this will later be wrapped up into a high-level library to hide the details behind an API.\n", "\n", "# COLOPHON\n", "\n", "The following document is either a POD file which can additionally be run as a Perl script or a Jupyter Notebook which can be run in [IPerl](https://p3rl.org/Devel::IPerl) (viewable online at [nbviewer](https://nbviewer.org/github/EntropyOrg/perl-AI-TensorFlow-Libtensorflow/blob/master/notebook/InferenceUsingTFHubCenterNetObjDetect.ipynb)). If you are reading this as POD, there should be a generated list of Perl dependencies in the [CPANFILE](#CPANFILE) section. Furthermore,\n", "\n", " * `PDL::Graphics::Gnuplot` requires `gnuplot`.\n", "\n", "If you are running the code, you may optionally install the [`tensorflow` Python package](https://www.tensorflow.org/install/pip) in order to access the `saved_model_cli` command, but this is only used for informational purposes." ] }, { "cell_type": "markdown", "id": "bb098c5b", "metadata": {}, "source": [ "# TUTORIAL\n", "\n", "## Load the library\n", "\n", "First, we need to load the `AI::TensorFlow::Libtensorflow` library and more helpers. We then create an `AI::TensorFlow::Libtensorflow::Status` object and helper function to make sure that the calls to the `libtensorflow` C library are working properly." ] }, { "cell_type": "code", "execution_count": null, "id": "81eaec3a", "metadata": {}, "outputs": [], "source": [ "use strict;\n", "use warnings;\n", "use utf8;\n", "use constant IN_IPERL => !! $ENV{PERL_IPERL_RUNNING};\n", "no if IN_IPERL, warnings => 'redefine'; # fewer messages when re-running cells\n", "\n", "use feature qw(say state postderef);\n", "use Syntax::Construct qw(each-array);\n", "\n", "use lib::projectroot qw(lib);\n", "\n", "BEGIN {\n", " if( IN_IPERL ) {\n", " $ENV{TF_CPP_MIN_LOG_LEVEL} = 3;\n", " }\n", " require AI::TensorFlow::Libtensorflow;\n", "}\n", "\n", "use URI ();\n", "use HTTP::Tiny ();\n", "use Path::Tiny qw(path);\n", "\n", "use File::Which ();\n", "\n", "use List::Util 1.56 qw(mesh);\n", "\n", "use Data::Printer ( output => 'stderr', return_value => 'void', filters => ['PDL'] );\n", "use Data::Printer::Filter::PDL ();\n", "use Text::Table::Tiny qw(generate_table);\n", "\n", "use Imager;\n", "\n", "my $s = AI::TensorFlow::Libtensorflow::Status->New;\n", "sub AssertOK {\n", " die \"Status $_[0]: \" . $_[0]->Message\n", " unless $_[0]->GetCode == AI::TensorFlow::Libtensorflow::Status::OK;\n", " return;\n", "}\n", "AssertOK($s);" ] }, { "cell_type": "markdown", "id": "b99b9097", "metadata": {}, "source": [ "And create helpers for converting between `PDL` ndarrays and `TFTensor` ndarrays." ] }, { "cell_type": "code", "execution_count": null, "id": "3d52ab35", "metadata": {}, "outputs": [], "source": [ "use PDL;\n", "use AI::TensorFlow::Libtensorflow::DataType qw(FLOAT UINT8);\n", "\n", "use FFI::Platypus::Memory qw(memcpy);\n", "use FFI::Platypus::Buffer qw(scalar_to_pointer);\n", "\n", "sub FloatPDLTOTFTensor {\n", " my ($p) = @_;\n", " return AI::TensorFlow::Libtensorflow::Tensor->New(\n", " FLOAT, [ reverse $p->dims ], $p->get_dataref, sub { undef $p }\n", " );\n", "}\n", "\n", "sub FloatTFTensorToPDL {\n", " my ($t) = @_;\n", "\n", " my $pdl = zeros(float,reverse( map $t->Dim($_), 0..$t->NumDims-1 ) );\n", "\n", " memcpy scalar_to_pointer( ${$pdl->get_dataref} ),\n", " scalar_to_pointer( ${$t->Data} ),\n", " $t->ByteSize;\n", " $pdl->upd_data;\n", "\n", " $pdl;\n", "}\n", "\n", "sub Uint8PDLTOTFTensor {\n", " my ($p) = @_;\n", " return AI::TensorFlow::Libtensorflow::Tensor->New(\n", " UINT8, [ reverse $p->dims ], $p->get_dataref, sub { undef $p }\n", " );\n", "}\n", "\n", "sub Uint8TFTensorToPDL {\n", " my ($t) = @_;\n", "\n", " my $pdl = zeros(byte,reverse( map $t->Dim($_), 0..$t->NumDims-1 ) );\n", "\n", " memcpy scalar_to_pointer( ${$pdl->get_dataref} ),\n", " scalar_to_pointer( ${$t->Data} ),\n", " $t->ByteSize;\n", " $pdl->upd_data;\n", "\n", " $pdl;\n", "}" ] }, { "cell_type": "markdown", "id": "e6eb3d2d", "metadata": {}, "source": [ "## Fetch the model and labels\n", "\n", "We are going to use an [object detection model](https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1) from TensorFlow Hub based on the CenterNet architecture. We download both the model and COCO 2017 labels." ] }, { "cell_type": "code", "execution_count": null, "id": "4f6194a1", "metadata": {}, "outputs": [], "source": [ "# image_size => [width, height] (but usually square images)\n", "my %model_name_to_params = (\n", " centernet_hourglass_512x512 => {\n", " handle => 'https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1',\n", " image_size => [ 512, 512 ],\n", " },\n", ");\n", "\n", "my $model_name = 'centernet_hourglass_512x512';\n", "\n", "say \"Selected model: $model_name : $model_name_to_params{$model_name}{handle}\";" ] }, { "cell_type": "markdown", "id": "cf912979", "metadata": {}, "source": [ "We download the model to the current directory and then extract the model to a folder with the name given in `$model_base`." ] }, { "cell_type": "code", "execution_count": null, "id": "94cae10a", "metadata": { "scrolled": true }, "outputs": [], "source": [ "my $model_uri = URI->new( $model_name_to_params{$model_name}{handle} );\n", "$model_uri->query_form( 'tf-hub-format' => 'compressed' );\n", "my $model_base = substr( $model_uri->path, 1 ) =~ s,/,_,gr;\n", "my $model_archive_path = \"${model_base}.tar.gz\";\n", "\n", "my $http = HTTP::Tiny->new;\n", "\n", "for my $download ( [ $model_uri => $model_archive_path ],) {\n", " my ($uri, $path) = @$download;\n", " say \"Downloading $uri to $path\";\n", " next if -e $path;\n", " $http->mirror( $uri, $path );\n", "}\n", "\n", "use Archive::Extract;\n", "my $ae = Archive::Extract->new( archive => $model_archive_path );\n", "die \"Could not extract archive\" unless $ae->extract( to => $model_base );\n", "\n", "my $saved_model = path($model_base)->child('saved_model.pb');\n", "say \"Saved model is in $saved_model\" if -f $saved_model;" ] }, { "cell_type": "markdown", "id": "a06551c6", "metadata": {}, "source": [ "We need to download the COCO 2017 classification labels and parse out the mapping from the numeric index to the textual descriptions." ] }, { "cell_type": "code", "execution_count": null, "id": "b6a1ed79", "metadata": {}, "outputs": [], "source": [ "# Get the labels\n", "my $response = $http->get('https://raw.githubusercontent.com/tensorflow/models/a4944a57ad2811e1f6a7a87589a9fc8a776e8d3c/object_detection/data/mscoco_label_map.pbtxt');\n", "\n", "my %labels_map = $response->{content} =~ m<\n", "(?:item \\s+ \\{ \\s+\n", " \\Qname:\\E \\s+ \"[^\"]+\" \\s+\n", " \\Qid:\\E \\s+ (\\d+) \\s+\n", " \\Qdisplay_name:\\E \\s+ \"([^\"]+)\" \\s+\n", "})+\n", ">sgx;\n", "\n", "my $label_count = List::Util::max keys %labels_map;\n", "\n", "say \"We have a label count of $label_count. These labels include: \",\n", " join \", \", List::Util::head( 5, @labels_map{ sort keys %labels_map } );" ] }, { "cell_type": "markdown", "id": "08f41a54", "metadata": {}, "source": [ "## Load the model and session\n", "\n", "We define the tag set `[ 'serve' ]` which we will use to load the model." ] }, { "cell_type": "code", "execution_count": null, "id": "5e1e3e03", "metadata": {}, "outputs": [], "source": [ "my @tags = ( 'serve' );" ] }, { "cell_type": "markdown", "id": "57eac4d7", "metadata": {}, "source": [ "We can examine what computations are contained in the graph in terms of the names of the inputs and outputs of an operation found in the graph by running `saved_model_cli`." ] }, { "cell_type": "code", "execution_count": null, "id": "e39a4af1", "metadata": { "scrolled": false }, "outputs": [], "source": [ "if( File::Which::which('saved_model_cli')) {\n", " local $ENV{TF_CPP_MIN_LOG_LEVEL} = 3; # quiet the TensorFlow logger for the following command\n", " system(qw(saved_model_cli show),\n", " qw(--dir) => $model_base,\n", " qw(--tag_set) => join(',', @tags),\n", " qw(--signature_def) => 'serving_default'\n", " ) == 0 or die \"Could not run saved_model_cli\";\n", "} else {\n", " say \"Install the tensorflow Python package to get the `saved_model_cli` command.\";\n", "}" ] }, { "cell_type": "markdown", "id": "67ce3f37", "metadata": {}, "source": [ "The above `saved_model_cli` output shows that the model input is at `serving_default_input_tensor:0` which means the operation named `serving_default_input_tensor` at index `0` and there are multiple outputs with different shapes.\n", "\n", "Per the [model description](https://tfhub.dev/tensorflow/centernet/hourglass_512x512/1) on TensorFlow Hub:\n", "\n", "> **Inputs**\n", ">\n", "> A three-channel image of variable size - the model does NOT support batching. The input tensor is a `tf.uint8` tensor with shape [1, height, width, 3] with values in [0, 255].\n", ">\n", "> **Outputs**\n", ">\n", "> The output dictionary contains:\n", ">\n", "> - `num_detections`: a `tf.int` tensor with only one value, the number of detections [N].\n", "> - `detection_boxes`: a `tf.float32` tensor of shape [N, 4] containing bounding box coordinates in the following order: [ymin, xmin, ymax, xmax].\n", "> - `detection_classes`: a `tf.int` tensor of shape [N] containing detection class index from the label file.\n", "> - `detection_scores`: a `tf.float32` tensor of shape [N] containing detection scores.\n", "\n", "Note that the above documentation has two errors: both `num_detections` and `detection_classes` are not of type `tf.int`, but are actually `tf.float32`." ] }, { "cell_type": "markdown", "id": "def187ec", "metadata": {}, "source": [ "Now we can load the model from that folder with the tag set `[ 'serve' ]` by using the `LoadFromSavedModel` constructor to create a `::Graph` and a `::Session` for that graph." ] }, { "cell_type": "code", "execution_count": null, "id": "aeae64c9", "metadata": {}, "outputs": [], "source": [ "my $opt = AI::TensorFlow::Libtensorflow::SessionOptions->New;\n", "\n", "my $graph = AI::TensorFlow::Libtensorflow::Graph->New;\n", "my $session = AI::TensorFlow::Libtensorflow::Session->LoadFromSavedModel(\n", " $opt, undef, $model_base, \\@tags, $graph, undef, $s\n", ");\n", "AssertOK($s);" ] }, { "cell_type": "markdown", "id": "4a3b4151", "metadata": {}, "source": [ "So let's use the names from the `saved_model_cli` output to create our `::Output` `ArrayRef`s." ] }, { "cell_type": "code", "execution_count": null, "id": "87e63b30", "metadata": {}, "outputs": [], "source": [ "my %ops = (\n", " in => {\n", " op => $graph->OperationByName('serving_default_input_tensor'),\n", " dict => {\n", " input_tensor => 0,\n", " }\n", " },\n", " out => {\n", " op => $graph->OperationByName('StatefulPartitionedCall'),\n", " dict => {\n", " detection_boxes => 0,\n", " detection_classes => 1,\n", " detection_scores => 2,\n", " num_detections => 3,\n", " }\n", " },\n", ");\n", "\n", "my %outputs;\n", "\n", "%outputs = map {\n", " my $put_type = $_;\n", " my $op = $ops{$put_type}{op};\n", " my $port_dict = $ops{$put_type}{dict};\n", "\n", " $put_type => +{\n", " map {\n", " my $dict_key = $_;\n", " my $index = $port_dict->{$_};\n", " $dict_key => AI::TensorFlow::Libtensorflow::Output->New( {\n", " oper => $op,\n", " index => $index,\n", " });\n", " } keys %$port_dict\n", " }\n", "} keys %ops;\n", "\n", "p %outputs;" ] }, { "cell_type": "markdown", "id": "e28fbf5b", "metadata": {}, "source": [ "Now we can get the following testing image from GitHub." ] }, { "cell_type": "code", "execution_count": null, "id": "fa18a474", "metadata": {}, "outputs": [], "source": [ "use HTML::Tiny;\n", "\n", "my %images_for_test_to_uri = (\n", " \"beach_scene\" => 'https://github.com/tensorflow/models/blob/master/research/object_detection/test_images/image2.jpg?raw=true',\n", ");\n", "\n", "my @image_names = sort keys %images_for_test_to_uri;\n", "my $h = HTML::Tiny->new;\n", "\n", "my $image_name = 'beach_scene';\n", "if( IN_IPERL ) {\n", " IPerl->html(\n", " $h->a( { href => $images_for_test_to_uri{$image_name} },\n", " $h->img({\n", " src => $images_for_test_to_uri{$image_name},\n", " alt => $image_name,\n", " width => '100%',\n", " })\n", " ),\n", " );\n", "}" ] }, { "cell_type": "markdown", "id": "853815b0", "metadata": {}, "source": [ "## Download the test image and transform it into suitable input data\n", "\n", "We now fetch the image and prepare it to be in the needed format by using `Imager`. Note that this model does not need the input image to be of a certain size so no resizing or padding is required.\n", "\n", "Then we turn the `Imager` data into a `PDL` ndarray. Since we just need the 3 channels of the image as they are, they can be stored directly in a `PDL` ndarray of type `byte`.\n", "\n", "The reason why we need to concatenate the `PDL` ndarrays here despite the model only taking a single image at a time is to get an ndarray with four (4) dimensions with the last `PDL` dimension of size one (1)." ] }, { "cell_type": "code", "execution_count": null, "id": "bdcf4f61", "metadata": {}, "outputs": [], "source": [ "sub load_image_to_pdl {\n", " my ($uri, $image_size) = @_;\n", "\n", " my $http = HTTP::Tiny->new;\n", " my $response = $http->get( $uri );\n", " die \"Could not fetch image from $uri\" unless $response->{success};\n", " say \"Downloaded $uri\";\n", "\n", " my $img = Imager->new;\n", " $img->read( data => $response->{content} );\n", "\n", " # Create PDL ndarray from Imager data in-memory.\n", " my $data;\n", " $img->write( data => \\$data, type => 'raw' )\n", " or die \"could not write \". $img->errstr;\n", "\n", " die \"Image does not have 3 channels, it has @{[ $img->getchannels ]} channels\"\n", " if $img->getchannels != 3;\n", "\n", " # $data is packed as PDL->dims == [w,h] with RGB pixels\n", " my $pdl_raw = zeros(byte, $img->getchannels, $img->getwidth, $img->getheight);\n", " ${ $pdl_raw->get_dataref } = $data;\n", " $pdl_raw->upd_data;\n", "\n", " $pdl_raw;\n", "}\n", "\n", "my @pdl_images = map {\n", " load_image_to_pdl(\n", " $images_for_test_to_uri{$_},\n", " $model_name_to_params{$model_name}{image_size}\n", " );\n", "} ($image_names[0]);\n", "\n", "my $pdl_image_batched = cat(@pdl_images);\n", "my $t = Uint8PDLTOTFTensor($pdl_image_batched);\n", "\n", "die \"There should be 4 dimensions\" unless $pdl_image_batched->ndims == 4;\n", "\n", "die \"With the final dimension of length 1\" unless $pdl_image_batched->dim(3) == 1;\n", "\n", "p $pdl_image_batched;\n", "p $t;" ] }, { "cell_type": "markdown", "id": "02a87430", "metadata": {}, "source": [ "## Run the model for inference\n", "\n", "We can use the `Run` method to run the session and get the multiple output `TFTensor`s. The following uses the names in `$outputs` mapping to help process the multiple outputs more easily." ] }, { "cell_type": "code", "execution_count": null, "id": "e7e06051", "metadata": {}, "outputs": [], "source": [ "my $RunSession = sub {\n", " my ($session, $t) = @_;\n", " my @outputs_t;\n", "\n", " my @keys = keys %{ $outputs{out} };\n", " my @values = $outputs{out}->@{ @keys };\n", " $session->Run(\n", " undef,\n", " [ values %{$outputs{in} } ], [$t],\n", " \\@values, \\@outputs_t,\n", " undef,\n", " undef,\n", " $s\n", " );\n", " AssertOK($s);\n", "\n", " return { mesh \\@keys, \\@outputs_t };\n", "};\n", "\n", "undef;" ] }, { "cell_type": "code", "execution_count": null, "id": "ffd5bbb6", "metadata": {}, "outputs": [], "source": [ "my $tftensor_output_by_name = $RunSession->($session, $t);\n", "\n", "my %pdl_output_by_name = map {\n", " $_ => FloatTFTensorToPDL( $tftensor_output_by_name->{$_} )\n", "} keys $tftensor_output_by_name->%*;\n", "\n", "undef;" ] }, { "cell_type": "markdown", "id": "caec73f1", "metadata": {}, "source": [ "## Results summary\n", "\n", "Then we use a score threshold to select the objects of interest." ] }, { "cell_type": "code", "execution_count": null, "id": "d8f638e0", "metadata": { "scrolled": true }, "outputs": [], "source": [ "my $min_score_thresh = 0.30;\n", "\n", "my $which_detect = which( $pdl_output_by_name{detection_scores} > $min_score_thresh );\n", "\n", "my %subset;\n", "\n", "$subset{detection_boxes} = $pdl_output_by_name{detection_boxes}->dice('X', $which_detect);\n", "$subset{detection_classes} = $pdl_output_by_name{detection_classes}->dice($which_detect);\n", "$subset{detection_scores} = $pdl_output_by_name{detection_scores}->dice($which_detect);\n", "\n", "$subset{detection_class_labels}->@* = map { $labels_map{$_} } $subset{detection_classes}->list;\n", "\n", "p %subset;" ] }, { "cell_type": "markdown", "id": "88dfead5", "metadata": {}, "source": [ "The following uses the bounding boxes and class label information to draw boxes and labels on top of the image using Gnuplot." ] }, { "cell_type": "code", "execution_count": null, "id": "c052958d", "metadata": {}, "outputs": [], "source": [ "use PDL::Graphics::Gnuplot;\n", "\n", "my $plot_output_path = 'objects-detected.png';\n", "my $gp = gpwin('pngcairo', font => \",12\", output => $plot_output_path, aa => 2, size => [10] );\n", "\n", "my @qual_cmap = ('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6');\n", "\n", "$gp->options(\n", " map {\n", " my $idx = $_;\n", " my $lc_rgb = $qual_cmap[ $subset{detection_classes}->slice(\"($idx)\")->squeeze % @qual_cmap ];\n", "\n", " my $box_corners_yx_norm = $subset{detection_boxes}->slice([],$idx,[0,0,0]);\n", " $box_corners_yx_norm->reshape(2,2);\n", "\n", " my $box_corners_yx_img = $box_corners_yx_norm * $pdl_images[0]->shape->slice('-1:-2');\n", "\n", " my $from_xy = join \",\", $box_corners_yx_img->slice('-1:0,(0)')->list;\n", " my $to_xy = join \",\", $box_corners_yx_img->slice('-1:0,(1)')->list;\n", " my $label_xy = join \",\", $box_corners_yx_img->at(1,1), $box_corners_yx_img->at(0,1);\n", "\n", " (\n", " [ object => [ \"rect\" =>\n", " from => $from_xy, to => $to_xy,\n", " qq{front fs empty border lc rgb \"$lc_rgb\" lw 5} ], ],\n", " [ label => [\n", " sprintf(\"%s: %.1f\",\n", " $subset{detection_class_labels}[$idx],\n", " 100*$subset{detection_scores}->at($idx,0) ) =>\n", " at => $label_xy, 'left',\n", " offset => 'character 0,-0.25',\n", " qq{font \",12\" boxed front tc rgb \"#ffffff\"} ], ],\n", " )\n", " } 0..$subset{detection_boxes}->dim(1)-1\n", ");\n", "\n", "$gp->plot(\n", " topcmds => q{set style textbox opaque fc \"#505050f0\" noborder},\n", " square => 1,\n", " yrange => [$pdl_images[0]->dim(2),0],\n", " with => 'image', $pdl_images[0],\n", ");\n", "\n", "$gp->close;\n", "\n", "IPerl->png( bytestream => path($plot_output_path)->slurp_raw ) if IN_IPERL;" ] }, { "cell_type": "markdown", "id": "f9cb0dd9", "metadata": {}, "source": [ "# RESOURCE USAGE" ] }, { "cell_type": "code", "execution_count": null, "id": "69c9cf1d", "metadata": {}, "outputs": [], "source": [ "use Filesys::DiskUsage qw/du/;\n", "\n", "my $total = du( { 'human-readable' => 1, dereference => 1 },\n", " $model_archive_path, $model_base );\n", "\n", "say \"Disk space usage: $total\"; undef;" ] } ], "metadata": { "kernelspec": { "display_name": "IPerl 0.012", "language": "perl", "name": "iperl" }, "language_info": { "file_extension": ".pl", "mimetype": "text/x-perl", "name": "perl", "version": "5.30.0" } }, "nbformat": 4, "nbformat_minor": 5 }