{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Visual Image Processing with OSCAR, MinIO, and Jupyter\n", "\n", "This notebook is designed to be used **after** deploying the `imagemagick` OSCAR service and sending one or more images to its input bucket.\n", "\n", "> Hint: run each cell by pressing `Ctrl + Enter`.\n", "\n", "The goal is not only to inspect the generated files, but also to understand how the complete workflow is split into three complementary layers:\n", "\n", "1. **MinIO** stores the input and output files.\n", "2. **OSCAR** reacts to new uploads and runs the ImageMagick processing script automatically.\n", "3. **Jupyter** helps us interpret the resulting images and metrics in an interactive, reproducible way.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goals\n", "\n", "By the end of this notebook, you should be able to:\n", "\n", "- explain the role of OSCAR in an event-driven data-processing pipeline,\n", "- describe what ImageMagick is computing for each uploaded image,\n", "- interpret simple visual descriptors such as average brightness, contrast, and edge density,\n", "- compare original images with their derived grayscale and edge-enhanced versions,\n", "- export a compact summary of the experiment for later discussion or assessment.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Before You Run the Analysis\n", "\n", "Check these conditions first:\n", "\n", "- The OSCAR service has already been deployed from the provided service definition.\n", "- You have uploaded one or more images to the service input path.\n", "- The images and processed results are available in the local `output/` directory.\n", "\n", "Each processed image should generate three artifacts:\n", "\n", "- `*_gray.png`: a grayscale version of the image,\n", "- `*_edges.png`: a simple edge-enhanced representation,\n", "- `*_metrics.json`: a machine-readable file with visual metrics.\n", "\n", 
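"As a quick sanity check of the naming convention above, you can derive the expected artifact names for any input image. The helper below is a minimal sketch based only on the filename patterns listed in this notebook; the function name `expected_artifacts` is ours, not part of the OSCAR service:\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"def expected_artifacts(input_name):\n",
"    # Derive the three artifact names the service should produce for one input\n",
"    stem = Path(input_name).stem\n",
"    return [f'{stem}_gray.png', f'{stem}_edges.png', f'{stem}_metrics.json']\n",
"```\n",
"\n",
"For example, `expected_artifacts('landscape.jpg')` returns `['landscape_gray.png', 'landscape_edges.png', 'landscape_metrics.json']`, which are the three files to look for after an upload has been processed.\n",
"\n",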
"This notebook assumes that `output/` is the working directory where the bucket contents are available locally, including the original images and the generated artifacts.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import importlib.util\n", "import subprocess\n", "import sys\n", "\n", "required = {\n", " 'matplotlib': 'matplotlib',\n", " 'pandas': 'pandas',\n", " 'PIL': 'pillow',\n", "}\n", "\n", "missing = [package for module, package in required.items() if importlib.util.find_spec(module) is None]\n", "\n", "if missing:\n", " print('Installing missing dependencies:', ', '.join(missing))\n", " subprocess.check_call([sys.executable, '-m', 'pip', 'install', *missing])\n", "else:\n", " print('All notebook dependencies are already available.')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import json\n", "\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "from PIL import Image\n", "from IPython.display import Markdown, display\n", "\n", "NOTEBOOK_DIR = Path.cwd()\n", "OUTPUT_DIR = NOTEBOOK_DIR / 'output'\n", "OUTPUT_DIR.mkdir(exist_ok=True)\n", "\n", "print(f'Notebook directory: {NOTEBOOK_DIR.resolve()}')\n", "print(f'Output directory: {OUTPUT_DIR.resolve()}')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspect the Available Images\n", "\n", "The `output/` directory may contain original images, generated grayscale images, generated edge maps, or a combination of all of them. Inspecting the available files first helps you understand what kind of visual structures are present in the local bucket snapshot.\n", "\n", "Visual variety matters because the output metrics are only meaningful if we compare images with different structures. 
For example:\n", "\n", "- images with many abrupt transitions usually produce stronger edge maps,\n", "- low-contrast images tend to have smaller standard deviation values,\n", "- smooth gradients often look bright but contain relatively few strong edges.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_image_files = sorted(\n", " path for path in OUTPUT_DIR.iterdir()\n", " if path.is_file() and path.suffix.lower() in {'.png', '.jpg', '.jpeg'}\n", ")\n", "\n", "original_image_files = [\n", " path for path in all_image_files\n", " if not path.stem.endswith('_gray') and not path.stem.endswith('_edges')\n", "]\n", "gray_image_files = [path for path in all_image_files if path.stem.endswith('_gray')]\n", "edge_image_files = [path for path in all_image_files if path.stem.endswith('_edges')]\n", "\n", "if original_image_files:\n", " image_files = original_image_files\n", " image_set_label = 'Original images found in output/'\n", "else:\n", " image_files = all_image_files\n", " image_set_label = 'Image files found in output/'\n", "\n", "print(f'Total image files found: {len(all_image_files)}')\n", "print(f'Original images: {len(original_image_files)}')\n", "print(f'Grayscale outputs: {len(gray_image_files)}')\n", "print(f'Edge outputs: {len(edge_image_files)}')\n", "print(f'{image_set_label}: {len(image_files)}')\n", "\n", "for path in image_files:\n", " print('-', path.name)\n", "\n", "if image_files:\n", " cols = 4\n", " rows = (len(image_files) + cols - 1) // cols\n", " fig, axes = plt.subplots(rows, cols, figsize=(14, 3.5 * rows))\n", " axes = axes.flatten() if hasattr(axes, 'flatten') else [axes]\n", "\n", " for ax, image_path in zip(axes, image_files):\n", " ax.imshow(Image.open(image_path))\n", " ax.set_title(image_path.name, fontsize=9)\n", " ax.axis('off')\n", "\n", " for ax in axes[len(image_files):]:\n", " ax.axis('off')\n", "\n", " plt.tight_layout()\n", " plt.show()\n", "else:\n", " print('No image files 
were found in output/.')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the Service Outputs\n", "\n", "The ImageMagick script generates a JSON file per input image. Each JSON record contains both file references and numeric features:\n", "\n", "- `width` and `height`: the original input dimensions,\n", "- `brightness_mean`: the average grayscale intensity,\n", "- `contrast_stddev`: the grayscale standard deviation, used here as a simple contrast estimate,\n", "- `edge_density`: the proportion of pixels that survive a threshold after edge detection.\n", "\n", "These descriptors are intentionally simple. They are not meant to replace a full computer-vision pipeline, but they are excellent for teaching how automated services can transform raw files into structured, analyzable outputs.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metrics_files = sorted(OUTPUT_DIR.glob('*_metrics.json'))\n", "print(f'Metric files found: {len(metrics_files)}')\n", "\n", "records = []\n", "for metrics_path in metrics_files:\n", " with metrics_path.open() as fh:\n", " data = json.load(fh)\n", " data['metrics_file'] = metrics_path.name\n", " records.append(data)\n", "\n", "if not records:\n", " display(Markdown(\n", " '**No processed results were found in `output/`.** \\n'\n", " 'Download the service outputs or mount the MinIO output bucket into this directory, then run the notebook again.'\n", " ))\n", "else:\n", " df = pd.DataFrame(records).sort_values('original_file').reset_index(drop=True)\n", " numeric_columns = ['width', 'height', 'brightness_mean', 'contrast_stddev', 'edge_density']\n", " df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)\n", " display(df)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First Interpretation Pass\n", "\n", "Before plotting anything, read the table like a scientist rather than like a spreadsheet user.\n", "\n", "Ask yourself:\n", "\n", "- 
Which images are likely to contain the strongest boundaries?\n", "- Which ones look visually flat or smooth?\n", "- Do bright images always have many edges?\n", "- Are noisy textures being captured more by contrast, by edge density, or by both?\n", "\n", "The next cells help turn those questions into visual evidence.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if records:\n", " summary = df[['original_file', 'brightness_mean', 'contrast_stddev', 'edge_density']].copy()\n", " summary = summary.sort_values('edge_density', ascending=False).reset_index(drop=True)\n", " display(summary)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Brightness vs. Edge Density\n", "\n", "This scatter plot compares two different concepts:\n", "\n", "- **brightness**: how light the image is on average,\n", "- **edge density**: how many strong structural transitions appear after edge detection.\n", "\n", "If the points are widely spread, that is useful: it means the service is distinguishing different visual patterns rather than producing nearly identical metrics for every image.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if records:\n", " fig, ax = plt.subplots(figsize=(8, 5.5))\n", " ax.scatter(df['brightness_mean'], df['edge_density'], s=70)\n", "\n", " for _, row in df.iterrows():\n", " ax.annotate(\n", " row['original_file'],\n", " (row['brightness_mean'], row['edge_density']),\n", " fontsize=8,\n", " xytext=(4, 4),\n", " textcoords='offset points'\n", " )\n", "\n", " ax.set_xlabel('Mean brightness')\n", " ax.set_ylabel('Edge density')\n", " ax.set_title('Brightness compared with edge density')\n", " ax.grid(True, alpha=0.3)\n", " plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare the Metrics Side by Side\n", "\n", "A bar chart makes relative differences easier to spot. 
It is especially useful when you want students to identify outliers or defend an interpretation with visible evidence.\n", "\n", "The three metrics may live on very different scales (edge density is a proportion, while brightness and contrast are intensity statistics), so the cell below rescales each column by its maximum to keep all bars visible on a shared axis.\n", "\n", "Look for images that dominate one metric but not the others. Those cases often lead to the best classroom discussions because they show that a single number never captures the full visual content of an image.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if records:\n", " chart_df = df.set_index('original_file')[['brightness_mean', 'contrast_stddev', 'edge_density']]\n", " # The metrics may be reported on different scales, so rescale each\n", " # column by its maximum to keep every bar visible on a shared axis.\n", " chart_df = chart_df / chart_df.max()\n", " ax = chart_df.plot(kind='bar', figsize=(12, 5))\n", " ax.set_ylabel('Metric value (relative to column maximum)')\n", " ax.set_title('Visual metrics per processed image')\n", " ax.grid(True, axis='y', alpha=0.3)\n", " plt.xticks(rotation=45, ha='right')\n", " plt.tight_layout()\n", " plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gallery: Original vs. Grayscale vs. Edges\n", "\n", "Numbers become more meaningful once we compare them against the actual images. The gallery below is deliberately organized in three columns:\n", "\n", "- the original input,\n", "- the grayscale conversion,\n", "- the edge-enhanced output.\n", "\n", "This layout helps connect the service internals with the visual results. 
In teaching terms, it closes the loop between data, processing, and interpretation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def resolve_original_image(row, output_dir):\n", " candidate = output_dir / row['original_file']\n", " return candidate if candidate.exists() else None\n", "\n", "\n", "def show_gallery(dataframe, output_dir, max_items=6):\n", " sample = dataframe.head(max_items)\n", " rows = len(sample)\n", " fig, axes = plt.subplots(rows, 3, figsize=(12, 3.8 * rows))\n", "\n", " if rows == 1:\n", " axes = [axes]\n", "\n", " for row_axes, (_, row) in zip(axes, sample.iterrows()):\n", " original_path = resolve_original_image(row, output_dir)\n", " gray_path = output_dir / row['gray_file']\n", " edges_path = output_dir / row['edges_file']\n", "\n", " if original_path is not None:\n", " row_axes[0].imshow(Image.open(original_path))\n", " row_axes[0].set_title(f\"Original: {row['original_file']}\")\n", " else:\n", " row_axes[0].text(0.5, 0.5, 'Original image\\nnot available locally', ha='center', va='center')\n", " row_axes[0].set_title(f\"Original: {row['original_file']}\")\n", "\n", " row_axes[0].axis('off')\n", "\n", " row_axes[1].imshow(Image.open(gray_path), cmap='gray')\n", " row_axes[1].set_title(f\"Grayscale: {row['gray_file']}\")\n", " row_axes[1].axis('off')\n", "\n", " row_axes[2].imshow(Image.open(edges_path), cmap='gray')\n", " row_axes[2].set_title(f\"Edges: {row['edges_file']}\")\n", " row_axes[2].axis('off')\n", "\n", " plt.tight_layout()\n", " plt.show()\n", "\n", "\n", "if records:\n", " show_gallery(df, OUTPUT_DIR, max_items=min(6, len(df)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export the Summary Table\n", "\n", "Exporting the metrics to CSV is useful when the notebook is part of a broader activity. 
For example, students can reuse the table in a report, compare several runs, or submit their observations together with the raw output files.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if records:\n", " csv_path = NOTEBOOK_DIR / 'metrics_summary.csv'\n", " df.to_csv(csv_path, index=False)\n", " print(f'Summary table exported to: {csv_path.resolve()}')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reflection Questions\n", "\n", "Use these prompts to close the activity:\n", "\n", "1. Why is MinIO a good fit for this kind of asynchronous workflow?\n", "2. What does OSCAR automate that would be tedious to do by hand?\n", "3. Why is it useful to separate storage, execution, and analysis into different tools?\n", "4. Which metric seems most informative for your image set, and why?\n", "5. What would you improve if this service had to process thousands of images instead of ten?\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.11" } }, "nbformat": 4, "nbformat_minor": 5 }