{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# find_files: Find files based on substring matches" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function that finds files in a given directory based on substring matches and returns a list of the matching file paths." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> from mlxtend.file_io import find_files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function searches a directory for files whose names contain a given substring. It is especially useful for finding specific files in a directory tree and collecting their paths for further processing in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1 - Finding files by substring match" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the following directory and file structure:\n", "\n", "    dir_1/\n", "        file_1.log\n", "        file_2.log\n", "        file_3.log\n", "    dir_2/\n", "        file_1.csv\n", "        file_2.csv\n", "        file_3.csv\n", "    dir_3/\n", "        file_1.txt\n", "        file_2.txt\n", "        file_3.txt\n", "\n", "we can use `find_files` to return the paths to all files whose names contain the substring `_2` as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['./data_find_filegroups/dir_1/file_2.log',\n", " './data_find_filegroups/dir_2/file_2.csv',\n", " './data_find_filegroups/dir_3/file_2.txt']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mlxtend.file_io import find_files\n", "\n", "find_files(substring='_2', path='./data_find_filegroups/', recursive=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ {
"name": "stdout", "output_type": "stream", "text": [ "## find_files\n", "\n", "*find_files(substring, path, recursive=False, check_ext=None, ignore_invisible=True, ignore_substring=None)*\n", "\n", "Find files in a directory based on substring matching.\n", "\n", "**Parameters**\n", "\n", "- `substring` : `str`\n", "\n", "    Substring of the file to be matched.\n", "\n", "- `path` : `str`\n", "\n", "    Path where to look.\n", "\n", "- `recursive` : `bool`\n", "\n", "    If true, searches subdirectories recursively.\n", "\n", "- `check_ext` : `str`\n", "\n", "    If string (e.g., '.txt'), only returns files that\n", "    match the specified file extension.\n", "\n", "- `ignore_invisible` : `bool`\n", "\n", "    If `True`, ignores invisible files\n", "    (i.e., files starting with a period).\n", "\n", "- `ignore_substring` : `str`\n", "\n", "    Ignores files that contain the specified substring.\n", "\n", "**Returns**\n", "\n", "- `results` : `list`\n", "\n", "    List of the matched files.\n", "\n", "**Examples**\n", "\n", "For usage examples, please see\n", "    [https://rasbt.github.io/mlxtend/user_guide/file_io/find_files/](https://rasbt.github.io/mlxtend/user_guide/file_io/find_files/)\n", "\n", "\n" ] } ], "source": [ "with open('../../api_modules/mlxtend.file_io/find_files.md', 'r') as f:\n", "    print(f.read())" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }