{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PyDuct - a Data Engineering pipeline in Python\n", "> A simple framework for building and running simple data engineering pipelines in Python.\n", "\n", "- comments: true\n", "- categories: [python, jupyter, data, data engineering, PyPi]\n", "- image: images/pyduct_small.png\n", "- permalink: /:title:output_ext" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Data Science or Data Engineering you constantly hear the term “data pipeline”. But the term has many meanings, and people are often referring to very specific tools or packages depending on their own background and needs. There are pipelines for pretty much everything, and in Python alone I can think of [Luigi](https://luigi.readthedocs.io/en/stable/), [Airflow](https://airflow.apache.org/), [scikit-learn pipelines](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html), and [Pandas pipes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html) just off the top of my head. [This article](https://towardsdatascience.com/data-pipelines-what-why-and-which-ones-1f674ba49946) does a good job of helping you understand what is out there.\n", "\n", "It can be quite confusing, especially if you want a simple and agnostic pipeline that you can customize for your specific needs, with no bells and whistles and no lock-in to particular libraries. That is where PyDuct comes in. It is for the data engineer who just wants to get stuff done in an ordered and repeatable way.\n", "\n", "PyDuct is a simple data pipeline that automates a chain of transformations performed on some data.\n", "\n", "PyDuct data pipelines are a great way of introducing automation, reproducibility, structure, and flow to your data engineering projects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is it?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyDuct transformation pipelines link user-defined transformation functions together into a TransformationPipe. The key feature of PyDuct is that the data source passed in can be almost anything you like, e.g. a pandas DataFrame, a GeoPandas dataframe, an Iris datacube, or a NumPy array. So long as your transformation steps read and write the same kind of object, PyDuct will work for you.\n", "\n", "![](images/pypipe.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Where to find out more:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyPI package: https://pypi.org/project/pyduct/0.0.1/\n", "\n", "GitHub repo: https://github.com/RobTheOceanographer/pyduct\n", "\n", "Docs: https://robtheoceanographer.github.io/pyduct/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyDuct was made by [Robert Johnson](https://www.robtheoceanographer.com/), [Alexander Kozlov](https://alexkozlov.com/), and [Mohammadreza Khanarmuei](https://www.linkedin.com/in/mohammadreza-khanarmuei-437a3163)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", 
"varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }