{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![April 31st is a perfectly fine date in Grumpy Cat's data entry system](https://github.com/snorehorse/sas-demos/raw/master/program-flow-and-error-checking/images/april_31st.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Simplified data quality checks and program flow in SAS programs\n", "\n", "*ETL design patterns, naming conventions and other tips for a saner life *\n", "\n", "Based on the Spring 2017 GVSUG presentation by [Louis Herczeg](https://www.linkedin.com/in/louisherczeg/ \"LinkedIn\") & [Matthew Giglia](https://www.linkedin.com/in/matthewgiglia/ \"LinkedIn\"). \n", "\n", "## Basic concept walk-through" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Real world data\n", "Common problems include:\n", "* typos\n", "* duplicate records\n", "* missing values\n", "* business logic violations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This demo\n", "\n", "1. Create a combination of \"good\" and \"bad\" data. \n", "2. Create quality checks.\n", "3. Automatically summarize the results.\n", "4. Route program flow accordingly (do scary, permanent things if there's only good data, otherwise stop and complain)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 01 - Demo data\n", "\n", "Data sets are prefixed in the project so that they'll sort in order of creation when viewed in the library or with `proc contents`. \"s01\" is step 01, and will appear as the top entry." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Data Extract

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsMakeModelTypeOriginDriveTrainMSRPCylinders
426VolvoS80 T6 4drSedanEuropeFront$45,2106.0
427VolvoV40WagonEuropeFront$26,1354.0
428VolvoXC70WagonEuropeAll$35,1455.0
429FooRAV5SUVAsiaFront$25,000-4.0
430FooRAV8aSUVAsiaFront$31,00080.0
431FooRAV8bSUVAsiaFront$31,0003.5
432FooBarSUVAsiaFront$586.0
433FizzBuzzSUVAsiaFront$-25,0006.0
434FizzSnorehorseSUVAsiaFront.6.0
435Saab9-5 AeroWagonEuropeFront$50,0004.0
436SnorehorseTownlanderBadValueUSAFront$40,0004.0
437SnowplowHighlanderWagonUSAApple$45,0004.0
438LlamaLoafWagonUSAFront$10000004.0
\n", "
\n", "
\n", "

Simulated bad data starts on row 429.

\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options nosource nonotes;\n", "\n", "/* Get some good data */\n", "data s01_test_data;\n", " set sashelp.cars(keep=make model Type drivetrain Origin MSRP Cylinders);\n", " model = left(model);\n", "run;\n", "\n", "/* Insert bad data */\n", "proc sql;\n", " insert into s01_test_data\n", "\n", " /* Bad cylinder counts */\n", " values (\"Foo\", \"RAV5\", \"SUV\", \"Asia\", \"Front\", 25000, -4) \n", " values (\"Foo\", \"RAV8a\", \"SUV\", \"Asia\", \"Front\", 31000, 80)\n", " values (\"Foo\", \"RAV8b\", \"SUV\", \"Asia\", \"Front\", 31000, 3.5)\n", "\n", " /* Bad MSRP */\n", " values (\"Foo\", \"Bar\", \"SUV\", \"Asia\", \"Front\", 58, 6)\n", " values (\"Fizz\", \"Buzz\", \"SUV\", \"Asia\", \"Front\", -25000, 6)\n", " values (\"Fizz\", \"Snorehorse\", \"SUV\", \"Asia\", \"Front\", ., 6)\n", "\n", " /* Duplicate record based on make, model, drivetrain */\n", " values (\"Saab\", \"9-5 Aero\", \"Wagon\", \"Europe\", \"Front\", 50000, 4) \n", "\n", " /* Bad Type */\n", " values (\"Snorehorse\", \"Townlander\", \"BadValue\", \"USA\", \"Front\", 40000, 4) \n", "\n", " /* Bad Drive Train */\n", " values (\"Snowplow\", \"Highlander\", \"Wagon\", \"USA\", \"Apple\", 45000, 4) \n", "\n", " /* MSRP outlier */\n", " values (\"Llama\", \"Loaf\", \"Wagon\", \"USA\", \"Front\", 1000000, 4) \n", " ;\n", "quit;\n", "\n", "title \"Data Extract\"; footnote 'Simulated bad data starts on row 429.';\n", "proc print data=s01_test_data(firstobs=426); run; title; footnote;" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Step 02 - Write an arbitrary number of quality checks\n", "\n", "*Note: In Part 2, these will be wrapped in macros, for reuse throughout the program. *\n", "\n", "- Each check gets its own data set\n", "\n", "- Notice the naming convention, \"\\_CHK\\_\"! This allows the checks to be automatically summarized.\n", "\n", "- The data set labels will provide the description of the check, on the summary table." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check cylinder count" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obscheck_descriptionMakeModelTypeOriginDriveTrainMSRPCylinders
101-Cylinders must be between 3 and 12, and whole numbersMazdaRX-8 4dr automaticSportsAsiaRear$25,700.
201-Cylinders must be between 3 and 12, and whole numbersMazdaRX-8 4dr manualSportsAsiaRear$27,200.
301-Cylinders must be between 3 and 12, and whole numbersFooRAV5SUVAsiaFront$25,000-4.0
401-Cylinders must be between 3 and 12, and whole numbersFooRAV8aSUVAsiaFront$31,00080.0
501-Cylinders must be between 3 and 12, and whole numbersFooRAV8bSUVAsiaFront$31,0003.5
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%let lbl = '01-Cylinders must be between 3 and 12, and whole numbers';\n", "\n", "data s02_chk_01_BAD_CYLINDERS(Label=&lbl);\n", " length check_description $100;\n", " SET s01_test_data;\n", " check_description=&lbl.;\n", " WHERE not((3 <=cylinders <=12) and (floor(cylinders)=cylinders));\n", "RUN;\n", "\n", "proc print; run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like we detected some bad data in the original `SASHELP.CARS`!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check MSRP bad values" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obscheck_descriptionMakeModelTypeOriginDriveTrainMSRPCylinders
102-MSRP must be greater than $5000FooBarSUVAsiaFront$586
202-MSRP must be greater than $5000FizzBuzzSUVAsiaFront$-25,0006
302-MSRP must be greater than $5000FizzSnorehorseSUVAsiaFront.6
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%let lbl = '02-MSRP must be greater than $5000';\n", "\n", "data s02_chk_02_BAD_MSRP(Label=&lbl);\n", " length check_description $100;\n", " SET s01_test_data;\n", " check_description=&lbl.;\n", " WHERE not(MSRP >=5000);\n", "RUN;\n", "\n", "proc print; run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check for dupes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obscheck_descriptionMakeModelTypeOriginDriveTrainMSRPCylinders
103-Must be unique on (make model drivetrain)Saab9-5 AeroWagonEuropeFront$40,8454
203-Must be unique on (make model drivetrain)Saab9-5 AeroWagonEuropeFront$50,0004
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%let lbl = '03-Must be unique on (make model drivetrain)';\n", "\n", "proc sort nouniquekey data=s01_test_data uniqueout=_null_ \n", " out=s02_chk_03_NOT_UNIQUE(Label=&lbl);\n", " by make model drivetrain;\n", "run;\n", "\n", "/* Add the description */\n", "data s02_chk_03_NOT_UNIQUE(Label=&lbl);\n", " length check_description $100;\n", " SET s02_chk_03_NOT_UNIQUE;\n", " check_description=&lbl.;\n", "RUN;\n", "\n", "proc print; run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check for MSRP outliers" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obscheck_descriptionMakeModelTypeOriginDriveTrainMSRPCylinders
104- Extreme MSRP outliers. The MSRP output is standard deviations from the mean MSRP.LlamaLoafWagonUSAFront19.2164
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%let lbl = '04- Extreme MSRP outliers. The MSRP output is standard deviations from the mean MSRP.';\n", "\n", "PROC STANDARD DATA=s01_test_data MEAN=0 STD=1 \n", " OUT=s02_chk_04_msrp_outlier(label=&lbl. where=(msrp ge 5)); *Detect anything 5 standard deviations from normal;\n", " format msrp 6.3;\n", " VAR msrp;\n", "RUN;\n", "\n", "/* Add the desciption*/\n", "data s02_chk_04_msrp_outlier(Label=&lbl);\n", " length check_description $100;\n", " SET s02_chk_04_msrp_outlier;\n", " check_description=&lbl.;\n", "RUN;\n", "\n", "proc print; run;" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Step 03 - Automatically gather and summarize the bad rows\n", "\n", "See that by keeping to a naming convention above, any arbitrary number of checks can be added with the same summary effort. No inspecting individual data sets by hand!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the row count for each check" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Summary of the checks

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsdescriptiontblcount
101-Cylinders must be between 3 and 12, and whole numbersS02_CHK_01_BAD_CYLINDERS5
202-MSRP must be greater than $5000S02_CHK_02_BAD_MSRP3
303-Must be unique on (make model drivetrain)S02_CHK_03_NOT_UNIQUE2
404- Extreme MSRP outliers. The MSRP output is standard deviations from the mean MSRP.S02_CHK_04_MSRP_OUTLIER1
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "/* Create a table to store the summarized results */\n", "proc sql ;\n", " create table s03_summarize_checks\n", " (description char(256), tbl char(32), count num);\n", "quit;\n", "\n", "/* Do a fuzzy match for the data sets matching our naming convention. \n", " Here's where the data set label is picked up and becomes the description.\n", "*/\n", "proc sql ;\n", " insert into s03_summarize_checks select memlabel, memname, nobs from \n", " dictionary.tables where libname eq \"WORK\" and memname like \"%^_CHK^_%\" escape '^';\n", "quit;\n", "\n", "title 'Summary of the checks';\n", "proc print data=s03_summarize_checks; run; title;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### About the DICTIONARY.TABLES above:\n", "\n", "This view contains information about all the data sets your session knows about. Watch out! In big corporate environments, that's a lot of data sets! ** Avoid doing a `select * from DICTIONARY.TABLES`... it might take forever to run. **\n", "See [this pdf](http://www2.sas.com/proceedings/sugi30/070-30.pdf) for additional details and examples.\n", "\n", "#### Note on the SQL fuzzy matching above:\n", "\n", "- In ANSI and `PROC SQL`, `LIKE` is used for wild-card searching.\n", "\n", " - `%` matches any zero or more characters\n", "\n", " - `_` matches any single character\n", "\n", "- In our naming convention, the underscores in `_CHK_` need to be escaped in order to take them literally, because underscores are part of the `LIKE` logic, so it becomes `%^_CHK^_%`." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Get the total number of bad rows" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
The total number of bad rows is:       11
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql noprint;\n", " select sum(count) into :sum_bad_rows from s03_summarize_checks;\n", "quit;\n", "%put The total number of bad rows is: &sum_bad_rows.;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above code gets the sum of the summary table's count column, and stores the result in a macro variable we're calling `sum_bad_rows`. The variable is then available for use throughout the program. \n", "\n", "### Route program flow\n", "\n", "You could use it as the determiner for having your program do one set of things if there are no problems, and another set of things otherwise. Below is a basic example of this:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
ERROR: Bad rows detected. Fix data in source system and try again. Keep your job another day.
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%macro decide_what_to_do();\n", " %if &sum_bad_rows. eq 0 %then \n", " %put Doing scary things like email stakeholders, commit to production, sftp to third parties, etc;\n", " %else\n", " %put ERROR: Bad rows detected. Fix data in source system and try again. Keep your job another day.;\n", "%mend decide_what_to_do;\n", "\n", "%decide_what_to_do();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 04 - Get a detailed summary of bad rows\n", "\n", "Maybe one of the things to do when bad data is detected is to do a detailed summary. Let's grab the checks' data set names from the summary table:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
Data quality check tables are: S02_CHK_01_BAD_CYLINDERS S02_CHK_02_BAD_MSRP S02_CHK_03_NOT_UNIQUE S02_CHK_04_MSRP_OUTLIER
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proc sql noprint;\n", " select tbl into :check_tbls separated by ' ' from s03_summarize_checks;\n", "quit;\n", "%put Data quality check tables are: &check_tbls.;" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obscheck_descriptionMakeModelTypeOriginDriveTrainMSRPCylinders
101-Cylinders must be between 3 and 12, and whole numbersMazdaRX-8 4dr automaticSportsAsiaRear$25,700.
201-Cylinders must be between 3 and 12, and whole numbersMazdaRX-8 4dr manualSportsAsiaRear$27,200.
301-Cylinders must be between 3 and 12, and whole numbersFooRAV5SUVAsiaFront$25,000-4.0
401-Cylinders must be between 3 and 12, and whole numbersFooRAV8aSUVAsiaFront$31,00080.0
501-Cylinders must be between 3 and 12, and whole numbersFooRAV8bSUVAsiaFront$31,0003.5
602-MSRP must be greater than $5000FooBarSUVAsiaFront$586.0
702-MSRP must be greater than $5000FizzBuzzSUVAsiaFront$-25,0006.0
802-MSRP must be greater than $5000FizzSnorehorseSUVAsiaFront.6.0
903-Must be unique on (make model drivetrain)Saab9-5 AeroWagonEuropeFront$40,8454.0
1003-Must be unique on (make model drivetrain)Saab9-5 AeroWagonEuropeFront$50,0004.0
1104- Extreme MSRP outliers. The MSRP output is standard deviations from the mean MSRP.LlamaLoafWagonUSAFront$194.0
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "/* Append all the check data sets together.\n", "\n", " For those unfamiliar with the macro language, imagine replacing &check_tbls. below with the contents of the macro \n", " variable (as seen above)... and then executing the data step with the newly overwritten text. So the data step is executed\n", " with this middle line:\n", " \n", " set S02_CHK_01_BAD_CYLINDERS S02_CHK_02_BAD_MSRP S02_CHK_03_NOT_UNIQUE S02_CHK_04_MSRP_OUTLIER;\n", " \n", "*/\n", "data s04_gather_bad_rows;\n", " set &check_tbls.;\n", "run;\n", "\n", "proc print; run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps\n", "\n", "Imagine multiple groups of checks, based on the stage you're at in processing the data. Maybe you want some checks to print `ERROR` to the log, while others are just `WARNING`. If a different naming convention was used for each, separate summaries could be generated. \n", "\n", "Macros could be used to make the fuzzy search and summary gathering reusuable. Just give it your naming convention and it will automatically process. \n", "\n", "Again macros could be used to create a reusable series of checks. Maybe the first pass would be to detect the bad data, while the 2nd pass is after a manual corrections file has been loaded.\n", "\n", "More demos to come, expanding on these ideas!" ] } ], "metadata": { "kernelspec": { "display_name": "SAS", "language": "sas", "name": "sas" }, "language_info": { "codemirror_mode": "sas", "file_extension": ".sas", "mimetype": "text/x-sas", "name": "sas" } }, "nbformat": 4, "nbformat_minor": 0 }