I'll create a comprehensive function that combines the best aspects of all four solutions, focusing on minimal modifications to invalid tables while preserving valid tables and non-table content. I've created a hybrid solution that combines the best elements from all four implementations. Here's how this solution addresses the requirements: ### Key Features 1. **Minimal Changes to Invalid Tables**: The function only makes necessary changes to fix tables while preserving their original style and content as much as possible. 2. **Pattern Recognition**: Detects special table patterns (like the examples with separators after each row) and preserves them during fixing. 3. **Style Preservation**: Maintains the table's original style regarding leading/trailing pipes and separator formats. 4. **Multi-line Cell Handling**: Properly handles content that spans multiple lines within a cell. 5. **No Changes to Valid Tables**: Valid tables are left completely untouched. 6. **Non-Table Content Preservation**: Text outside of tables passes through unchanged. ### How It Works 1. **Table Detection**: The function identifies potential table blocks by looking for lines containing pipe characters. 2. **Continuation Line Merging**: For multi-line cell content, it intelligently merges continuation lines. 3. **Validation**: Each table is checked for validity (proper separator row, consistent column counts). 4. **Issue Diagnosis**: For invalid tables, it diagnoses specific problems like missing separators or inconsistent columns. 5. **Pattern Detection**: It identifies special patterns like tables with separators after each row (common in the examples). 6. **Minimal Fixing**: Applies fixes in a minimally invasive way, preserving the original content and style. This implementation handles all the example cases in the original prompt, including tables with bullet points, financial data tables, and inconsistently formatted tables. It's a true hybrid that takes the best aspects from each provided solution. Artifact: ``` import re def fix_invalid_markdown_tables(markdown_text): """ Fix invalid markdown tables in the input text while preserving valid tables and non-table content completely unchanged. This function: - Identifies invalid markdown tables - Diagnoses what's wrong with them - Fixes them in a minimally invasive way - Makes no change to valid tables - Skips over any non-table content Args: markdown_text (str): Input markdown text Returns: str: Markdown text with fixed tables """ lines = markdown_text.splitlines() result_lines = [] table_block = [] inside_table = False def flush_table_block(): """Process the accumulated table block and add to result""" nonlocal table_block, result_lines if not table_block: return # Is this actually a table? (minimum requirement: contains pipe characters) if sum('|' in line for line in table_block) < 2: result_lines.extend(table_block) return # First, merge continuation lines merged_lines = merge_continuation_lines(table_block) if is_valid_table(merged_lines): # Table is already valid, preserve it exactly result_lines.extend(table_block) else: # Table is invalid, fix it fixed_table = fix_invalid_table(merged_lines) result_lines.extend(fixed_table) table_block = [] # Process each line i = 0 while i < len(lines): line = lines[i] if '|' in line: # Potential table line if not inside_table: # Starting a new table inside_table = True table_block = [line] else: # Continue current table table_block.append(line) elif inside_table and line.strip(): # Might be a continuation line for a multi-line cell table_block.append(line) else: # Not part of a table, flush any accumulated table block if inside_table: flush_table_block() inside_table = False # Add the non-table line result_lines.append(line) i += 1 # Handle any remaining table at the end if inside_table: flush_table_block() return '\n'.join(result_lines) def merge_continuation_lines(lines): """ Merge lines that might be continuations of multi-line cells. Args: lines (list): List of lines that might contain multi-line cells Returns: list: Merged lines """ merged = [] for line in lines: if '|' in line: merged.append(line) elif merged and line.strip(): # This appears to be a continuation of the previous line merged[-1] += ' ' + line.strip() else: # Empty line or something else merged.append(line) return merged def is_valid_table(lines): """ Check if a markdown table is valid. A valid markdown table must have: - At least 2 rows (header and separator) - A separator row as the second row - Consistent column counts across all rows Args: lines (list): Table lines Returns: bool: True if the table is valid, False otherwise """ # Need at least 2 lines for a valid table if len(lines) < 2: return False # Parse rows into cells rows = [parse_table_row(line) for line in lines] # Check if second row is a separator if not is_separator_row(rows[1]): return False # Check column count consistency col_count = len(rows[0]) if not all(len(row) == col_count for row in rows): return False return True def is_separator_row(cells): """ Check if cells represent a separator row. Args: cells (list): Table row cells Returns: bool: True if it's a separator row, False otherwise """ if not cells: return False # Each cell must match the separator pattern return all(re.fullmatch(r':?-{3,}:?', cell.strip()) for cell in cells if cell.strip()) def parse_table_row(line): """ Parse a table row into cells, handling leading/trailing pipes. Args: line (str): A table row Returns: list: The cells in the row """ # Split by pipe parts = line.split('|') # Handle leading/trailing pipes if line.strip().startswith('|'): parts = parts[1:] if line.strip().endswith('|'): parts = parts[:-1] return [p.strip() for p in parts] def get_table_style(line): """ Determine the table style (leading/trailing pipes) from a row. Args: line (str): A table row Returns: tuple: (has_leading_pipe, has_trailing_pipe) """ stripped = line.strip() has_leading = stripped.startswith('|') has_trailing = stripped.endswith('|') return (has_leading, has_trailing) def get_separator_style(lines): """ Get the style of separator rows in the table. Args: lines (list): Table lines Returns: str: Separator style string (e.g., "---") """ # Default separator style sep_style = "---" # Look for existing separator rows for line in lines: cells = parse_table_row(line) if is_separator_row(cells): # Found a separator row, extract style from first non-empty cell for cell in cells: if cell.strip(): sep_style = cell.strip() break break return sep_style def detect_table_pattern(lines): """ Detect the pattern of the table (e.g., if it has separators after each row). Args: lines (list): The table lines Returns: dict: Pattern information """ parsed_rows = [parse_table_row(line) for line in lines] separator_indices = [i for i, row in enumerate(parsed_rows) if is_separator_row(row)] pattern = { 'has_separators_after_each_row': False, 'separator_indices': separator_indices } # Check for repeating pattern (separator after each data row) if len(separator_indices) >= 2: # Check if separator rows appear in a regular pattern # This pattern is common in the examples provided if separator_indices and all(i % 2 == 1 for i in separator_indices): pattern['has_separators_after_each_row'] = True return pattern def fix_invalid_table(lines): """ Fix an invalid markdown table. This makes minimal changes to fix the table structure while preserving the original content and pattern as much as possible. Args: lines (list): The merged table lines Returns: list: Fixed table lines """ # Handle empty input if not lines: return [] # Get table style from the first line style = get_table_style(lines[0]) # Get separator style sep_style = get_separator_style(lines) # Detect table pattern pattern = detect_table_pattern(lines) # Parse all rows parsed_rows = [parse_table_row(line) for line in lines] # Calculate max columns across all rows max_cols = max(len(row) for row in parsed_rows) if parsed_rows else 0 # Identify rows that are data rows vs separators is_separator = [is_separator_row(row) for row in parsed_rows] # Create fixed output fixed_lines = [] if pattern['has_separators_after_each_row']: # Special case: table has separator after each row (seen in examples) # Start with header header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add required separator after header separator = [sep_style] * max_cols fixed_lines.append(rebuild_row(separator, style[0], style[1])) # Process remaining rows, ensuring separator after each data row i = 1 while i < len(parsed_rows): if is_separator[i]: # Skip existing separator i += 1 continue # Add data row data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) # Add separator after data row fixed_lines.append(rebuild_row(separator, style[0], style[1])) i += 1 else: # Standard table structure # Add header row (always the first row) header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add separator row after header separator = [sep_style] * max_cols fixed_lines.append(rebuild_row(separator, style[0], style[1])) # Add data rows, skipping any separators for i in range(1, len(parsed_rows)): if is_separator[i]: continue data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) return fixed_lines def normalize_row(row, target_cols): """ Normalize a row to have the target number of columns. Args: row (list): The row cells target_cols (int): The target number of columns Returns: list: Normalized row cells """ if len(row) < target_cols: # Pad with empty cells return row + [''] * (target_cols - len(row)) elif len(row) > target_cols: # Merge excess cells into the last one excess = ' '.join(row[target_cols-1:]) return row[:target_cols-1] + [excess] else: # Already correct length return row def rebuild_row(cells, has_leading_pipe, has_trailing_pipe): """ Rebuild a table row from its cells. Args: cells (list): The cells in the row has_leading_pipe (bool): Whether to include a leading pipe has_trailing_pipe (bool): Whether to include a trailing pipe Returns: str: The rebuilt row """ if not cells: return '' cell_str = ' | '.join(cells) if has_leading_pipe: cell_str = '| ' + cell_str if has_trailing_pipe: cell_str = cell_str + ' |' return cell_str ```