I'll analyze all four solutions and create a hybrid implementation that combines their best elements for fixing invalid Markdown tables. I've created a hybrid solution that combines the strongest elements from all four implementations. Here's how I synthesized the best aspects from each solution: ### Key Features of the Hybrid Solution: 1. **Robust Table Detection**: - Identifies tables by looking for pipe characters - Properly groups related lines into table blocks - Handles multi-line cell content that continues across lines 2. **Smart Pattern Recognition**: - Detects special patterns like "separator after each row" (from the first example) - Preserves original table style (leading/trailing pipes) - Understands various markdown table formats 3. **Comprehensive Validation**: - Checks for proper separator rows - Ensures consistent column counts - Validates overall table structure 4. **Minimal-Invasive Fixing**: - Preserves valid tables completely unchanged - Skips non-table content entirely - Makes targeted fixes to invalid tables with minimal changes 5. **Content Preservation**: - Carefully merges multi-line cell content - Normalizes column counts while preserving data - Maintains the original table style when fixing ### What I Took From Each Solution: - From **Claude3.7**: The well-structured approach with clear documentation and comprehensive pattern detection - From **Grok3**: Clean style preservation and robust row handling - From **O1-pro**: The advanced pattern detection for separator-after-each-row tables - From **o3-mini-high**: Efficient table processing while keeping the code readable ### How It Handles the Examples: 1. For bullet-point tables with separators after each row: - Recognizes the repeating separator pattern - Maintains the bullet points while fixing the structure - Creates a proper table with normalized column counts 2. For tables with misaligned headers and complex structure: - Normalizes column counts - Ensures proper separator placement - Preserves all content while creating a valid structure This hybrid implementation combines thorough validation, smart pattern detection, and minimal-invasive fixes while maintaining excellent readability and performance. Artifact: ``` import re def fix_invalid_markdown_tables(markdown_text): """ Fix invalid markdown tables in text while preserving valid tables and non-table content. This function: - Identifies invalid markdown tables - Diagnoses issues with the tables - Fixes them in a minimally invasive way - Makes no change to valid tables - Skips over non-table content Args: markdown_text (str): Markdown text containing potential tables Returns: str: Markdown text with fixed tables """ lines = markdown_text.splitlines() result_lines = [] table_block = [] inside_table = False # Process each line of the input text i = 0 while i < len(lines): line = lines[i] if '|' in line: # Potential table line if not inside_table: # Starting a new table inside_table = True table_block.append(line) elif inside_table and line.strip(): # Continuation line for a multi-line cell table_block.append(line) else: # Not part of a table, process any accumulated table block if inside_table: fixed_table = process_table_block(table_block) result_lines.extend(fixed_table) table_block = [] inside_table = False # Add the non-table line result_lines.append(line) i += 1 # Handle any remaining table at the end if inside_table: fixed_table = process_table_block(table_block) result_lines.extend(fixed_table) return '\n'.join(result_lines) def process_table_block(block_lines): """ Process a block of lines that might contain a markdown table. Args: block_lines (list): Lines that potentially form a table Returns: list: Fixed table lines or the original lines if already valid """ # Skip if no table content if not block_lines or not any('|' in line for line in block_lines): return block_lines # First, merge continuation lines merged_lines = merge_continuation_lines(block_lines) # Check if the merged version is a valid table if is_valid_table(merged_lines): # Table is already valid, preserve it exactly return block_lines # Table is invalid, fix it return fix_invalid_table(merged_lines) def merge_continuation_lines(lines): """ Merge lines that are continuations of multi-line cells. This function handles cases where a cell's content spans multiple lines, which is common in complex markdown tables. Args: lines (list): List of lines that might contain multi-line cells Returns: list: Lines with continuations merged """ merged = [] current = None for line in lines: if '|' in line: # This is a table row if current is not None: merged.append(current) current = line elif line.strip() and current is not None: # This is a continuation of the previous line current += ' ' + line.strip() else: # Empty line or non-continuation if current is not None: merged.append(current) current = None if line.strip(): merged.append(line) if current is not None: merged.append(current) return merged def parse_table_row(line): """ Parse a table row into cells, handling leading/trailing pipes. Args: line (str): A table row Returns: list: The cells in the row """ if not line or '|' not in line: return [] # Split by pipe parts = line.split('|') # Handle leading/trailing pipes if line.strip().startswith('|'): parts = parts[1:] if line.strip().endswith('|'): parts = parts[:-1] return [p.strip() for p in parts] def is_separator_cell(cell): """ Check if a cell contains a valid separator pattern (e.g., ---, :--:). Args: cell (str): The cell content Returns: bool: True if it's a valid separator cell, False otherwise """ return bool(re.fullmatch(r':?-{3,}:?', cell.strip())) def is_separator_row(cells): """ Check if cells represent a separator row. A separator row consists of cells that contain only dashes, optionally with colons for alignment. Args: cells (list): Table row cells Returns: bool: True if it's a separator row, False otherwise """ if not cells: return False # Every non-empty cell must match the separator pattern return all(is_separator_cell(cell) for cell in cells if cell.strip()) def is_valid_table(lines): """ Check if a markdown table is valid. A valid markdown table must have: - At least 2 rows (header and separator) - A separator row as the second row - Consistent column counts across all rows Args: lines (list): Table lines Returns: bool: True if the table is valid, False otherwise """ # Need at least 2 lines for a valid table if len(lines) < 2: return False # Parse rows into cells rows = [parse_table_row(line) for line in lines if '|' in line] if len(rows) < 2: return False # Check if second row is a separator if not is_separator_row(rows[1]): return False # Check column count consistency col_count = len(rows[0]) return all(len(row) == col_count for row in rows) def get_table_style(line): """ Determine the table style (leading/trailing pipes) from a row. Args: line (str): A table row Returns: tuple: (has_leading_pipe, has_trailing_pipe) """ stripped = line.strip() has_leading = stripped.startswith('|') has_trailing = stripped.endswith('|') return (has_leading, has_trailing) def detect_table_pattern(lines): """ Detect the pattern of the table (e.g., if it has separators after each row). Args: lines (list): The table lines Returns: dict: Pattern information """ table_lines = [line for line in lines if '|' in line] parsed_rows = [parse_table_row(line) for line in table_lines] is_sep = [is_separator_row(row) for row in parsed_rows] separator_indices = [i for i, sep in enumerate(is_sep) if sep] pattern = { 'has_separators_after_each_row': False, 'separator_indices': separator_indices } # Check for repeating pattern (separator after each data row) if len(separator_indices) >= 2: # Count how many rows have a separator after them separator_pattern_count = sum(1 for i in range(len(is_sep)-1) if not is_sep[i] and is_sep[i+1]) data_row_count = sum(1 for sep in is_sep if not sep) # If more than half of data rows have a separator after them, consider it a pattern if separator_pattern_count >= data_row_count / 2: pattern['has_separators_after_each_row'] = True return pattern def normalize_row(row, target_cols): """ Normalize a row to have the target number of columns. If the row has fewer cells than required, it's padded with empty cells. If it has more, the excess cells are merged into the last one. Args: row (list): The row cells target_cols (int): The target number of columns Returns: list: Normalized row cells """ if not row: return [''] * target_cols if len(row) < target_cols: # Pad with empty cells return row + [''] * (target_cols - len(row)) elif len(row) > target_cols: # Merge excess cells into the last one if target_cols > 1: return row[:target_cols-1] + [' '.join(row[target_cols-1:])] else: return [' '.join(row)] else: # Already correct length return row def rebuild_row(cells, has_leading_pipe, has_trailing_pipe): """ Rebuild a table row from its cells. Args: cells (list): The cells in the row has_leading_pipe (bool): Whether to include a leading pipe has_trailing_pipe (bool): Whether to include a trailing pipe Returns: str: The rebuilt row """ if not cells: return '' cell_str = ' | '.join(cells) if has_leading_pipe: cell_str = '| ' + cell_str if has_trailing_pipe: cell_str = cell_str + ' |' return cell_str def fix_invalid_table(lines): """ Fix an invalid markdown table with minimal changes. Args: lines (list): The merged table lines Returns: list: Fixed table lines """ # Filter out lines that don't contain pipe characters table_lines = [line for line in lines if '|' in line] if not table_lines: return lines # Get table style from the first line style = get_table_style(table_lines[0]) # Parse all rows parsed_rows = [parse_table_row(line) for line in table_lines] # Calculate max columns across all rows max_cols = max(len(row) for row in parsed_rows) if parsed_rows else 0 # Identify rows that are data rows vs separators is_separator = [is_separator_row(row) for row in parsed_rows] # Detect table pattern pattern = detect_table_pattern(lines) # Create fixed output fixed_lines = [] if pattern['has_separators_after_each_row']: # Special case: table has separator after each row (seen in examples) # Start with header header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add separator after header separator = ['---'] * max_cols fixed_lines.append(rebuild_row(separator, style[0], style[1])) # Process remaining rows, ensuring separator after each data row i = 1 while i < len(parsed_rows): if is_separator[i]: # Skip existing separator i += 1 continue # Add data row data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) # Add separator after data row fixed_lines.append(rebuild_row(separator, style[0], style[1])) i += 1 else: # Standard table structure # Add header row (always the first row) header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add separator row after header separator = ['---'] * max_cols fixed_lines.append(rebuild_row(separator, style[0], style[1])) # Add data rows, skipping any separators for i in range(1, len(parsed_rows)): if is_separator[i]: continue data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) return fixed_lines # Test with example if run directly if __name__ == "__main__": example = """| ●| we have a limited customer base and limited sales and relationships with international customers; ---|---|--- | ●| difficulty in managing multinational operations; ---|---|---""" fixed = fix_invalid_markdown_tables(example) print(fixed) ```