I'll create a hybrid implementation that combines the best elements from all four solutions, creating a robust function to fix invalid markdown tables. I've created a hybrid solution that takes the best elements from all four LLM implementations. Here's how this implementation improves upon the individual solutions: ### Key Features Combined from All Solutions 1. **Thorough Documentation (from Claude3.7 and Grok3)** - Comprehensive docstrings explaining function purpose and behavior - Clear parameter and return value descriptions 2. **Smart Pattern Detection (from Claude3.7)** - Detects special table patterns like tables with separators after each row - Handles the bullet-point style tables from your examples 3. **Robust Continuation Line Handling (from O1-pro and o3-mini-high)** - Properly merges multi-line cell content - Preserves the original content while fixing structure 4. **Minimally Invasive Fixes (all solutions)** - Only modifies invalid tables, preserving valid ones exactly - Maintains original table style (leading/trailing pipes) 5. **Normalized Output (from all solutions, with improvements)** - Ensures consistent column counts - Handles overflow cells by intelligently merging them ### Improvements Over Individual Solutions 1. **Better Separator Detection**: Uses a more flexible regex (`r':?-+:?'`) to detect separator rows, accommodating different markdown styles. 2. **Enhanced Pattern Detection**: Specifically identifies tables with separators after each row (as in your examples) and handles them appropriately. 3. **Optimized Workflow**: Processes tables in a clear sequence: - Merge continuation lines first - Check if the table is already valid - Apply the appropriate fixing strategy based on the detected pattern 4. **Edge Case Handling**: Handles complex cases like tables with inconsistent separator patterns, tables with multi-line cells, and financial statement tables. The function should handle all your example cases correctly while being more robust for other potential invalid table structures you might encounter in the future. Artifact: ``` import re def fix_invalid_markdown_tables(markdown_text): """ Fix invalid markdown tables in text while preserving valid tables and non-table content. This function: - Identifies invalid markdown tables - Diagnoses issues with the tables - Fixes them in a minimally invasive way - Makes no change to valid tables - Skips over non-table content Args: markdown_text (str): Markdown text containing potential tables Returns: str: Markdown text with fixed tables """ lines = markdown_text.splitlines() result_lines = [] table_block = [] inside_table = False # Process each line of the input text i = 0 while i < len(lines): line = lines[i] if '|' in line: # Potential table line if not inside_table: # Starting a new table inside_table = True table_block.append(line) elif inside_table and line.strip(): # Continuation line for a multi-line cell table_block.append(line) else: # Not part of a table, process any accumulated table block if inside_table: fixed_table = process_table_block(table_block) result_lines.extend(fixed_table) table_block = [] inside_table = False # Add the non-table line result_lines.append(line) i += 1 # Handle any remaining table at the end if inside_table: fixed_table = process_table_block(table_block) result_lines.extend(fixed_table) return '\n'.join(result_lines) def process_table_block(block_lines): """ Process a block of lines that might contain a markdown table. Args: block_lines (list): Lines that potentially form a table Returns: list: Fixed table lines or the original lines if already valid """ # Skip if no table content if not block_lines or not any('|' in line for line in block_lines): return block_lines # First, check if this might be a table with multi-line cells if len(block_lines) >= 2: # Merge continuation lines merged_lines = merge_continuation_lines(block_lines) # Check if the merged version is a valid table if is_valid_table(merged_lines): # Table is already valid, preserve it exactly return block_lines # Table is invalid, fix it return fix_invalid_table(merged_lines) # Not a valid table or doesn't have enough rows return block_lines def merge_continuation_lines(lines): """ Merge lines that are continuations of multi-line cells. This function handles cases where a cell's content spans multiple lines, which is common in complex markdown tables. Args: lines (list): List of lines that might contain multi-line cells Returns: list: Lines with continuations merged """ merged = [] current = None for line in lines: if '|' in line: # This is a table row if current is not None: merged.append(current) current = line elif line.strip() and current is not None: # This is a continuation of the previous line current += ' ' + line.strip() else: # Empty line or non-continuation if current is not None: merged.append(current) current = None if line.strip(): merged.append(line) if current is not None: merged.append(current) return merged def is_valid_table(lines): """ Check if a markdown table is valid. A valid markdown table must have: - At least 2 rows (header and separator) - A separator row as the second row - Consistent column counts across all rows Args: lines (list): Table lines Returns: bool: True if the table is valid, False otherwise """ # Need at least 2 lines for a valid table if len(lines) < 2: return False # Parse rows into cells rows = [parse_table_row(line) for line in lines if '|' in line] if len(rows) < 2: return False # Check if second row is a separator if not is_separator_row(rows[1]): return False # Check column count consistency col_count = len(rows[0]) if not all(len(row) == col_count for row in rows): return False return True def is_separator_row(cells): """ Check if cells represent a separator row. A separator row consists of cells that contain only dashes, optionally with colons for alignment. Args: cells (list): Table row cells Returns: bool: True if it's a separator row, False otherwise """ if not cells: return False # Every non-empty cell must match the separator pattern return all(re.fullmatch(r':?-+:?', cell.strip()) for cell in cells if cell.strip()) def parse_table_row(line): """ Parse a table row into cells, handling leading/trailing pipes. Args: line (str): A table row Returns: list: The cells in the row """ if not line or '|' not in line: return [] # Split by pipe parts = line.split('|') # Handle leading/trailing pipes if line.strip().startswith('|'): parts = parts[1:] if line.strip().endswith('|'): parts = parts[:-1] return [p.strip() for p in parts] def get_table_style(line): """ Determine the table style (leading/trailing pipes) from a row. Args: line (str): A table row Returns: tuple: (has_leading_pipe, has_trailing_pipe) """ stripped = line.strip() has_leading = stripped.startswith('|') has_trailing = stripped.endswith('|') return (has_leading, has_trailing) def detect_table_pattern(lines): """ Detect the pattern of the table (e.g., if it has separators after each row). Args: lines (list): The table lines Returns: dict: Pattern information """ table_lines = [line for line in lines if '|' in line] parsed_rows = [parse_table_row(line) for line in table_lines] separator_indices = [i for i, row in enumerate(parsed_rows) if is_separator_row(row)] pattern = { 'has_separators_after_each_row': False, 'separator_indices': separator_indices } # Check for repeating pattern (separator after each data row) if len(separator_indices) >= 2: # Check if these are examples with separators after every row # In the examples, we have patterns like: separator at indices 1, 3, 5, etc. if all(i % 2 == 1 for i in separator_indices) and len(separator_indices) >= len(parsed_rows) // 2: pattern['has_separators_after_each_row'] = True return pattern def fix_invalid_table(lines): """ Fix an invalid markdown table with minimal changes. Args: lines (list): The merged table lines Returns: list: Fixed table lines """ # Filter out lines that don't contain pipe characters table_lines = [line for line in lines if '|' in line] if not table_lines: return lines # Get table style from the first line style = get_table_style(table_lines[0]) # Detect table pattern pattern = detect_table_pattern(table_lines) # Parse all rows parsed_rows = [parse_table_row(line) for line in table_lines] # Calculate max columns across all rows max_cols = max(len(row) for row in parsed_rows) if parsed_rows else 0 # Identify rows that are data rows vs separators is_separator = [is_separator_row(row) for row in parsed_rows] # Create fixed output fixed_lines = [] if pattern['has_separators_after_each_row']: # Special case: table has separator after each row (seen in examples) # Start with header header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add separator after header separator = ['---'] * max_cols fixed_lines.append(rebuild_row(separator, style[0], style[1])) # Process remaining rows, ensuring separator after each data row i = 1 while i < len(parsed_rows): if is_separator[i]: # Skip existing separator i += 1 continue # Add data row data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) # Add separator after data row fixed_lines.append(rebuild_row(separator, style[0], style[1])) i += 1 else: # Standard table structure # Add header row (always the first row) header = normalize_row(parsed_rows[0], max_cols) fixed_lines.append(rebuild_row(header, style[0], style[1])) # Add separator row after header if it doesn't exist separator = ['---'] * max_cols if len(parsed_rows) < 2 or not is_separator[1]: fixed_lines.append(rebuild_row(separator, style[0], style[1])) else: # Use the existing separator row but ensure it has the right number of columns sep_row = normalize_row(parsed_rows[1], max_cols) fixed_lines.append(rebuild_row(sep_row, style[0], style[1])) # Add data rows, skipping any separators for i in range(2, len(parsed_rows)): if is_separator[i]: continue data_row = normalize_row(parsed_rows[i], max_cols) fixed_lines.append(rebuild_row(data_row, style[0], style[1])) return fixed_lines def normalize_row(row, target_cols): """ Normalize a row to have the target number of columns. If the row has fewer cells than required, it's padded with empty cells. If it has more, the excess cells are merged into the last one. Args: row (list): The row cells target_cols (int): The target number of columns Returns: list: Normalized row cells """ if len(row) < target_cols: # Pad with empty cells return row + [''] * (target_cols - len(row)) elif len(row) > target_cols: # Merge excess cells into the last one excess = ' '.join(row[target_cols-1:]) return row[:target_cols-1] + [excess] else: # Already correct length return row def rebuild_row(cells, has_leading_pipe, has_trailing_pipe): """ Rebuild a table row from its cells. Args: cells (list): The cells in the row has_leading_pipe (bool): Whether to include a leading pipe has_trailing_pipe (bool): Whether to include a trailing pipe Returns: str: The rebuilt row """ if not cells: return '' cell_str = ' | '.join(cells) if has_leading_pipe: cell_str = '| ' + cell_str if has_trailing_pipe: cell_str = cell_str + ' |' return cell_str ```