---
version: 1.0.0
evaluation: programmatic
agent: codex
model: gpt-5.5
snapshot: python312-uv
origin:
  url: https://raw.githubusercontent.com/gooseworks-ai/goose-skills/296901414500a3a2d26b37e410f92e0406bf94a2/skills/capabilities/linkedin-post-research/SKILL.md
  user_supplied_url: https://skills.gooseworks.ai/skills/linkedin-post-research
  is_directory_mirror: true
  source_host: raw.githubusercontent.com
  source_title: LinkedIn Post Research
  imported_at: '2026-05-03T16:41:42Z'
  imported_by: skill-to-runbook-converter@1.1.0
attribution:
  collection_or_org: gooseworks-ai
  skill_name: linkedin-post-research
confidence: high
secrets:
  APIFY_API_TOKEN:
    env: APIFY_API_TOKEN
    description: Apify API token used to run the LinkedIn posts search actor
    required: true
---

# LinkedIn Post Research — Agent Runbook

## Objective

Search LinkedIn posts by one or more keywords using the Apify `apimaestro/linkedin-posts-search-scraper-no-cookies` actor. The runbook returns author details, post text, engagement metrics, dates, hashtags, activity IDs, and direct LinkedIn URLs without requiring LinkedIn cookies or login. Use it to research what people are saying about a topic, discover high-engagement posts, identify thought leaders, or build a warm-lead pipeline from post engagement.

## REQUIRED OUTPUT FILES (MANDATORY)

| File | Description |
|------|-------------|
| `/app/results/linkedin_posts.json` | Normalized JSON array of unique LinkedIn posts returned by the search |
| `/app/results/linkedin_posts.csv` | CSV export with author, headline, keyword, engagement, date, URL, activity ID, hashtags, and preview fields |
| `/app/results/search_metadata.json` | Search parameters, actor run metadata, result counts, deduplication counts, and cost/status details |
| `/app/results/summary.md` | Executive summary of keywords searched, top posts, notable authors, and issues |
| `/app/results/validation_report.json` | Structured validation report with setup, execution, output, and schema checks |

The task is not complete until every required file exists and is non-empty.
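For orientation, a single entry in `linkedin_posts.json` might look like the sketch below. The field set mirrors the normalization schema defined in Step 3; every value shown here is hypothetical.

```json
{
  "author": "Jane Doe",
  "author_headline": "Head of AI Platform",
  "author_profile_url": "https://www.linkedin.com/in/janedoe",
  "keyword": "AI agents",
  "reactions": 412,
  "comments": 37,
  "shares": 18,
  "reactions_by_type": {"like": 350, "celebrate": 40, "insightful": 22},
  "date": "2026-04-28",
  "post_preview": "We shipped our first production agent...",
  "full_text": "We shipped our first production agent last week. Here is what we learned...",
  "url": "https://www.linkedin.com/feed/update/urn:li:activity:7190000000000000000/",
  "activity_id": "7190000000000000000",
  "hashtags": ["#AIagents", "#LLM"],
  "is_repost": false,
  "content_type": "text"
}
```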
## Parameters

| Parameter | Template Variable | Default | Description |
|-----------|-------------------|---------|-------------|
| Results directory | `results_dir` | `/app/results` | Directory where all required output files are written |
| Keywords | `keywords` | required | One or more LinkedIn search keywords |
| Max items | `max_items` | `50` | Maximum posts to request per keyword |
| Sort by | `sort_by` | `relevance` | Sort order accepted by the actor: `relevance` or `date_posted` |
| Timeout seconds | `timeout_seconds` | `120` | Maximum seconds to wait for the Apify actor run |
| Output formats | `output_formats` | `json,csv` | Local exports to produce after normalization |

## Inputs

```yaml
inputs:
  results_dir:
    type: path
    required: false
    default_jetty: /app/results
    default_local: ./results
  keywords:
    type: array
    items: string
    required: true
    min_items: 1
  max_items:
    type: integer
    required: false
    default: 50
  sort_by:
    type: string
    required: false
    default: relevance
    enum: [relevance, date_posted]
  timeout_seconds:
    type: integer
    required: false
    default: 120
```

## Dependencies

| Dependency | Type | Required | Description |
|------------|------|----------|-------------|
| `python3` | CLI | Yes | Runs the search and normalization script |
| `requests` | Python package | Yes | Calls the Apify API |
| `APIFY_API_TOKEN` | Secret | Yes | Authenticates requests to Apify |
| Apify actor access | External API | Yes | Uses `apimaestro/linkedin-posts-search-scraper-no-cookies` |

## Step 1: Environment Setup

1. Create the results directory and verify required inputs are present.
2. Verify `APIFY_API_TOKEN` is set without printing its value.
3. Install Python dependencies if needed.

```bash
mkdir -p /app/results
python3 -m pip install requests
python3 - <<'PY'
import os, sys
if not os.environ.get("APIFY_API_TOKEN"):
    sys.exit("ERROR: APIFY_API_TOKEN is not set")
PY
```

If setup fails, write `validation_report.json` with the `setup` stage marked `passed=false`, then stop.

## Step 2: Run LinkedIn Post Searches

For each keyword, submit an Apify actor run with `keyword`, `maxItems`, and `sortBy`. Poll until the run reaches `SUCCEEDED`, `FAILED`, or `ABORTED`, or until the configured timeout elapses. Retry a failed or timed-out actor run once with the same parameters before marking the keyword failed.

```bash
curl -X POST "https://api.apify.com/v2/acts/apimaestro~linkedin-posts-search-scraper-no-cookies/runs?token=$APIFY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"keyword":"AI agents","maxItems":50,"sortBy":"date_posted"}'
```

## Step 3: Fetch and Normalize Results

Fetch the actor dataset items for each successful run. Normalize each item to this schema: `author`, `author_headline`, `author_profile_url`, `keyword`, `reactions`, `comments`, `shares`, `reactions_by_type`, `date`, `post_preview`, `full_text`, `url`, `activity_id`, `hashtags`, `is_repost`, and `content_type`.

Deduplicate posts across keywords by `activity_id`. When a duplicate appears for multiple keywords, keep the post with the highest reaction count and preserve all matched keywords in metadata. A sketch of the full run, poll, fetch, and dedup flow follows Step 4.

## Step 4: Sort and Export Outputs

Sort the final post list by total reactions descending when `sort_by=relevance`, or by post date descending when `sort_by=date_posted`. Write JSON and CSV outputs to `/app/results/linkedin_posts.json` and `/app/results/linkedin_posts.csv`.

Also write `/app/results/search_metadata.json` with the input parameters, actor IDs, run statuses, dataset IDs, raw item counts, deduplicated count, failed keywords, retry count, and any cost information returned by Apify.
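To make Steps 2 and 3 concrete, here is a minimal Python sketch of the run/poll/fetch/dedup loop. It uses the public Apify REST API v2 endpoints for actor runs and dataset items; the actor input fields come from the curl example above. The `matched_keywords` field is a hypothetical name for "preserve all matched keywords in metadata", and the single retry from Step 2 is omitted for brevity.

```python
import os
import time

import requests

APIFY_BASE = "https://api.apify.com/v2"
ACTOR_ID = "apimaestro~linkedin-posts-search-scraper-no-cookies"
TOKEN = os.environ["APIFY_API_TOKEN"]


def run_search(keyword: str, max_items: int = 50, sort_by: str = "relevance",
               timeout_seconds: int = 120) -> list[dict]:
    """Start one actor run for `keyword` and return its dataset items."""
    # Submit the run (Apify API v2: POST /acts/{actorId}/runs).
    resp = requests.post(
        f"{APIFY_BASE}/acts/{ACTOR_ID}/runs",
        params={"token": TOKEN},
        json={"keyword": keyword, "maxItems": max_items, "sortBy": sort_by},
        timeout=30,
    )
    resp.raise_for_status()
    run = resp.json()["data"]

    # Poll until the run hits a terminal status or the timeout elapses.
    deadline = time.time() + timeout_seconds
    while run["status"] not in ("SUCCEEDED", "FAILED", "ABORTED") and time.time() < deadline:
        time.sleep(5)
        run = requests.get(
            f"{APIFY_BASE}/actor-runs/{run['id']}",
            params={"token": TOKEN}, timeout=30,
        ).json()["data"]

    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run for {keyword!r} ended with status {run['status']}")

    # Fetch the dataset items for the successful run.
    items = requests.get(
        f"{APIFY_BASE}/datasets/{run['defaultDatasetId']}/items",
        params={"token": TOKEN, "format": "json"}, timeout=60,
    )
    items.raise_for_status()
    return items.json()


def dedupe(posts: list[dict]) -> list[dict]:
    """Deduplicate normalized posts by activity_id, keeping the entry with
    the highest reaction count and remembering every keyword that matched."""
    by_id: dict[str, dict] = {}
    for post in posts:
        existing = by_id.get(post["activity_id"])
        if existing is None:
            post["matched_keywords"] = [post["keyword"]]
            by_id[post["activity_id"]] = post
        else:
            existing["matched_keywords"].append(post["keyword"])
            if post.get("reactions", 0) > existing.get("reactions", 0):
                # Carry the merged keyword list over to the winning record.
                post["matched_keywords"] = existing["matched_keywords"]
                by_id[post["activity_id"]] = post
    return list(by_id.values())
```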
## Step 5: Summarize Findings

Write `/app/results/summary.md` with:

- Keywords searched and final result count
- Top posts by engagement or newest posts by date, depending on `sort_by`
- Notable authors and repeated themes
- Failed keywords or actor errors
- Direct links to the highest-signal posts

## Step 6: Validate Outputs

Validate that every required output file exists and is non-empty. Parse `linkedin_posts.json` as JSON, verify it is an array, and check that each object has `author`, `keyword`, `url`, and `activity_id` fields when results are present. Parse `search_metadata.json` and verify it includes `keywords`, `runs`, and `deduplicated_count`.

## Step 7: Iterate on Errors (max 3 rounds)

If validation fails or the actor returns zero results for all keywords, run up to three rounds of targeted fixes:

| Issue | Fix |
|-------|-----|
| `APIFY_API_TOKEN` not set | Stop and ask the operator to provide the secret |
| Apify run fails or times out | Retry once, then broaden the keyword if the user permits changes |
| `0` results | Try a broader keyword or reduce restrictive phrasing |
| JSON/CSV schema mismatch | Re-run normalization from the fetched dataset items |
| Missing output file | Regenerate only the missing file, then rerun validation |

After three rounds, write any remaining failures into `validation_report.json` and `summary.md`.

## Final Checklist

Run this verification before finishing:

```bash
echo "=== FINAL OUTPUT VERIFICATION ==="
RESULTS_DIR="/app/results"
for f in "$RESULTS_DIR/linkedin_posts.json" "$RESULTS_DIR/linkedin_posts.csv" \
         "$RESULTS_DIR/search_metadata.json" "$RESULTS_DIR/summary.md" \
         "$RESULTS_DIR/validation_report.json"; do
  if [ ! -s "$f" ]; then
    echo "FAIL: $f is missing or empty"
  else
    echo "PASS: $f ($(wc -c < "$f") bytes)"
  fi
done

python3 - <<'PY'
import json, pathlib
root = pathlib.Path('/app/results')
posts = json.loads((root / 'linkedin_posts.json').read_text())
meta = json.loads((root / 'search_metadata.json').read_text())
assert isinstance(posts, list), 'linkedin_posts.json must be a JSON array'
assert 'keywords' in meta and 'runs' in meta and 'deduplicated_count' in meta, 'metadata missing required keys'
print('PASS: JSON outputs parse and required keys are present')
PY
```

## Common Fixes

| Error | Fix |
|-------|-----|
| `APIFY_API_TOKEN` not set | Ask user to add it to the runtime environment |
| Apify run fails or times out | Retry once. If it still fails, try a broader keyword. |
| `0` results | Keyword may be too specific. Try broader terms. |
| CSV contains unescaped commas/newlines | Regenerate CSV with Python `csv.DictWriter` (see the sketch after Tips) |

## Tips

No LinkedIn cookies, login, or session tokens are needed. Keep keywords concise, use repeated `--keyword` inputs for related terms, and sort by `date_posted` when recency matters more than engagement.
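For the CSV fix in Common Fixes, a minimal regeneration sketch, assuming the normalized records from Step 3 are already in `linkedin_posts.json`; `csv.DictWriter` handles the quoting and escaping of embedded commas and newlines. The column list here is illustrative, mirroring the CSV description in REQUIRED OUTPUT FILES.

```python
import csv
import json
import pathlib

root = pathlib.Path("/app/results")
posts = json.loads((root / "linkedin_posts.json").read_text())

# Columns mirror the CSV description in REQUIRED OUTPUT FILES.
fields = ["author", "author_headline", "keyword", "reactions", "comments",
          "shares", "date", "url", "activity_id", "hashtags", "post_preview"]

with (root / "linkedin_posts.csv").open("w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        row = dict(post)
        # Flatten the list-valued hashtags field into one cell.
        row["hashtags"] = " ".join(row.get("hashtags") or [])
        writer.writerow(row)
```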