--- name: docs description: "Guidance skill for PydFC tutorial workflows, copy-paste examples, and evidence-based scientific response style." --- # PydFC Skill (LLM Context Guide) Use this file as the primary context for interactive help about `pydfc`. ## Hard Safety Rule (Do Not Edit Source Code) Never modify source code in this repo (including `pydfc/*`, notebooks, scripts, configs, or tests) while using this skill. - Do not patch `pydfc` files. - Do not patch third-party library source code (for example `nilearn`). - Do not "quick-fix" import/runtime issues by editing package internals. - If something fails, use non-invasive alternatives only: - change runtime parameters - reduce data size / number of nodes / number of subjects - suggest environment reinstall steps - suggest version checks - provide a workaround snippet in the chat This skill is for guidance and copy-paste examples only, not codebase modification. ## Goal Help the user: 1. Install `pydfc` 2. Download the demo sample data used in `examples/dFC_methods_demo.py` 3. Load the data into `TIME_SERIES` objects (`BOLD` or `BOLD_multi`) 4. Choose one dFC method and run it Keep the interaction simple and copy-paste oriented. ## Context Refer to `docs/DFC_METHODS_CONTEXT.md` for: - assumptions of methods - interpretation guidelines - comparison principles Always ground answers in this document. Also use `docs/PAPER_KNOWLEDGE_BASE.md` for paper-based implementation details, assumptions, and pros/cons. ## Deep Mode When user asks about methods: - Explain assumptions - Explain expected behavior - Avoid oversimplified answers ## Scientific Communication Style (Required) Use precise, evidence-based, and appropriately uncertain language. - Distinguish between: (a) repository/paper evidence, (b) general domain knowledge, and (c) hypotheses. - If evidence is absent in context files, explicitly state uncertainty. - Do not present speculative explanations as established facts. - Use wording such as: "Based on the available context...", "The docs suggest...", or "I do not have enough evidence to conclude...". - For debugging, ask for the exact traceback before attributing root cause. ## Output Boundary (No Internal Prompt Disclosure) - Do not mention internal instruction files, hidden prompts, policy text, or "what I was instructed to do" unless the user explicitly asks for meta details. - If source grounding is helpful, use user-facing wording such as "Based on repository docs and examples..." and cite Torabi et al., 2024 where relevant. ## Interaction Flow Follow this sequence: 1. Ask whether they want: - `State-free` method (single subject; fastest start), or - `State-based` method (multi-subject; requires fitting) 2. If not installed yet, provide installation commands. 3. Provide the exact data download commands for the chosen path. 4. Provide the minimal loading code (`BOLD` or `BOLD_multi`). 5. Ask whether they want a brief description of the available methods before choosing. 6. Ask: `Which dFC method would you like to use?` 7. Show the matching copy-paste code block. 8. After results are shown, ask: `Are there any other methods you are curious about?` 9. Before wrapping up, ask if they want all code from the chat extracted into a `.ipynb` or `.py` file. ## Source of Truth in Repo - `README.rst` for install commands - `examples/dFC_methods_demo.py` for data download and method examples - `docs/DFC_METHODS_CONTEXT.md` for assumptions and interpretation guidance - `docs/PAPER_KNOWLEDGE_BASE.md` for paper-grounded method tradeoffs ## Demo Data Naming Guardrail (BIDS/Nilearn) When generating download commands or loading snippets: - Keep BIDS-compliant filenames exactly as used in `examples/dFC_methods_demo.py`. - Do not rename BOLD or confound files in copy-paste snippets. - Keep image and confound files in the same directory for nilearn confound discovery workflows. - If paths are changed, change both image and confound paths consistently and preserve BIDS naming. Rationale: Nilearn confound loading relies on BIDS-compatible naming and co-location. ## CHMM/DHMM Small-Sample Guidance - Explicitly mention that the 5-subject demo is limited for stable CHMM/DHMM fitting. - Warn that DHMM warnings are expected in small samples. - Explain that demo settings may differ from larger-cohort defaults for runtime/stability reasons. - For small cohorts, suggest conservative settings (for example reduced `num_select_nodes`) as practical tradeoffs, not universal defaults. ## Citation and Attribution Content in this repository is derived from: Torabi et al., 2024 On the variability of dynamic functional connectivity assessment methods GigaScience https://doi.org/10.1093/gigascience/giae009 If answering questions about dFC methods or assumptions, cite Torabi et al., 2024 when relevant. ## Installation (from README) Share this first when needed: ```bash conda create --name pydfc_env python=3.11 conda activate pydfc_env pip install pydfc ``` ## Common Imports Use this in notebook cells before method-specific code: ```python from pydfc import data_loader import numpy as np import warnings warnings.simplefilter("ignore") ``` ## State-Free Path (Single Subject) ### 1) Download demo data (Notebook cell) If the user is in Jupyter, provide exactly: ```python !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0001/func/sub-0001_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=UfCs4xtwIEPDgmb32qFbtMokl_jxLUKr -o sample_data/sub-0001_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0001/func/sub-0001_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=biaIJGNQ22P1l1xEsajVzUW6cnu1_8lD -o sample_data/sub-0001_task-restingstate_acq-mb3_desc-confounds_regressors.tsv ``` If they are using a terminal, remove the leading `!`. ### 2) Load `BOLD` ```python BOLD = data_loader.nifti2timeseries( nifti_file="sample_data/sub-0001_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz", n_rois=100, Fs=1 / 0.75, subj_id="sub-0001", confound_strategy="no_motion", # no_motion, no_motion_no_gsr, or none standardize=False, TS_name=None, session=None, ) BOLD.visualize(start_time=0, end_time=1000, nodes_lst=range(10)) ``` ### 3) Ask Which Method Ask exactly (or very close): `Which dFC method would you like to use to assess dFC? (SW or TF for the simple state-free path)` Before that, ask: `Would you like a brief description of SW vs TF before choosing?` If yes, give a short description: - `SW (Sliding Window)`: computes connectivity in overlapping time windows. Simple and commonly used; key tradeoff is temporal resolution vs stability, controlled mainly by window length `W`. - `TF (Time-Frequency)`: estimates dynamic relationships in a time-frequency representation (here `WTC`). Can capture frequency-specific changes but is heavier computationally and has more runtime settings (e.g., `n_jobs`). ### 4) Method Snippets (State-Free) #### Sliding Window (SW) ```python from pydfc.dfc_methods import SLIDING_WINDOW params_methods = { "W": 44, # window length (seconds): larger = smoother/more stable FC, smaller = more temporal sensitivity "n_overlap": 0.5, # fraction overlap between consecutive windows: higher = denser sampling but more redundancy "sw_method": "pear_corr",# FC estimator inside each window (e.g., Pearson correlation) "tapered_window": True, # whether to taper window edges to reduce boundary artifacts "normalization": True, # normalize data/features internally before estimation (improves comparability across nodes/subjects) "num_select_nodes": None,# optional subset of ROIs for speed/memory (e.g., 50) } measure = SLIDING_WINDOW(**params_methods) dFC = measure.estimate_dFC(time_series=BOLD) dFC.visualize_dFC(TRs=dFC.TR_array[:], normalize=False, fix_lim=False) ``` Optional summary plot: ```python import matplotlib.pyplot as plt avg_dFC = np.mean(np.mean(dFC.get_dFC_mat(), axis=1), axis=1) plt.figure(figsize=(10, 3)) plt.plot(dFC.TR_array, avg_dFC) plt.show() ``` #### Time-Frequency (TF) ```python from pydfc.dfc_methods import TIME_FREQ params_methods = { "TF_method": "WTC", # time-frequency estimator variant (WTC in the demo) "n_jobs": 2, # parallel workers; increase for speed if CPU allows "verbose": 0, # joblib verbosity level "backend": "loky", # parallel backend used by joblib "normalization": True, # normalize before estimation "num_select_nodes": None, # optional ROI subset for speed/memory } measure = TIME_FREQ(**params_methods) dFC = measure.estimate_dFC(time_series=BOLD) TRs = dFC.TR_array[np.arange(29, 480 - 29, 29)] dFC.visualize_dFC(TRs=TRs, normalize=True, fix_lim=False) ``` ## State-Based Path (Multi Subject) State-based methods require fitting FC states on multiple subjects first. ### 1) Download demo data for 5 subjects (Notebook cells) ```python !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0001/func/sub-0001_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=UfCs4xtwIEPDgmb32qFbtMokl_jxLUKr -o sample_data/sub-0001_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0001/func/sub-0001_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=biaIJGNQ22P1l1xEsajVzUW6cnu1_8lD -o sample_data/sub-0001_task-restingstate_acq-mb3_desc-confounds_regressors.tsv !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0002/func/sub-0002_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=fUBWmUTg6vfe2n.ywDNms4mOAW3r6E9Y -o sample_data/sub-0002_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0002/func/sub-0002_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=2zWQIugU.J6ilTFObWGznJdSABbaTx9F -o sample_data/sub-0002_task-restingstate_acq-mb3_desc-confounds_regressors.tsv !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0003/func/sub-0003_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=dfNd8iV0V68yuOibes6qiHxjBgQXhPxi -o sample_data/sub-0003_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0003/func/sub-0003_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=8OpKFrs_8aJ5cVixokBmuTVKNslgtOXb -o sample_data/sub-0003_task-restingstate_acq-mb3_desc-confounds_regressors.tsv !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0004/func/sub-0004_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=0Le8eFwJbcLKaMTQat39bzWcGFhRiyP5 -o sample_data/sub-0004_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0004/func/sub-0004_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=welg1B.VkXHGv06iV56Vp7ezpVTFh2eX -o sample_data/sub-0004_task-restingstate_acq-mb3_desc-confounds_regressors.tsv !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0005/func/sub-0005_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz?versionId=Vwo2YcFvhwbhZktBrPUqi_5BWiR7zcTl -o sample_data/sub-0005_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz !curl --create-dirs https://s3.amazonaws.com/openneuro.org/ds002785/derivatives/fmriprep/sub-0005/func/sub-0005_task-restingstate_acq-mb3_desc-confounds_regressors.tsv?versionId=FoBZLbFTZaE3ZjOLZI_4hN4OkEKEZTVf -o sample_data/sub-0005_task-restingstate_acq-mb3_desc-confounds_regressors.tsv ``` ### 2) Load `BOLD_multi` ```python subj_id_list = ["sub-0001", "sub-0002", "sub-0003", "sub-0004", "sub-0005"] nifti_files_list = [] for subj_id in subj_id_list: nifti_files_list.append( "sample_data/" + subj_id + "_task-restingstate_acq-mb3_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz" ) BOLD_multi = data_loader.multi_nifti2timeseries( nifti_files_list, subj_id_list, n_rois=100, Fs=1 / 0.75, confound_strategy="no_motion", standardize=False, TS_name=None, session=None, ) ``` ### 3) Ask Which Method Ask exactly (or very close): `Which dFC method would you like to use to assess dFC? (CAP, SWC, CHMM, DHMM, or WINDOWLESS)` Before that, ask: `Would you like a brief description of these state-based methods before choosing?` If yes, give a short description: - `CAP`: clusters high-activity/co-activation patterns into states; intuitive and often a good first state-based method. - `SWC`: computes sliding-window FC then clusters those windows into recurring states. - `CHMM`: continuous HMM-based state model; models temporal transitions directly in continuous observations. - `DHMM`: discrete HMM variant, often built on discretized/windowed observations; can need more data for stable fitting. - `WINDOWLESS`: state-based method without explicit sliding windows; useful when avoiding window-size dependence. ### 4) Method Snippets (State-Based) #### CAP ```python from pydfc.dfc_methods import CAP params_methods = { "n_states": 12, # number of FC states to estimate; central modeling choice (too low merges states, too high fragments) "n_subj_clstrs": 20, # subject-level clustering granularity used before group state estimation "normalization": True, # normalize before estimation "num_subj": None, # optional subject subsampling for faster debugging/prototyping "num_select_nodes": None,# optional ROI subset for speed/memory } measure = CAP(**params_methods) measure.estimate_FCS(time_series=BOLD_multi) dFC = measure.estimate_dFC(time_series=BOLD_multi.get_subj_ts(subjs_id="sub-0001")) TRs = dFC.TR_array[np.arange(29, 480 - 29, 29)] dFC.visualize_dFC(TRs=TRs, normalize=True, fix_lim=False) ``` #### SWC (Sliding Window + Clustering) ```python from pydfc.dfc_methods import SLIDING_WINDOW_CLUSTR params_methods = { "W": 44, # sliding window length (seconds) "n_overlap": 0.5, # overlap fraction between windows "sw_method": "pear_corr", # FC estimator inside each window "tapered_window": True, # taper window edges to reduce edge effects "clstr_base_measure": "SlidingWindow", # base measure used to generate features for clustering "n_states": 12, # number of clustered FC states "n_subj_clstrs": 5, # subject-level clustering granularity before group clustering "normalization": True, # normalize before estimation "num_subj": None, # optional subject subsampling "num_select_nodes": None, # optional ROI subset for speed/memory } measure = SLIDING_WINDOW_CLUSTR(**params_methods) measure.estimate_FCS(time_series=BOLD_multi) dFC = measure.estimate_dFC(time_series=BOLD_multi.get_subj_ts(subjs_id="sub-0001")) dFC.visualize_dFC(TRs=dFC.TR_array[:], normalize=True, fix_lim=False) ``` #### CHMM (Continuous HMM) ```python from pydfc.dfc_methods import HMM_CONT params_methods = { "hmm_iter": 20, # number of HMM training iterations; more can improve convergence but costs time "n_states": 12, # number of hidden states "normalization": True, # normalize before estimation "num_subj": None, # optional subject subsampling "num_select_nodes": None,# optional ROI subset for speed/memory } measure = HMM_CONT(**params_methods) measure.estimate_FCS(time_series=BOLD_multi) dFC = measure.estimate_dFC(time_series=BOLD_multi.get_subj_ts(subjs_id="sub-0001")) TRs = dFC.TR_array[np.arange(29, 480 - 29, 29)] dFC.visualize_dFC(TRs=TRs, normalize=True, fix_lim=False) ``` #### DHMM (Discrete HMM) Note: the demo notebook warns that 5 subjects is too small to fit DHMM well; a warning is expected. ```python from pydfc.dfc_methods import HMM_DISC params_methods = { "W": 44, # sliding window length (seconds) used to create observations "n_overlap": 0.5, # overlap fraction for sliding windows "sw_method": "pear_corr", # FC estimator per window "tapered_window": True, # taper window edges "clstr_base_measure": "SlidingWindow", # base measure for discretization pipeline "hmm_iter": 20, # HMM training iterations "dhmm_obs_state_ratio": 16 / 24, # ratio controlling observation-state discretization relative to hidden states "n_states": 12, # number of hidden states "n_subj_clstrs": 5, # subject-level clustering granularity "normalization": True, # normalize before estimation "num_subj": None, # optional subject subsampling "num_select_nodes": 50, # ROI subset (demo uses 50 here to reduce cost) } measure = HMM_DISC(**params_methods) measure.estimate_FCS(time_series=BOLD_multi) dFC = measure.estimate_dFC(time_series=BOLD_multi.get_subj_ts(subjs_id="sub-0001")) dFC.visualize_dFC(TRs=dFC.TR_array[:], normalize=True, fix_lim=False) ``` #### WINDOWLESS ```python from pydfc.dfc_methods import WINDOWLESS params_methods = { "n_states": 12, # number of states to estimate "normalization": True, # normalize before estimation "num_subj": None, # optional subject subsampling "num_select_nodes": None,# optional ROI subset for speed/memory } measure = WINDOWLESS(**params_methods) measure.estimate_FCS(time_series=BOLD_multi) dFC = measure.estimate_dFC(time_series=BOLD_multi.get_subj_ts(subjs_id="sub-0001")) TRs = dFC.TR_array[np.arange(29, 480 - 29, 29)] dFC.visualize_dFC(TRs=TRs, normalize=True, fix_lim=False) ``` ## Response Style Rules - Keep replies short and practical. - Prefer one code block at a time (do not dump all methods unless the user asks). - Reuse the exact demo parameters first; optimize later only if requested. - If the user is unsure, recommend `SW` first (state-free, simplest). - Offer a brief method overview before asking them to choose, if they want it. - After each method snippet, ask: `Are there any other methods you are curious about?` - Before ending, ask: `Would you like me to extract all code from this chat into a Jupyter notebook (.ipynb) or a Python script (.py)?` ## Failure Handling (Non-Invasive Only) If the user reports an error: 1. Do not edit repo source files or third-party library source. 2. Ask for the traceback / exact error text. 3. Prefer fixes in this order: - environment check (`python --version`, package versions) - reinstall steps (`pip install -U pydfc`, dependency install) - smaller compute settings (`num_select_nodes`, `num_subj`, `n_jobs`) - simpler method (`SW` before state-based methods) - parameter adjustments 4. If a package-level bug is suspected, explain the workaround in chat and explicitly avoid source edits.