Best Practices¶

Guidelines, caveats, and common pitfalls for developing AutoRAG-Research plugins.

Security¶

Security Note

Plugin discovery calls ep.load() which executes code from installed packages. Only install plugins from trusted sources. Review plugin code before installation.

plugin sync loads plugin modules via entry_points() + ep.load() -- this runs arbitrary code from the installed package.
Only install plugins from trusted, reviewed sources.
Plugin names are validated against ^[a-z][a-z0-9_]*$ to prevent path traversal and injection.

Plugin Naming¶

Plugin names must start with a lowercase letter and contain only lowercase letters, digits, and underscores.

Regex: ^[a-z][a-z0-9_]*$

Name	Valid	Reason
`my_search`	Yes
`es_retrieval`	Yes
`custom_bm25`	Yes
`MySearch`	No	Uppercase letters
`123plugin`	No	Starts with digit
`my-search`	No	Hyphens not allowed
`_private`	No	Starts with underscore

Package Layout¶

Use a nested layout with subcategory directories for YAML configs:

src/my_plugin/
├── __init__.py
├── pipeline.py          # or metric.py
└── retrieval/           # subcategory directory
    └── my_search.yaml

For ingestor plugins, no YAML config directory is needed:

src/my_dataset_plugin/
├── __init__.py
└── ingestor.py          # @register_ingestor decorated class

The subcategory directory determines where configs are synced (pipelines/metrics only):

retrieval/ syncs to configs/pipelines/retrieval/ or configs/metrics/retrieval/
generation/ syncs to configs/pipelines/generation/ or configs/metrics/generation/

Place the YAML file in the correct subcategory directory or it will not be discovered.

Config Sync Behavior¶

plugin sync never overwrites existing files. To re-sync a config, delete the existing file first.
Configs are copied, not symlinked. Editing the local copy does not affect the plugin source.
Install a plugin first, then run plugin sync. Order matters -- discovery requires the package to be installed.

Testing Guidelines¶

Use MagicMock() for LLM and session_factory in unit tests.
Test config instantiation and abstract method implementations separately.
Use pytest markers: @pytest.mark.api for tests needing real LLM calls.
The scaffold includes a basic test file. Extend it with integration tests.

from unittest.mock import MagicMock

import pytest


def test_pipeline_config():
    """Test config can be created and returns correct class."""
    config = MySearchPipelineConfig(name="test")
    assert config.get_pipeline_class() is MySearchPipeline
    assert "index_path" in config.get_pipeline_kwargs()


@pytest.mark.api
def test_pipeline_integration(db_session):
    """Integration test with real database (requires Docker)."""
    # Use db_session fixture from conftest.py
    pass

For ingestor plugins, use FakeEmbeddings from langchain_core:

from langchain_core.embeddings import FakeEmbeddings


def test_ingestor_instantiation():
    """Test ingestor can be created with fake embeddings."""
    embeddings = FakeEmbeddings(size=128)
    ingestor = MyDatasetIngestor(
        embedding_model=embeddings,
        dataset_name="dataset_a",
    )
    assert ingestor.dataset_name == "dataset_a"

Development Workflow¶

Scaffold -- autorag-research plugin create NAME --type=TYPE
Implement -- edit pipeline.py or metric.py with your logic
Configure -- edit the YAML config to set parameters
Test -- pytest tests/ to run plugin tests
Install -- pip install -e . to install in dev mode
Sync -- autorag-research plugin sync to copy configs into the project
Integrate -- add the plugin name to your experiment config
Run -- autorag-research run --config-name=experiment

Common Pitfalls¶

Pitfall	Solution
Forgot `pip install -e .`	Plugin won't be discovered. Install before sync.
Config not appearing after sync	Check entry_points in `pyproject.toml`. Run `pip show your-plugin` to verify installation.
Existing config not updated	`plugin sync` never overwrites. Delete the old file and re-sync.
`_target_` path wrong	Must be fully-qualified: `package.module.ClassName`
LLM string not loading	Ensure the LLM provider package is installed (e.g., `langchain-openai`).
`get_pipeline_kwargs()` missing custom params	Only extra kwargs beyond `session_factory`, `name`, `schema` need to be returned.