RagaliQ Documentation
RagaliQ is an open-source Python framework for evaluating and testing LLM and RAG applications. It provides automated hallucination detection, faithfulness metrics, answer relevance scoring, and context precision and context recall evaluation, all powered by an LLM-as-Judge architecture.
Getting Started
- Tutorial — Full walkthrough from installation to CI/CD integration
- Examples — Runnable scripts and pytest examples
- API Reference — Complete README with all features
Core Concepts
Evaluators
RagaliQ ships with five built-in evaluators for comprehensive RAG pipeline testing:
- Faithfulness — Verifies that responses are grounded only in provided context
- Relevance — Checks whether the response actually answers the user's query
- Hallucination — Detects claims not supported by retrieved documents
- Context Precision — Measures retrieval quality from your vector database
- Context Recall — Validates that retrieved context covers all expected facts
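To make the two retrieval metrics concrete, here is a minimal sketch of one common way to define context precision and context recall. It does not use RagaliQ's API: the function names, the `looks_relevant` keyword check, and the substring-based fact matching are all illustrative assumptions standing in for the LLM judge, and RagaliQ's own scoring may differ.

```python
def context_precision(relevant_flags: list[bool]) -> float:
    """Fraction of retrieved chunks that were judged relevant to the query."""
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)


def context_recall(covered_flags: list[bool]) -> float:
    """Fraction of expected facts that appear somewhere in the retrieved context."""
    if not covered_flags:
        return 0.0
    return sum(covered_flags) / len(covered_flags)


def looks_relevant(chunk: str, keyword: str) -> bool:
    # Hypothetical stand-in for a judge: a plain keyword check.
    return keyword.lower() in chunk.lower()


def is_covered(fact: str, chunks: list[str]) -> bool:
    # Hypothetical stand-in for a judge: case-insensitive substring match.
    return any(fact.lower() in chunk.lower() for chunk in chunks)


retrieved = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
]
expected_facts = ["capital of France", "Eiffel Tower"]

precision = context_precision([looks_relevant(c, "Paris") for c in retrieved])
recall = context_recall([is_covered(f, retrieved) for f in expected_facts])
```

In this toy case two of the three retrieved chunks are relevant, and both expected facts are covered. A semantic judge would replace the keyword and substring checks, which miss paraphrases.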
LLM-as-Judge
RagaliQ uses an LLM (Claude or an OpenAI model) as a semantic judge to evaluate response quality. This approach captures nuanced errors that keyword matching and embedding-similarity approaches miss.
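The general LLM-as-Judge pattern can be sketched as follows. This is not RagaliQ's implementation: the prompt wording, the `judge_faithfulness` function, the JSON verdict shape, and the `fake_llm` stub are assumptions made for illustration. In practice, `call_llm` would wrap a real Claude or OpenAI client.

```python
import json
from typing import Callable

# Hypothetical prompt template; doubled braces survive str.format() as literals.
JUDGE_PROMPT = (
    "You are grading a response for faithfulness to the context.\n"
    "Context:\n{context}\n\n"
    "Response:\n{response}\n\n"
    'Reply with JSON of the form {{"score": <0.0-1.0>, "reasoning": "<text>"}}.'
)


def judge_faithfulness(
    response: str,
    context: list[str],
    call_llm: Callable[[str], str],
) -> dict:
    """Ask the judge model for a verdict and validate its structure."""
    prompt = JUDGE_PROMPT.format(context="\n".join(context), response=response)
    verdict = json.loads(call_llm(prompt))
    score = float(verdict["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"judge returned an out-of-range score: {score}")
    return {"score": score, "reasoning": verdict.get("reasoning", "")}


def fake_llm(prompt: str) -> str:
    # Stub judge for offline use; a real judge would call an LLM API here.
    return json.dumps({"score": 0.5, "reasoning": "One of two claims is supported."})


result = judge_faithfulness(
    response="Paris is the capital of France. It has ten million residents.",
    context=["Paris is the capital of France."],
    call_llm=fake_llm,
)
```

Passing the model call in as a function keeps the judging logic testable without network access, and validating the score range guards against malformed judge output.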
Pytest Integration
RagaliQ integrates natively with pytest — RAG quality tests run alongside your existing unit tests with familiar fixtures and markers.
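As a rough illustration of how a quality check can sit next to ordinary unit tests, the sketch below uses plain pytest-style test functions. It deliberately avoids RagaliQ's own fixtures and markers, which are not shown here: `score_faithfulness` is a hypothetical sentence-matching stand-in for a real evaluator, and the `0.8` threshold is an assumed value.

```python
FAITHFULNESS_THRESHOLD = 0.8  # assumed pass mark, not a RagaliQ default


def score_faithfulness(response: str, context: list[str]) -> float:
    """Hypothetical stand-in scorer: share of response sentences found verbatim in context."""
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    if not sentences:
        return 0.0
    joined = " ".join(context).lower()
    supported = sum(1 for s in sentences if s.lower() in joined)
    return supported / len(sentences)


def test_grounded_answer_passes_threshold():
    context = ["Paris is the capital of France. It lies on the Seine."]
    response = "Paris is the capital of France."
    assert score_faithfulness(response, context) >= FAITHFULNESS_THRESHOLD


def test_unsupported_claim_is_flagged():
    context = ["Paris is the capital of France. It lies on the Seine."]
    response = "Paris is the capital of France. Paris has ten million residents."
    assert score_faithfulness(response, context) < FAITHFULNESS_THRESHOLD
```

Saved in a file such as `test_rag_quality.py`, these functions are collected by pytest's default discovery rules like any other test, so a drop in response quality fails the build the same way a broken unit test would.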