RagaliQ Documentation

RagaliQ is an open-source Python testing framework for evaluating LLM and RAG systems. It provides automated hallucination detection, faithfulness metrics, answer relevance scoring, and context precision and context recall evaluation, all powered by an LLM-as-Judge architecture.

Getting Started

Tutorial — Full walkthrough from installation to CI/CD integration
Examples — Runnable scripts and pytest examples
API Reference — Complete README with all features

Core Concepts

Evaluators

RagaliQ ships with five built-in evaluators for comprehensive RAG pipeline testing:

Faithfulness — Verifies that responses are grounded only in provided context
Relevance — Checks whether the response actually answers the user's query
Hallucination — Detects claims not supported by retrieved documents
Context Precision — Measures retrieval quality from your vector database
Context Recall — Validates that retrieved context covers all expected facts
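
As a rough illustration of what the two retrieval metrics above measure, the sketch below computes them as simple ratios over chunk identifiers. It shows the commonly used definitions, not RagaliQ's implementation: the function names are hypothetical, and the relevance labels that a judge would normally produce are passed in by hand.

```python
from __future__ import annotations


def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are labeled relevant to the query."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk_id in retrieved if chunk_id in relevant)
    return hits / len(retrieved)


def context_recall(retrieved: list[str], expected: set[str]) -> float:
    """Fraction of expected chunks (or facts) that were actually retrieved."""
    if not expected:
        return 1.0  # nothing was expected, so nothing is missing
    retrieved_set = set(retrieved)
    covered = sum(1 for chunk_id in expected if chunk_id in retrieved_set)
    return covered / len(expected)


# Hypothetical example: four chunks retrieved, three chunks considered relevant.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = {"doc1", "doc3", "doc9"}

precision = context_precision(retrieved_ids, relevant_ids)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved_ids, relevant_ids)        # 2 of 3 relevant were retrieved
```

Low precision suggests the retriever returns too much noise; low recall suggests it misses material the answer depends on.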

LLM-as-Judge

RagaliQ uses Claude or an OpenAI model as a semantic judge to evaluate response quality. This approach catches nuanced errors that keyword matching and embedding similarity miss.
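
The general LLM-as-Judge pattern can be sketched as follows: build a grading prompt, send it to a model, and parse a structured verdict. This is a minimal sketch of the pattern, not RagaliQ's API. The prompt text, `judge_faithfulness`, and `fake_llm` are all hypothetical, and the model call is replaced by a stub so the example runs without an API key.

```python
import json
from typing import Callable, Tuple

# Hypothetical grading prompt; doubled braces are literal braces for str.format.
JUDGE_PROMPT = (
    "You are grading a RAG response.\n"
    "Context:\n{context}\n\n"
    "Response:\n{response}\n\n"
    'Reply with JSON: {{"score": <float 0-1>, "reasoning": "<short text>"}}'
)


def judge_faithfulness(
    response: str, context: str, call_llm: Callable[[str], str]
) -> Tuple[float, str]:
    """Ask a judge model how well the response is grounded in the context."""
    prompt = JUDGE_PROMPT.format(context=context, response=response)
    raw = call_llm(prompt)          # the judge's raw text reply
    verdict = json.loads(raw)       # raises ValueError on malformed JSON
    score = float(verdict["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"judge returned an out-of-range score: {score}")
    return score, str(verdict.get("reasoning", ""))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real Claude or OpenAI call (assumption for this sketch)."""
    return '{"score": 0.9, "reasoning": "All claims appear in the context."}'


score, reasoning = judge_faithfulness(
    response="Paris is the capital of France.",
    context="France's capital city is Paris.",
    call_llm=fake_llm,
)
```

Passing the model call in as a plain callable keeps the parsing and validation logic testable without network access.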

Pytest Integration

RagaliQ integrates natively with pytest, so RAG quality tests run alongside your existing unit tests and use familiar fixtures and markers.
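
To show the shape such a test might take, here is a plain pytest sketch that uses a fixture, a marker, and a score threshold. It does not use RagaliQ's own fixtures or markers, which are not described here. `grounded_fraction` is a hypothetical word-overlap stand-in for a judge-based score so the test runs offline, and `rag_quality` is a hypothetical custom marker.

```python
import re

import pytest


def grounded_fraction(response: str, context: str) -> float:
    """Fraction of response words that also occur in the context.

    A crude offline stand-in for a judge-based faithfulness score.
    """
    words = re.findall(r"[a-z0-9]+", response.lower())
    if not words:
        return 0.0
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    return sum(word in context_words for word in words) / len(words)


@pytest.fixture
def rag_case():
    """One query's retrieved context and the generated response."""
    return {
        "context": "The Eiffel Tower is in Paris and was completed in 1889.",
        "response": "The Eiffel Tower was completed in 1889.",
    }


# Hypothetical marker; register it in pytest configuration to avoid
# an unknown-marker warning.
@pytest.mark.rag_quality
def test_response_is_grounded(rag_case):
    score = grounded_fraction(rag_case["response"], rag_case["context"])
    assert score >= 0.8, f"response is poorly grounded (score={score:.2f})"
```

With a marker like this, the quality tests could be selected on their own with `pytest -m rag_quality` or run together with the rest of the suite.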

Installation

pip install ragaliq