
RagaliQ Documentation

RagaliQ is an open-source Python framework for testing and evaluating LLM and RAG (retrieval-augmented generation) applications. It provides automated hallucination detection, faithfulness metrics, answer relevance scoring, and context precision and recall evaluation, all powered by an LLM-as-Judge architecture.

Getting Started

  • Tutorial — Full walkthrough from installation to CI/CD integration
  • Examples — Runnable scripts and pytest examples
  • API Reference — Complete README with all features

Core Concepts

Evaluators

RagaliQ ships with five built-in evaluators that cover both the generation and the retrieval side of a RAG pipeline:

  • Faithfulness — Verifies that responses are grounded only in provided context
  • Relevance — Checks whether the response actually answers the user's query
  • Hallucination — Detects claims not supported by retrieved documents
  • Context Precision — Measures how much of the retrieved context is actually relevant to the query, i.e. the retrieval quality of your vector database
  • Context Recall — Validates that retrieved context covers all expected facts
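To make the two retrieval metrics concrete, here is a minimal sketch of one common way to turn per-chunk and per-fact judgments into scores. It assumes the judge has already labelled each retrieved chunk as relevant or not, and each expected fact as covered or not. `context_precision` and `context_recall` are hypothetical helpers, and the rank-weighted precision shown is one formulation used by RAG evaluation tools, not necessarily the formula RagaliQ implements.

```python
from typing import Sequence

# Hypothetical helpers for illustration only; these are NOT RagaliQ's API.
# The boolean judgments would normally come from the LLM judge; here they are
# passed in directly so the arithmetic is easy to follow.


def context_precision(relevant_flags: Sequence[bool]) -> float:
    """Rank-weighted precision: mean of precision@k over the ranks k holding a relevant chunk.

    Relevant chunks ranked near the top raise the score more than ones ranked low.
    """
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k  # precision@k at this relevant position
    return precision_sum / hits if hits else 0.0


def context_recall(facts_covered: Sequence[bool]) -> float:
    """Fraction of expected facts that the retrieved context supports (0.0 if none expected)."""
    if not facts_covered:
        return 0.0
    return sum(facts_covered) / len(facts_covered)


# The retriever returned 4 chunks; the judge marked chunks 1 and 3 as relevant.
print(context_precision([True, False, True, False]))  # (1/1 + 2/3) / 2 ≈ 0.833
# 3 of the 4 expected facts appear in the retrieved context.
print(context_recall([True, True, False, True]))  # 0.75
```

A binary relevance flag per chunk keeps the judge's task simple; a rank-weighted average then rewards retrievers that put relevant chunks first, while recall is a plain coverage ratio.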

LLM-as-Judge

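RagaliQ uses an LLM, either Claude (Anthropic) or an OpenAI model, as a semantic judge of response quality. Because the judge reads the response and context for meaning, it can catch subtle errors, such as an answer that reuses the context's wording but reverses its meaning, that keyword matching and embedding similarity tend to miss.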
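For faithfulness, the judging loop can be sketched as follows: each claim in the response is checked against the retrieved context, and the score is the share of claims the judge accepts. This is a minimal illustration, not RagaliQ's implementation. `judge_faithfulness`, the prompt wording, and the `JudgeFn` callable (which in practice would wrap a Claude or OpenAI API call) are all hypothetical, claims are assumed to be already split out of the response, and `fake_judge` is an offline substring check used only so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: any function that sends a prompt to an LLM judge (for example a
# thin wrapper around a Claude or OpenAI API call) and returns its text reply.
JudgeFn = Callable[[str], str]

# Illustrative prompt wording; not RagaliQ's actual prompt.
VERDICT_PROMPT = (
    "Context:\n{context}\n\n"
    "Claim: {claim}\n\n"
    "Is the claim fully supported by the context? Answer SUPPORTED or UNSUPPORTED."
)


@dataclass
class FaithfulnessResult:
    score: float  # share of claims the judge accepted, from 0.0 to 1.0
    unsupported: List[str] = field(default_factory=list)  # likely hallucinations


def judge_faithfulness(claims: List[str], context: str, judge: JudgeFn) -> FaithfulnessResult:
    """Ask the judge about each claim and score the response by the share it supports."""
    unsupported = []
    for claim in claims:
        reply = judge(VERDICT_PROMPT.format(context=context, claim=claim))
        # Only an explicit SUPPORTED verdict counts; anything else is flagged.
        if not reply.strip().upper().startswith("SUPPORTED"):
            unsupported.append(claim)
    score = 1.0 - len(unsupported) / len(claims) if claims else 1.0
    return FaithfulnessResult(score=score, unsupported=unsupported)


def fake_judge(prompt: str) -> str:
    """Offline stand-in for a real LLM call: a plain substring check, demo only."""
    context = prompt.split("Context:\n", 1)[1].split("\n\nClaim:", 1)[0]
    claim = prompt.split("Claim: ", 1)[1].split("\n", 1)[0]
    return "SUPPORTED" if claim in context else "UNSUPPORTED"


context = "Paris is the capital of France. The Eiffel Tower is in Paris."
claims = ["Paris is the capital of France", "The Eiffel Tower was built in 1999"]
result = judge_faithfulness(claims, context, fake_judge)
print(result.score)        # 0.5
print(result.unsupported)  # ['The Eiffel Tower was built in 1999']
```

Passing the judge in as a plain callable keeps the scoring logic independent of the model provider, so the same loop works whether the reply comes from Claude, an OpenAI model, or an offline stub in tests.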

Pytest Integration

RagaliQ integrates natively with pytest, so RAG quality tests run alongside your existing unit tests using familiar fixtures and markers.
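As an illustration of how such a test can look, the module below gates an answer on a faithfulness threshold and is collected by pytest like any other test file. Every name in it is hypothetical: `rag_pipeline` stands in for your own pipeline, `faithfulness_score` is a deliberately crude word-overlap check standing in for an LLM-judged evaluator (exactly the kind of heuristic the judge is meant to improve on), and the 0.8 threshold is arbitrary.

```python
# test_rag_quality.py: collected by pytest like any other test module.
# All names below are hypothetical stand-ins, not RagaliQ's API.
from typing import List, Tuple

FAITHFULNESS_THRESHOLD = 0.8  # illustrative pass mark; tune it for your project


def rag_pipeline(query: str) -> Tuple[str, List[str]]:
    """Stand-in for your retrieval + generation step: returns (answer, context chunks)."""
    context = ["Install RagaliQ with pip install ragaliq."]
    answer = "Install it with pip install ragaliq."
    return answer, context


def faithfulness_score(answer: str, context: List[str]) -> float:
    """Crude offline stand-in for an LLM judge: share of answer words found in the context."""
    context_words = {w.strip(".,!?") for w in " ".join(context).lower().split()}
    answer_words = [w.strip(".,!?") for w in answer.lower().split()]
    if not answer_words:
        return 1.0
    return sum(w in context_words for w in answer_words) / len(answer_words)


def test_answer_is_grounded_in_context():
    answer, context = rag_pipeline("How do I install RagaliQ?")
    score = faithfulness_score(answer, context)
    assert score >= FAITHFULNESS_THRESHOLD, f"faithfulness {score:.2f} < {FAITHFULNESS_THRESHOLD}"


def test_unsupported_answer_is_flagged():
    _, context = rag_pipeline("How do I install RagaliQ?")
    hallucinated = "Install it with conda install ragaliq-gpu."
    assert faithfulness_score(hallucinated, context) < FAITHFULNESS_THRESHOLD
```

Run it with `pytest test_rag_quality.py`. In a real suite you would replace `faithfulness_score` with one of RagaliQ's evaluators; check the API Reference for its actual interface rather than assuming this signature.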

Installation

pip install ragaliq
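To confirm the package is visible to your interpreter after installing, you can query its installed version with the standard library. This is a generic Python check (`importlib.metadata`, Python 3.8+), not a RagaliQ-specific command, and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "ragaliq") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"ragaliq {v}" if v else "ragaliq not found; run: pip install ragaliq")
```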