Spaces: dmytrotm/rag

dmytrotm committed 2 days ago

Commit

f3d2c7c

1 Parent(s): ffb4f8d

Complete RAG HW requirements: add legal area filtering, refine citations, and update documentation

Files changed (7)

.gitignore +2 -1
README.md +47 -40
app.py +43 -8
assets/style.css +20 -2
config.py +15 -14
modules/rag_system.py +4 -3
modules/retriever.py +56 -8

.gitignore CHANGED

@@ -5,4 +5,5 @@ __pycache__
 *debug*
 *test*
 *verify*
-*example*

 *debug*
 *test*
 *verify*
+*example*
+*check*

README.md CHANGED

@@ -9,59 +9,66 @@ app_file: app.py
 pinned: false
 ---
-# 🇺🇦 UA Legal RAG System
-A comprehensive Retrieval-Augmented Generation (RAG) system for Ukrainian legislation.
-This system answers questions based on the Criminal Code, Civil Code, and other legal documents, providing precise citations for every answer.
-## Features
-- **Hybrid Search**: Combines BM25 (keyword) and SBERT (semantic) search for high recall.
-- **Reranking**: Uses `cross-encoder/ms-marco-TinyBERT-L-2-v2` to re-rank top results for high precision.
-- **LLM**: Powered by **Llama 3.3 70B** (via Groq) for accurate, Ukrainian-language answers.
-- **Citations**: Automatically cites legal sources (e.g., `[1] ККУ, ст. 187`).
-## Installation
-1. **Install Python 3.9+** and dependencies:
    ```bash
    pip install -r requirements.txt
    ```
-2. **Data Preparation**:
-   Ensure `parsed_chunks.json` and `embeddings.pt` are in the `data/` directory.
-   If not, run the parser first to generate them:
-   ```bash
-   python scripts/parser.py
    ```
-## Running the App
-```bash
-python app.py
-```
-The interface will be available at `http://localhost:7860`.
-## Configuration
-- Open `config.py` to change models or default parameters.
-- You will need a **Groq API Key** to generate answers. Get one at [console.groq.com](https://console.groq.com).
-## Project Structure
 ```
-project/
-├── app.py                 # Gradio UI entry point
-├── config.py              # Configuration
-├── requirements.txt       # Dependencies
-├── data/                  # Data storage (chunks, embeddings)
 ├── modules/
-│   ├── rag_system.py      # Orchestrator
-│   ├── retriever.py       # Hybrid Search
-│   ├── reranker.py        # Cross-Encoder
-│   └── llm_handler.py     # LiteLLM/Groq Integration
 └── scripts/
-    └── parser.py          # Data ingestion and preprocessing
 ```
-## Example Queries
-1. **Keyword-specific**: `стаття 187 ч 1` (Will cite Art. 187 directly)
-2. **Concept-based**: `Як звільнити працівника за прогул` (Will find relevant Labor Code articles)
-3. **Complex**: `Яка різниця між крадіжкою і грабежем?`

 pinned: false
 ---
+# 🇺🇦 Ukrainian Legislation Assistant (RAG QA)
+This system lets users get answers to legal questions grounded in the current codes and laws of Ukraine, using the **Retrieval-Augmented Generation (RAG)** approach.
+## Key Features
+- **Hybrid search**: Combines BM25 (keywords) and SBERT (semantics) for maximum coverage.
+- **Reranking**: Uses a Cross-Encoder model for high precision.
+- **Citations**: Automatically references code articles in the answer text.
+- **Metadata filtering**: Lets you narrow the search to a specific area of law (Criminal, Civil, etc.).
+- **Intelligent answers**: Uses **Llama 3.3 70B** to generate answers in Ukrainian.
+## Technical Architecture
+- **Retriever**:
+    - **BM25**: Lemmatization (pymorphy3) and spell checking (SymSpell).
+    - **Semantic Search**: The `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` model.
+- **Reranker**: The `cross-encoder/ms-marco-TinyBERT-L-2-v2` model.
+- **LLM**: `llama-3.3-70b-versatile` via the Groq API (LiteLLM).
+- **Metadata**: Pre-filtering on the `legal_area` field.
+## Comparison of Search Methods
+| Query | Best method | Why? |
+| :--- | :--- | :--- |
+| "Стаття 115 ККУ" ("Article 115 of the Criminal Code of Ukraine") | **BM25** | Exact match on the article number and the code name. |
+| "права батьків після розлучення" ("parents' rights after divorce") | **Semantic Search** | Captures the concept of "family rights" even when those words do not appear literally. |
+## Installation and Running
+1. **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    ```
+2. **Configure the environment**:
+   Create a `.env` file and add your key:
+   ```env
+   GROQ_API_KEY=gsk_...
    ```
+3. **Run**:
+   ```bash
+   python app.py
+   ```
+   The interface will be available at `http://localhost:7860`.
+## Project Structure
 ```
+├── app.py                 # Gradio UI entry point
+├── config.py              # Configuration and system prompts
+├── requirements.txt       # Dependencies
+├── data/                  # Data (parsed JSON, embeddings)
 ├── modules/
+│   ├── rag_system.py      # Pipeline orchestrator
+│   ├── retriever.py       # Hybrid search with filtering
+│   ├── reranker.py        # Cross-Encoder reranking
+│   └── llm_handler.py     # LLM integration (LiteLLM)
 └── scripts/
+    └── parser.py          # Document parser and preprocessing
 ```
+---
+**Author**: Dmytro

app.py CHANGED

@@ -34,7 +34,7 @@ def format_sources(sources):
     html += "</div>"
     return html
-def run_chat(query, api_key, search_method, use_reranker, top_k, temperature):
     if not query.strip():
         return "Будь ласка, введіть запитання.", ""
@@ -45,7 +45,21 @@ def run_chat(query, api_key, search_method, use_reranker, top_k, temperature):
         "🧠 Семантичний": "semantic"
     }
     internal_method = method_map.get(search_method, "hybrid")
     try:
         answer, sources = rag_system.process_query(
@@ -54,7 +68,8 @@ def run_chat(query, api_key, search_method, use_reranker, top_k, temperature):
             use_reranker=use_reranker,
             top_k_rerank=int(top_k),
             temperature=float(temperature),
-            search_method=internal_method
         )
         sources_html = format_sources(sources)
@@ -63,7 +78,11 @@ def run_chat(query, api_key, search_method, use_reranker, top_k, temperature):
         return f"Помилка при обробці запиту: {str(e)}", ""
 # --- Gradio UI Construction ---
-with gr.Blocks(title="Асистент із Законодавства") as demo:
     # Header
     with gr.Row(elem_classes="header-container"):
         with gr.Column(scale=0, min_width=80):
@@ -97,8 +116,25 @@ with gr.Blocks(title="Асистент із Законодавства") as demo
                 value="🔄 Гібридний (Рекомендовано)"
             )
             with gr.Accordion("🛠️ Розширені параметри", open=False):
-                use_reranker = gr.Checkbox(label="Використовувати Reranker", value=True)
                 top_k = gr.Slider(label="Кількість джерел", minimum=1, maximum=20, step=1, value=config.DEFAULT_TOP_K_RERANK)
                 temperature = gr.Slider(label="Температура генерації", minimum=0.0, maximum=1.0, step=0.1, value=0.5)
@@ -129,17 +165,16 @@ with gr.Blocks(title="Асистент із Законодавства") as demo
                 btn = gr.Button(q, elem_classes="example-btn")
                 btn.click(lambda x=q: x, outputs=[query_input]).then(
                     fn=run_chat,
-                    inputs=[query_input, api_key_input, search_method, use_reranker, top_k, temperature],
                     outputs=[output_answer, output_sources]
                 )
     # --- Interactions ---
     submit_btn.click(
         fn=run_chat,
-        inputs=[query_input, api_key_input, search_method, use_reranker, top_k, temperature],
         outputs=[output_answer, output_sources]
     )
 if __name__ == "__main__":
-    css_path = Path("assets/style.css")
-    demo.launch(server_name="0.0.0.0", server_port=7860, css=css_path)

     html += "</div>"
     return html
+def run_chat(query, api_key, search_method, use_reranker, legal_area, top_k, temperature):
     if not query.strip():
         return "Будь ласка, введіть запитання.", ""
         "🧠 Семантичний": "semantic"
     }
+    # Mapping Ukrainian legal area names to internal keys
+    area_map = {
+        "Всі": "Всі",
+        "Сімейне право": "сімейне_право",
+        "Трудове право": "трудове_право",
+        "Земельне право": "земельне_право",
+        "Цивільне право": "цивільне_право",
+        "Податкове право": "податкове_право",
+        "Кримінальне право": "кримінальне_право",
+        "Конституційне право": "конституційне_право",
+        "Адміністративне судочинство": "адміністративне_судочинство"
+    }
     internal_method = method_map.get(search_method, "hybrid")
+    internal_area = area_map.get(legal_area, "Всі")
     try:
         answer, sources = rag_system.process_query(
             use_reranker=use_reranker,
             top_k_rerank=int(top_k),
             temperature=float(temperature),
+            search_method=internal_method,
+            legal_area=internal_area
         )
         sources_html = format_sources(sources)
         return f"Помилка при обробці запиту: {str(e)}", ""
 # --- Gradio UI Construction ---
+css_path = Path("assets/style.css")
+with open(css_path, "r", encoding="utf-8") as f:
+    custom_css = f.read()
+with gr.Blocks(title="Асистент із Законодавства", css=custom_css, theme=gr.themes.Soft()) as demo:
     # Header
     with gr.Row(elem_classes="header-container"):
         with gr.Column(scale=0, min_width=80):
                 value="🔄 Гібридний (Рекомендовано)"
             )
+            use_reranker = gr.Checkbox(label="Використовувати Reranker", value=True)
+            legal_area = gr.Dropdown(
+                label="Галузь права (Фільтр)",
+                choices=[
+                    "Всі",
+                    "Сімейне право",
+                    "Трудове право",
+                    "Земельне право",
+                    "Цивільне право",
+                    "Податкове право",
+                    "Кримінальне право",
+                    "Конституційне право",
+                    "Адміністративне судочинство"
+                ],
+                value="Всі"
+            )
             with gr.Accordion("🛠️ Розширені параметри", open=False):
                 top_k = gr.Slider(label="Кількість джерел", minimum=1, maximum=20, step=1, value=config.DEFAULT_TOP_K_RERANK)
                 temperature = gr.Slider(label="Температура генерації", minimum=0.0, maximum=1.0, step=0.1, value=0.5)
                 btn = gr.Button(q, elem_classes="example-btn")
                 btn.click(lambda x=q: x, outputs=[query_input]).then(
                     fn=run_chat,
+                    inputs=[query_input, api_key_input, search_method, use_reranker, legal_area, top_k, temperature],
                     outputs=[output_answer, output_sources]
                 )
     # --- Interactions ---
     submit_btn.click(
         fn=run_chat,
+        inputs=[query_input, api_key_input, search_method, use_reranker, legal_area, top_k, temperature],
         outputs=[output_answer, output_sources]
     )
 if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860)
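
The label-to-key lookup that this commit adds inside `run_chat` could also live in a small helper, which makes the fallback behaviour easy to test. The sketch below is only an illustration: `resolve_area` and `AREA_MAP` are hypothetical names, and returning `None` instead of the `"Всі"` sentinel is an assumption. It does rely on the retriever's `if legal_area and legal_area != "Всі"` check, under which `None` also means "no filter".

```python
from typing import Dict, Optional

# UI label -> value stored in chunk metadata['legal_area'] (pairs copied from the diff).
AREA_MAP: Dict[str, str] = {
    "Сімейне право": "сімейне_право",
    "Трудове право": "трудове_право",
    "Земельне право": "земельне_право",
    "Цивільне право": "цивільне_право",
    "Податкове право": "податкове_право",
    "Кримінальне право": "кримінальне_право",
    "Конституційне право": "конституційне_право",
    "Адміністративне судочинство": "адміністративне_судочинство",
}


def resolve_area(label: Optional[str]) -> Optional[str]:
    """Return the internal key for a UI label, or None to search all areas."""
    # "Всі", a missing value, and any unknown label all fall back to "no filter".
    return AREA_MAP.get(label or "")
```

With this shape the dropdown choices can be derived as `["Всі", *AREA_MAP]`, so the list of labels is written once rather than in both the map and the `gr.Dropdown` call.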

assets/style.css CHANGED

@@ -69,18 +69,36 @@ body, .gradio-container {
 }
 /* Input Styling */
-.gr-textbox textarea, .gr-textbox input {
     background-color: rgba(255, 255, 255, 0.05) !important;
     border: 1px solid rgba(255, 255, 255, 0.1) !important;
     color: white !important;
     border-radius: 10px !important;
 }
-.gr-textbox textarea:focus, .gr-textbox input:focus {
     border-color: var(--blue) !important;
     box-shadow: 0 0 0 2px rgba(62, 139, 247, 0.2) !important;
 }
 /* Examples Section */
 .example-btn {
     background: rgba(255, 255, 255, 0.05) !important;

 }
 /* Input Styling */
+.gr-textbox textarea, .gr-textbox input,
+input[type="text"], input[type="password"] {
     background-color: rgba(255, 255, 255, 0.05) !important;
     border: 1px solid rgba(255, 255, 255, 0.1) !important;
     color: white !important;
     border-radius: 10px !important;
 }
+.gr-textbox textarea:focus, .gr-textbox input:focus,
+input[type="text"]:focus, input[type="password"]:focus {
+    background-color: rgba(255, 255, 255, 0.08) !important;
     border-color: var(--blue) !important;
     box-shadow: 0 0 0 2px rgba(62, 139, 247, 0.2) !important;
 }
+/* Autofill styling fix */
+input:-webkit-autofill,
+input:-webkit-autofill:hover,
+input:-webkit-autofill:focus,
+input:-webkit-autofill:active {
+    -webkit-box-shadow: 0 0 0 30px var(--bg-secondary) inset !important;
+    -webkit-text-fill-color: white !important;
+    transition: background-color 5000s ease-in-out 0s;
+}
+/* Ensure password and text inputs are readable always */
+input[type="password"], input[type="text"], .gr-textbox input {
+    color: white !important;
+}
 /* Examples Section */
 .example-btn {
     background: rgba(255, 255, 255, 0.05) !important;

config.py CHANGED

@@ -30,21 +30,22 @@ HYBRID_ALPHA = 0.3     # Semantic weight (higher = more semantic focus)
 MIN_BM25_SCORE = 0.05   # Lower threshold to let good semantic hits through
 # System Prompts
-SYSTEM_PROMPT = """You are a professional legal assistant specializing in Ukrainian legislation. Your task is to provide the most useful answer based on the provided document fragments (Context).
-MANDATORY RULES:
-1. ALWAYS thoroughly analyze all provided context and extract relevant information.
-2. If the context contains a direct answer — provide it clearly and in a structured format.
-3. If the context contains partial or indirect information — explain what exactly is available in the documents, even if it's not a complete answer.
-4. ALWAYS cite sources: insert reference numbers [1], [2], [3] at the end of each statement.
-5. ONLY if there is truly NO relevant information in the context, then say: "На жаль, у наданих документах немає інформації для відповіді на це запитання."
-6. ALWAYS respond in Ukrainian language, clearly and in a structured manner.
-7. NEVER fabricate articles or facts that are not present in the context.
-8. When citing legal provisions, include the exact article numbers and relevant excerpts from the context.
-9. Structure your response with clear sections when appropriate (e.g., main answer, related provisions, important notes).
-10. If the context contains conflicting information from different documents, acknowledge this and present both perspectives with their sources.
 Context: {context}
-Remember: Respond ONLY in Ukrainian, be precise, cite sources for every claim, and base your answer strictly on the provided context.
 """

 MIN_BM25_SCORE = 0.05   # Lower threshold to let good semantic hits through
 # System Prompts
+SYSTEM_PROMPT = """Ви — професійний юридичний асистент, що спеціалізується на законодавстві України. Ваше завдання — надати максимально корисну відповідь на основі наданих фрагментів документів (Контекст).
+ОБОВ'ЯЗКОВІ ПРАВИЛА:
+1. ЗАВЖДИ ретельно аналізуйте весь наданий контекст.
+2. Якщо в контексті є пряма відповідь — надайте її чітко та структуровано.
+3. Якщо інформація часткова — поясніть, що саме відомо з документів.
+4. ЗАВЖДИ вказуйте джерела: вставляйте номери посилань у квадратних дужках [1], [2], [3] безпосередньо після тверджень, які вони підтверджують.
+5. Якщо в контексті НЕМАЄ інформації, скажіть: "На жаль, у наданих документах немає інформації для відповіді на це запитання."
+6. Відповідайте ТІЛЬКИ українською мовою.
+7. НІКОЛИ не вигадуйте статті або факти, яких немає в контексті.
+8. При цитуванні норм вказуйте номери статей та назви кодексів/законів.
+9. Використовуйте списки та заголовки для кращої структури.
+Приклад цитування: Згідно зі статтею 115 ККУ, вбивство — це умисне протиправне заподіяння смерті іншій людині [1]. За це передбачено покарання у вигляді позбавлення волі на строк від семи до п'ятнадцяти років [2].
 Context: {context}
+Пам'ятайте: відповідайте ТІЛЬКИ українською, будьте точними, цитуйте джерела для кожного твердження.
 """

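The prompt asks the model to cite sources as `[1]`, `[2]`, `[3]`, which only works if the text substituted for `{context}` carries the same numbering. The diff does not show how the context string is built, so the following is a minimal sketch under assumptions: `build_context`, `PROMPT_TEMPLATE`, and the chunk keys `source` and `text` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for config.SYSTEM_PROMPT; only the {context} slot matters here.
PROMPT_TEMPLATE = "Answer using the numbered sources.\nContext: {context}\n"


def build_context(chunks: List[Dict[str, str]]) -> str:
    """Number each chunk from 1 so the model's [n] markers map back to sources."""
    lines = []
    for number, chunk in enumerate(chunks, start=1):
        # "source" and "text" are assumed keys, not confirmed by the diff.
        lines.append(f"[{number}] {chunk['source']}: {chunk['text']}")
    return "\n".join(lines)


chunks = [
    {"source": "ККУ, ст. 115", "text": "(article text)"},
    {"source": "ККУ, ст. 185", "text": "(article text)"},
]
prompt = PROMPT_TEMPLATE.format(context=build_context(chunks))
```

Keeping the numbering in one function means the sources panel in the UI can reuse the same order, so `[2]` in the answer and the second listed source refer to the same chunk.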
modules/rag_system.py CHANGED

@@ -27,7 +27,8 @@ class RAGSystem:
                       top_k_retrieval: int = config.DEFAULT_TOP_K_RETRIEVAL,
                       top_k_rerank: int = config.DEFAULT_TOP_K_RERANK,
                       temperature: float = config.DEFAULT_TEMPERATURE,
-                      search_method: str = 'hybrid'
                       ) -> Tuple[str, List[Dict]]:
         """
         Main RAG pipeline:
@@ -43,8 +44,8 @@ class RAGSystem:
             return "Будь ласка, введіть API ключ (Groq) для продовження.", []
         # 1. Retrieval
-        print(f"Retrieving for: {query} (method: {search_method})")
-        retrieved_chunks = self.retriever.search(query, top_k=top_k_retrieval, method=search_method)
         # 2. Reranking
         if use_reranker and retrieved_chunks:

                       top_k_retrieval: int = config.DEFAULT_TOP_K_RETRIEVAL,
                       top_k_rerank: int = config.DEFAULT_TOP_K_RERANK,
                       temperature: float = config.DEFAULT_TEMPERATURE,
+                      search_method: str = 'hybrid',
+                      legal_area: str = None
                       ) -> Tuple[str, List[Dict]]:
         """
         Main RAG pipeline:
             return "Будь ласка, введіть API ключ (Groq) для продовження.", []
         # 1. Retrieval
+        print(f"Retrieving for: {query} (method: {search_method}, legal_area: {legal_area})")
+        retrieved_chunks = self.retriever.search(query, top_k=top_k_retrieval, method=search_method, legal_area=legal_area)
         # 2. Reranking
         if use_reranker and retrieved_chunks:

modules/retriever.py CHANGED

@@ -285,7 +285,8 @@ class Retriever:
         query: str,
         top_k: int = 30,
         method: str = 'hybrid',
-        alpha: float = None  # Uses config.HYBRID_ALPHA if None
     ) -> List[Dict]:
         """
         Search for relevant chunks using specified method.
@@ -295,6 +296,7 @@ class Retriever:
             top_k: Number of results to return
             method: 'bm25', 'semantic', or 'hybrid'
             alpha: Weight for hybrid search (semantic weight, 0-1)
         Returns:
             List of result dictionaries with chunk and score
@@ -302,6 +304,16 @@ class Retriever:
         # Ensure top_k is an integer (handle Gradio sliders passing floats)
         top_k = int(top_k)
         method_map = {
             'bm25': self._search_bm25,
             'semantic': self._search_semantic,
@@ -314,27 +326,50 @@ class Retriever:
         search_func = method_map[method]
         if method == 'hybrid':
             effective_alpha = alpha if alpha is not None else config.HYBRID_ALPHA
-            return search_func(query, top_k, effective_alpha)
-        return search_func(query, top_k)
-    def _search_bm25(self, query: str, top_k: int) -> List[Dict]:
         """Keyword-based BM25 search with spell correction."""
         tokenized_query = self._tokenize_and_lemmatize(query)
         scores = self.bm25.get_scores(tokenized_query)
         top_indices = np.argsort(scores)[::-1][:top_k]
         return [
             {
                 'chunk': self.chunks[idx],
                 'score': float(scores[idx]),
                 'method': 'bm25'
             }
-            for idx in top_indices
         ]
-    def _search_semantic(self, query: str, top_k: int) -> List[Dict]:
         """Semantic similarity search using embeddings."""
         query_embedding = self.model.encode(query, convert_to_tensor=True)
         hits = util.semantic_search(query_embedding, self.embeddings, top_k=top_k)[0]
         return [
@@ -346,7 +381,7 @@ class Retriever:
             for hit in hits
         ]
-    def _search_hybrid(self, query: str, top_k: int, alpha: float) -> List[Dict]:
         """
         Hybrid search combining BM25 and semantic similarity.
@@ -360,6 +395,13 @@ class Retriever:
         query_embedding = self.model.encode(query, convert_to_tensor=True)
         semantic_scores = util.cos_sim(query_embedding, self.embeddings)[0].cpu().numpy()
         # Normalize scores to [0, 1]
         bm25_norm = self._min_max_normalize(bm25_scores)
         semantic_norm = self._min_max_normalize(semantic_scores)
@@ -367,6 +409,12 @@ class Retriever:
         # Combine with weighted sum
         combined_scores = alpha * semantic_norm + (1 - alpha) * bm25_norm
         # Apply BM25 threshold: penalize chunks with no keyword overlap
         bm25_max = np.max(bm25_scores) if np.max(bm25_scores) > 0 else 1.0
         bm25_relative = bm25_scores / bm25_max
@@ -384,7 +432,7 @@ class Retriever:
                 'semantic_score': float(semantic_scores[idx]),
                 'method': 'hybrid'
             }
-            for idx in top_indices
         ]
     @staticmethod

         query: str,
         top_k: int = 30,
         method: str = 'hybrid',
+        alpha: float = None,  # Uses config.HYBRID_ALPHA if None
+        legal_area: str = None
     ) -> List[Dict]:
         """
         Search for relevant chunks using specified method.
             top_k: Number of results to return
             method: 'bm25', 'semantic', or 'hybrid'
             alpha: Weight for hybrid search (semantic weight, 0-1)
+            legal_area: Optional filter by metadata['legal_area']
         Returns:
             List of result dictionaries with chunk and score
         # Ensure top_k is an integer (handle Gradio sliders passing floats)
         top_k = int(top_k)
+        # Pre-filter chunks by legal_area if provided
+        filtered_indices = None
+        if legal_area and legal_area != "Всі":
+            filtered_indices = [
+                i for i, chunk in enumerate(self.chunks)
+                if chunk.get('metadata', {}).get('legal_area') == legal_area
+            ]
+            if not filtered_indices:
+                return []
         method_map = {
             'bm25': self._search_bm25,
             'semantic': self._search_semantic,
         search_func = method_map[method]
         if method == 'hybrid':
             effective_alpha = alpha if alpha is not None else config.HYBRID_ALPHA
+            return search_func(query, top_k, effective_alpha, filtered_indices=filtered_indices)
+        return search_func(query, top_k, filtered_indices=filtered_indices)
+    def _search_bm25(self, query: str, top_k: int, filtered_indices: List[int] = None) -> List[Dict]:
         """Keyword-based BM25 search with spell correction."""
         tokenized_query = self._tokenize_and_lemmatize(query)
         scores = self.bm25.get_scores(tokenized_query)
+        if filtered_indices is not None:
+            # Mask scores for non-filtered chunks
+            mask = np.zeros(len(scores), dtype=bool)
+            mask[filtered_indices] = True
+            scores[~mask] = -1e9
         top_indices = np.argsort(scores)[::-1][:top_k]
+        # Filter out masked scores from results
         return [
             {
                 'chunk': self.chunks[idx],
                 'score': float(scores[idx]),
                 'method': 'bm25'
             }
+            for idx in top_indices if scores[idx] > -1e8
         ]
+    def _search_semantic(self, query: str, top_k: int, filtered_indices: List[int] = None) -> List[Dict]:
         """Semantic similarity search using embeddings."""
         query_embedding = self.model.encode(query, convert_to_tensor=True)
+        if filtered_indices is not None:
+            # Filter embeddings before searching
+            filtered_embeddings = self.embeddings[filtered_indices]
+            hits = util.semantic_search(query_embedding, filtered_embeddings, top_k=top_k)[0]
+            return [
+                {
+                    'chunk': self.chunks[filtered_indices[hit['corpus_id']]],
+                    'score': float(hit['score']),
+                    'method': 'semantic'
+                }
+                for hit in hits
+            ]
         hits = util.semantic_search(query_embedding, self.embeddings, top_k=top_k)[0]
         return [
             for hit in hits
         ]
+    def _search_hybrid(self, query: str, top_k: int, alpha: float, filtered_indices: List[int] = None) -> List[Dict]:
         """
         Hybrid search combining BM25 and semantic similarity.
         query_embedding = self.model.encode(query, convert_to_tensor=True)
         semantic_scores = util.cos_sim(query_embedding, self.embeddings)[0].cpu().numpy()
+        # Mask scores if filtered_indices is provided
+        if filtered_indices is not None:
+            mask = np.zeros(len(self.chunks), dtype=bool)
+            mask[filtered_indices] = True
+            bm25_scores[~mask] = 0.0 # BM25 min is typically 0
+            semantic_scores[~mask] = -1.0 # Cosine min is -1
         # Normalize scores to [0, 1]
         bm25_norm = self._min_max_normalize(bm25_scores)
         semantic_norm = self._min_max_normalize(semantic_scores)
         # Combine with weighted sum
         combined_scores = alpha * semantic_norm + (1 - alpha) * bm25_norm
+        # Re-apply mask after normalization just in case
+        if filtered_indices is not None:
+            mask = np.zeros(len(self.chunks), dtype=bool)
+            mask[filtered_indices] = True
+            combined_scores[~mask] = -1.0
         # Apply BM25 threshold: penalize chunks with no keyword overlap
         bm25_max = np.max(bm25_scores) if np.max(bm25_scores) > 0 else 1.0
         bm25_relative = bm25_scores / bm25_max
                 'semantic_score': float(semantic_scores[idx]),
                 'method': 'hybrid'
             }
+            for idx in top_indices if combined_scores[idx] > -0.9
         ]
     @staticmethod
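
In `_search_hybrid`, excluded chunks are set to `0.0` (BM25) and `-1.0` (cosine) before `_min_max_normalize` runs, so they presumably still take part in computing the normalization range. An alternative is to score only the filtered subset. The sketch below illustrates that idea in dependency-free Python. It is not the commit's implementation: `hybrid_top_k` and `min_max` are hypothetical names, and the BM25 threshold penalty is omitted because its code is not fully visible in the diff.

```python
from typing import Dict, List, Optional, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_top_k(
    bm25_scores: Sequence[float],
    semantic_scores: Sequence[float],
    areas: Sequence[str],
    alpha: float = 0.3,
    top_k: int = 3,
    legal_area: Optional[str] = None,
) -> List[Dict]:
    """Rank chunks by alpha * semantic + (1 - alpha) * bm25 within one legal area."""
    # Pre-filter: keep only indices whose area matches (all indices when no filter).
    keep = [i for i, a in enumerate(areas) if legal_area is None or a == legal_area]
    if not keep:
        return []
    # Normalize over the kept subset only, so excluded chunks cannot skew the range.
    bm25_norm = min_max([bm25_scores[i] for i in keep])
    sem_norm = min_max([semantic_scores[i] for i in keep])
    combined = [alpha * s + (1 - alpha) * b for s, b in zip(sem_norm, bm25_norm)]
    # Sort positions within the subset, then map back to original chunk indices.
    order = sorted(range(len(keep)), key=lambda j: combined[j], reverse=True)[:top_k]
    return [{"index": keep[j], "score": combined[j], "method": "hybrid"} for j in order]


bm25 = [2.0, 0.0, 1.0, 4.0]
semantic = [0.2, 0.9, 0.6, 0.1]
areas = ["кримінальне_право", "сімейне_право", "кримінальне_право", "сімейне_право"]
results = hybrid_top_k(bm25, semantic, areas, legal_area="кримінальне_право")
```

Because excluded chunks never enter the score arrays, no sentinel values or threshold checks such as `combined_scores[idx] > -0.9` are needed to keep them out of the results.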