RAG Retrieval: Query Rewrite and Reranker

Query Rewriting

Posted by LuochuanAD on February 14, 2026

Background

When designing the RAG system, implementing Structural Chunks and Structural Prompts already brought retrieval results to the desired accuracy. To push accuracy further, combine two techniques: Query Rewrite (before the vector search) and a Reranker (after it).

Query Rewrite

Approach 1: Synonym Expansion

For example:

User query: python

Rewrite as: 
python language
python development
python version

Evaluation: Fast, low cost, limited coverage
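A minimal sketch of synonym expansion, assuming a hand-maintained synonym table. `SYNONYMS` and `expand_query` are hypothetical names for illustration, not part of any library; the table's fixed entries are exactly why coverage is limited.

```python
# Hypothetical hand-maintained table: term -> expanded search phrases.
SYNONYMS: dict[str, list[str]] = {
    "python": ["python language", "python development", "python version"],
}


def expand_query(query: str) -> list[str]:
    """Return the original query followed by its expansions (deduplicated, order kept)."""
    key = query.strip().lower()
    expansions = SYNONYMS.get(key, [])
    # dict.fromkeys removes duplicates while preserving first-seen order
    return list(dict.fromkeys([query.strip(), *expansions]))


print(expand_query("python"))
# ['python', 'python language', 'python development', 'python version']
```

Queries missing from the table fall back to the original query alone, which keeps this approach cheap but explains its narrow coverage.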

Approach 2: LLM Multi-Query Rewrite

The most common method is to have an LLM rewrite the user query into 3 sub-queries, run a vector search for each, and merge the results of the 3 searches.

Note: Writing prompts in English is preferred.

prompt = "Rewrite the user query into multiple search queries and return 3 different queries."

Process:

User Query
    ↓
LLM Rewrite
    ↓
Query 1
Query 2
Query 3
    ↓
Vector Search
    ↓
Merge results
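The flow above can be sketched as follows, assuming the LLM is any callable that takes a prompt and returns text with one query per line, and the vector store is any callable returning `(chunk_id, score)` pairs. `rewrite_queries`, `multi_query_search`, and the stub functions are hypothetical; merging by keeping each chunk's best score is only one reasonable strategy.

```python
from typing import Callable

# Assumed interfaces (hypothetical): llm(prompt) -> str, search(query, top_k) -> [(chunk_id, score)]
LLM = Callable[[str], str]
Search = Callable[[str, int], list[tuple[str, float]]]

REWRITE_PROMPT = (
    "Rewrite the user query into {n} different search queries. "
    "Return one query per line, with no numbering.\n"
    "User query: {query}"
)


def rewrite_queries(query: str, llm: LLM, n: int = 3) -> list[str]:
    """Ask the LLM for n rewrites; keep non-empty, deduplicated lines, at most n."""
    raw = llm(REWRITE_PROMPT.format(n=n, query=query))
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return list(dict.fromkeys(lines))[:n]


def multi_query_search(query: str, llm: LLM, search: Search, top_k: int = 20) -> list[tuple[str, float]]:
    """Search with every rewritten query and merge, keeping each chunk's highest score."""
    best: dict[str, float] = {}
    for q in rewrite_queries(query, llm):
        for chunk_id, score in search(q, top_k):
            best[chunk_id] = max(score, best.get(chunk_id, float("-inf")))
    # Highest score first
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Stubs standing in for a real LLM and vector DB (illustration only)
def fake_llm(prompt: str) -> str:
    return "python language\npython development\npython version\n"


def fake_search(q: str, k: int) -> list[tuple[str, float]]:
    index = {
        "python language": [("c1", 0.9), ("c2", 0.5)],
        "python development": [("c2", 0.8)],
        "python version": [("c3", 0.7)],
    }
    return index.get(q, [])[:k]


print(multi_query_search("python", fake_llm, fake_search))
# [('c1', 0.9), ('c2', 0.8), ('c3', 0.7)]
```

Chunk `c2` is found by two sub-queries but appears once in the merged result, with its better score.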

Approach 3: HyDE (Use With Caution)

HyDE (Hypothetical Document Embeddings) uses the LLM's own knowledge to generate a hypothetical answer to the user query, embeds that answer, and runs the semantic search in the vector DB with it instead of with the raw query.

Process:

User Query
    ↓
LLM generates expected answer
    ↓
Embedding
    ↓
Vector Search

Note: The LLM's knowledge can differ significantly from the RAG knowledge base, and it only extends to the model's training cutoff, so the hypothetical answer may be incorrect or outdated and steer the search toward the wrong chunks. Use HyDE with caution.
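A minimal HyDE sketch, assuming `llm` returns text and `embed` returns a vector; both are injected, hypothetical callables that would wrap a real LLM and embedding model. A brute-force cosine similarity over an in-memory list stands in for the vector DB.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_search(
    query: str,
    llm: Callable[[str], str],
    embed: Callable[[str], Vector],
    index: list[tuple[str, Vector]],  # (chunk_id, chunk_embedding) pairs
    top_k: int = 5,
) -> list[tuple[str, float]]:
    # 1. Generate a hypothetical answer instead of embedding the raw query
    hypothetical = llm(f"Write a short passage that answers the question:\n{query}")
    # 2. Embed the hypothetical answer
    qvec = embed(hypothetical)
    # 3. Rank stored chunks by cosine similarity to that embedding
    scored = [(chunk_id, cosine(qvec, vec)) for chunk_id, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

If the LLM's hypothetical answer is wrong, step 2 embeds the wrong content and every later step inherits the error, which is the risk the note above describes.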

What if the RAG knowledge base is in English and Japanese, but the user query is in Chinese?

For example:

User query: 输出日志的保留时间是多少?

First, translate the query into English and Japanese:

What is the retention period for the output logs?

出力ログの保存期間はどれくらいですか?

Then search the vector DB with all three queries.

Process:

User Query
    ↓
LLM Translation
    ↓
Query 1 (Chinese)
Query 2 (English)
Query 3 (Japanese)
    ↓
Vector Search
    ↓
Merge results
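A sketch of the cross-lingual flow, assuming a hypothetical `translate(text, target_language)` callable (for example, an LLM prompt asking for a translation) and a `search(query, top_k)` callable returning chunk IDs best-first. Merging uses Reciprocal Rank Fusion (RRF), a swapped-in choice: it combines results by rank position rather than raw score, so it does not assume scores from different query languages are comparable.

```python
from collections import defaultdict
from typing import Callable

Translate = Callable[[str, str], str]  # (text, target_language) -> translated text
Search = Callable[[str, int], list[str]]  # (query, top_k) -> chunk IDs, best first


def multilingual_queries(query: str, translate: Translate, languages: list[str]) -> list[str]:
    """The original query plus one translation per target language."""
    return [query] + [translate(query, lang) for lang in languages]


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def multilingual_search(
    query: str,
    translate: Translate,
    search: Search,
    languages: tuple[str, ...] = ("English", "Japanese"),
    top_k: int = 20,
) -> list[str]:
    queries = multilingual_queries(query, translate, list(languages))
    return rrf_merge([search(q, top_k) for q in queries])
```

A chunk that ranks well for several language versions of the query accumulates a higher fused score than one found by a single version.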

Reranker

  • Vector search performs only a coarse, first-stage retrieval optimized for high recall, typically with a TopK of 20-50.
  • The reranker then re-scores those candidates for high precision, typically keeping a TopK of 3-5.
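The two stages can be sketched as below, assuming hypothetical `retrieve(query, top_k)` and `score(query, chunk)` callables; `score` could be backed by either of the reranker approaches described below. The default TopK values mirror the ranges above.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], list[str]],  # stage 1: vector search, returns chunk texts
    score: Callable[[str, str], float],  # stage 2: relevance score for a (query, chunk) pair
    recall_k: int = 50,  # wide first pass for high recall
    final_k: int = 5,  # narrow final set for high precision
) -> list[str]:
    candidates = retrieve(query, recall_k)
    ranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:final_k]
```

Only `recall_k` candidates reach the (more expensive) scorer, which is what keeps reranking affordable.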

Approach 1: LLM-based Reranker

Use an LLM to score or select the top documents from preliminary retrieval.

prompt = """
Given the query and the following documents, rank the documents by relevance.
Query: {query}
Documents:
[1] {chunk_1}
[2] {chunk_2}
[3] {chunk_3}
...
Return the IDs of the most relevant documents.
"""

Evaluation: High token consumption, slow
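A sketch of the prompt-building and reply-parsing side of LLM reranking; the LLM call itself is left out. It assumes the model is asked to reply with comma-separated, 1-based IDs, a format chosen here for illustration. `build_rerank_prompt` and `rerank_from_reply` are hypothetical helpers; the parser skips invalid and repeated IDs because the model may return them.

```python
import re

RERANK_PROMPT = """Given the query and the following documents, rank the documents by relevance.
Query: {query}
Documents:
{documents}
Return the IDs of the {top_k} most relevant documents as a comma-separated list, most relevant first."""


def build_rerank_prompt(query: str, chunks: list[str], top_k: int = 3) -> str:
    # Prefix each chunk with a 1-based ID so the model can refer to it
    documents = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return RERANK_PROMPT.format(query=query, documents=documents, top_k=top_k)


def rerank_from_reply(reply: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Map the model's 1-based IDs back to chunks, skipping out-of-range and repeated IDs."""
    seen: list[int] = []
    for match in re.findall(r"\d+", reply):
        idx = int(match)
        if 1 <= idx <= len(chunks) and idx not in seen:
            seen.append(idx)
    return [chunks[i - 1] for i in seen[:top_k]]
```

Every chunk's full text goes into a single prompt, which is where the high token cost comes from.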

Approach 2: Cross-Encoder Reranker

Feed each (user query, retrieved chunk) pair directly into a cross-encoder model, which outputs a relevance score for that pair.

English:
ms-marco-MiniLM-L-6-v2
ms-marco-MiniLM-L-12-v2
bge-reranker-base

Japanese:
japanese-reranker-cross-encoder-xsmall-v1
japanese-reranker-cross-encoder-base-v1
japanese-reranker-cross-encoder-large-v1

Chinese:
bge-reranker-large
bge-reranker-base
m3e-reranker

Cross-lingual Cross-Encoder models:

BAAI/bge-reranker-v2-m3
Alibaba-NLP/gte-multilingual-reranker-base
cross-encoder/ms-marco-MiniLM-L6-v2 (trained on English MS MARCO data, so best suited to English-only corpora)

Evaluation: High accuracy
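A cross-encoder reranking sketch. The `predict` argument is assumed to behave like `sentence_transformers.CrossEncoder.predict`, which takes a list of (query, passage) pairs and returns one score per pair; injecting it keeps the sketch runnable without downloading a model. `cross_encoder_rerank` is a hypothetical helper name.

```python
from typing import Callable, Sequence

# Takes (query, chunk) pairs, returns one relevance score per pair
Predict = Callable[[list[tuple[str, str]]], Sequence[float]]


def cross_encoder_rerank(
    query: str, chunks: list[str], predict: Predict, top_k: int = 5
) -> list[tuple[str, float]]:
    # Score every (query, chunk) pair in one batch; the model reads both texts jointly
    pairs = [(query, chunk) for chunk in chunks]
    scores = predict(pairs)
    # float() also converts numpy scalars, which real models typically return
    ranked = sorted(zip(chunks, (float(s) for s in scores)), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

With `sentence-transformers` installed, a real model could be plugged in, for example `CrossEncoder("BAAI/bge-reranker-v2-m3").predict` from the cross-lingual list above.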