Advanced RAG

RAG Retrieval - Deep Dive

Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542871

Pre-Retrieval (before we actually retrieve data, there are important choices to make!)
- Indexing
  - ❗ Granularity/chunking is very important! You want neither huge nor tiny chunks; both make retrieving accurate info more difficult!
  - Data extraction (how we extract data can make a difference!)
- Query Manipulation (e.g. rewriting the query so it retrieves info more efficiently)
Retrieval
Post-Retrieval
- Do we want to e.g. re-rank or filter the results?
- Once we have results, do we want to augment them somehow?
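The three stages above can be sketched as a toy pipeline. This is only an illustration: `rewrite_query`, `retrieve`, and `post_process` are hypothetical helper names, and keyword overlap stands in for a real vector search.

```python
def rewrite_query(query):
    # Pre-retrieval: a hypothetical rewrite step (lowercase, drop filler words).
    stopwords = {"the", "a", "an", "of", "is", "what", "how", "do", "i"}
    words = query.lower().replace("?", "").split()
    return [w for w in words if w not in stopwords]


def retrieve(terms, docs, top_k=3):
    # Retrieval: score each doc by keyword overlap (stand-in for vector search).
    scored = [(len(set(terms) & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def post_process(scored, min_score=1):
    # Post-retrieval: filter out results below a relevance threshold.
    return [doc for score, doc in scored if score >= min_score]


docs = [
    "chunk size affects retrieval accuracy",
    "metadata enables filtering by year",
    "bananas are yellow",
]
terms = rewrite_query("How do I pick the chunk size?")
answer_context = post_process(retrieve(terms, docs))
```

Each stage is a separate function so that one of them (e.g. the rewrite or the filter) can be swapped without touching the others.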

Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542877

Include metadata.json in your vector store → more info than just raw chunks & embeddings
- Examples of metadata to include: Document ID, category, access control, data lineage (where data is from, useful for source citation), additional context
❗ Metadata won't be chunked like text, but can be used for better retrieval (hybrid approach)
- e.g. relevance scoring against metadata can be used for ranking
Example: include creation year as metadata, then filter documents/chunks by year before applying vector search
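A minimal sketch of the creation-year example, assuming an in-memory list of chunks. The embeddings, metadata values, and the `search` helper are made up for illustration; a real vector store would apply the filter for you.

```python
import math

# Toy chunks: embeddings and metadata are invented for this example.
CHUNKS = [
    {"text": "2021 pricing overview", "embedding": [0.9, 0.1], "metadata": {"year": 2021}},
    {"text": "2023 pricing overview", "embedding": [0.8, 0.2], "metadata": {"year": 2023}},
    {"text": "2023 security guide", "embedding": [0.1, 0.9], "metadata": {"year": 2023}},
]


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding, chunks, year=None, top_k=2):
    # Step 1: metadata pre-filter (skipped when no year is given).
    candidates = [c for c in chunks if year is None or c["metadata"]["year"] == year]
    # Step 2: vector search over the remaining candidates only.
    ranked = sorted(candidates, key=lambda c: cosine(query_embedding, c["embedding"]), reverse=True)
    return ranked[:top_k]


results = search([1.0, 0.0], CHUNKS, year=2023)
```

Without the year filter, the 2021 chunk would rank first for this query; with it, only 2023 chunks are ever scored.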

Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542873

💡 Tradeoff: Smaller chunks mean more precision when searching info, but less context to verify that the info is what we actually need
Sentence boundaries usually preserved
- Usually full sentences carry full semantic meaning
Types of Bedrock chunking:
1. Fixed size: tokens/chunk + overlap percentage
  - Default: 300 tokens/chunk
2. No chunking (you have done it yourself)
3. Hierarchical: nested parent + child chunks
  - Smaller children offer better precision on retrieval
  - Can reclaim comprehensiveness & context by substituting child with parent
4. Semantic chunking: chunk based on meaning/topics
  - Uses FM for chunking → costs $$$
  - Parameters: max tokens/chunk, buffer size, breakpoint percentile threshold
    - Buffer size: number of neighbouring sentences on each side of a sentence to include when embedding it
      - buffer size = 1 means taking 3 sentences (sentence to embed + one before and one after)
    - Higher breakpoint percentile = chunks are more distinguishable, but they're bigger & there are fewer of them
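
To make the chunking parameters above concrete, here is a rough pure-Python sketch. It is not Bedrock's implementation: the function names are hypothetical, the 20% overlap default is an assumption (only the 300 tokens/chunk default comes from the notes), and the nearest-rank percentile rule is just one plausible way to pick breakpoints.

```python
import math


def fixed_size_chunks(tokens, max_tokens=300, overlap_pct=20):
    # Fixed size: each chunk holds up to max_tokens and shares
    # overlap_pct percent of its tokens with the previous chunk.
    overlap = max_tokens * overlap_pct // 100
    step = max(1, max_tokens - overlap)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end
    return chunks


def buffered_windows(sentences, buffer_size=1):
    # Semantic chunking: embed each sentence together with
    # buffer_size neighbours on each side (fewer at the edges).
    windows = []
    for i in range(len(sentences)):
        lo = max(0, i - buffer_size)
        hi = min(len(sentences), i + buffer_size + 1)
        windows.append(" ".join(sentences[lo:hi]))
    return windows


def breakpoints(distances, percentile=95):
    # distances[i] = embedding distance between window i and window i + 1.
    # Split wherever the distance reaches the given percentile
    # (nearest-rank method, an assumption for this sketch).
    ordered = sorted(distances)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    threshold = ordered[rank - 1]
    return [i for i, d in enumerate(distances) if d >= threshold]


chunks = fixed_size_chunks(list(range(10)), max_tokens=4, overlap_pct=50)
windows = buffered_windows(["A.", "B.", "C.", "D."], buffer_size=1)
splits_high = breakpoints([0.1, 0.8, 0.2, 0.9, 0.3], percentile=95)
splits_low = breakpoints([0.1, 0.8, 0.2, 0.9, 0.3], percentile=50)
```

With the toy distances, the 95th percentile yields a single split while the 50th yields three, which matches the note that a higher threshold gives fewer, bigger chunks.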