RAG Retrieval - Deep Dive
Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542871
- Pre-Retrieval (before we actually retrieve data, there are important choices to make!)
- Indexing
- ❗ Granularity/chunking very important! You want neither huge nor tiny chunks, both make retrieving accurate info more difficult!
- Data extraction (how we extract data can make a difference!)
- Query Manipulation (e.g. rewriting the query so it retrieves info more efficiently)
- Retrieval
- Post-Retrieval
- Do we want to e.g. re-rank or filter the results?
- Once we have results, do we want to augment them somehow?
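The three stages above can be sketched as a toy pipeline. This is a minimal illustration, not how any particular service implements it: the keyword-overlap scoring stands in for vector search, and `rewrite_query`, `retrieve`, `post_process`, `run_pipeline`, the stop-word list, and the sample documents are all hypothetical names and data.

```python
# Toy RAG retrieval pipeline: pre-retrieval -> retrieval -> post-retrieval.
# Assumption: keyword overlap replaces real embedding similarity.

STOP_WORDS = {"the", "a", "an", "what", "is", "of", "how", "do", "i", "by"}

DOCS = [
    "Chunking splits documents into smaller pieces",
    "Metadata filters narrow the search space",
    "Re-ranking reorders retrieved chunks by relevance",
]


def rewrite_query(query):
    """Pre-retrieval: normalize the query and drop filler words."""
    words = query.lower().replace("?", "").split()
    return [w for w in words if w not in STOP_WORDS]


def retrieve(terms, docs, top_k=3):
    """Retrieval: score each document by how many query terms it contains."""
    scored = []
    for doc in docs:
        doc_words = doc.lower().split()
        score = sum(term in doc_words for term in terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def post_process(results, min_score=1):
    """Post-retrieval: filter out weak matches (a re-ranker could go here)."""
    return [doc for score, doc in results if score >= min_score]


def run_pipeline(query, docs):
    terms = rewrite_query(query)
    return post_process(retrieve(terms, docs))


print(run_pipeline("What is chunking of documents?", DOCS))
```

Keeping each stage as its own function makes it easy to swap one out (e.g. a different query rewrite) without touching the others.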

Optimizing Retrieval with Metadata
Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542877
- Include metadata.json in your vector store → more info than just raw chunks & embeddings
- Examples of metadata to include: Document ID, category, access control, data lineage (where data is from, useful for source citation), additional context
- ❗ Metadata won't be chunked like text, but can be used for better retrieval (hybrid approach)
- e.g. relevance scoring against metadata can be used for ranking
- Example: include creation year as metadata, then filter documents/chunks by year before applying vector search
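The year example can be sketched in plain Python: filter on metadata first, then rank only the survivors by vector similarity. This is a minimal sketch under assumptions; the chunk layout (`embedding` and `metadata` keys), the `filtered_search` helper, and the 2-dimensional vectors are hypothetical and do not reflect any specific vector store's API.

```python
import math

# Hypothetical chunks: each carries an embedding plus un-chunked metadata.
CHUNKS = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"year": 2023}},
    {"id": "b", "embedding": [0.9, 0.1], "metadata": {"year": 2024}},
    {"id": "c", "embedding": [0.0, 1.0], "metadata": {"year": 2024}},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(query_vec, chunks, year, top_k=2):
    """Keep only chunks from the given year, then rank them by similarity."""
    candidates = [c for c in chunks if c["metadata"].get("year") == year]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Chunk "a" is the closest vector overall, but the year filter excludes it.
print([c["id"] for c in filtered_search([1.0, 0.0], CHUNKS, year=2024)])
```

Filtering before the similarity step shrinks the candidate set, so the vector search never sees chunks that could not be valid answers.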

Chunking
Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53542873
- 💡 Tradeoff: Smaller chunks mean more precision when searching info, but less context to verify that the info is what we actually need
- Sentence boundaries usually preserved
- Usually full sentences carry full semantic meaning
- Types of Bedrock chunking:
- Fixed size: tokens/chunk + overlap percentage
- Default: 300 tokens/chunk
- No chunking (each file is treated as a single chunk, for when you have already chunked the data yourself)
- Hierarchical: nested parent + child chunks
- Smaller children offer better precision on retrieval
- Can regain comprehensiveness & context by substituting a matched child with its parent
- Semantic chunking: chunk based on meaning/topics
- Uses FM for chunking → costs $$$
- Parameters: max tokens/chunk, buffer size, breakpoint percentile threshold
- Buffer size: number of neighboring sentences on each side to include when embedding a sentence
- buffer size = 1 means taking 3 sentences (sentence to embed + one before and one after)
- Higher breakpoint percentile = chunks are more distinguishable, but they're bigger & there are fewer of them
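The chunking parameters above can be illustrated with a small sketch covering fixed-size chunking with overlap, the buffer-size window, and the breakpoint percentile. It is an assumption-laden illustration, not Bedrock's internal algorithm: all function names are hypothetical, the 20% overlap default is arbitrary, the sentence-to-sentence distances are hand-picked instead of coming from an embedding model, and the max tokens/chunk limit is not enforced.

```python
def fixed_size_chunks(tokens, chunk_size=300, overlap_pct=20):
    """Fixed-size chunking: chunk_size tokens per chunk, neighbors overlap."""
    if chunk_size <= 0 or not 0 <= overlap_pct < 100:
        raise ValueError("need chunk_size > 0 and 0 <= overlap_pct < 100")
    step = max(1, chunk_size - chunk_size * overlap_pct // 100)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the input
    return chunks


def buffered_windows(sentences, buffer_size=1):
    """Text to embed per sentence: the sentence plus buffer_size neighbors per side."""
    return [
        " ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
        for i in range(len(sentences))
    ]


def percentile(values, pct):
    """Linearly interpolated percentile (0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def split_at_breakpoints(sentences, distances, breakpoint_pct=95):
    """Start a new chunk wherever the distance to the previous sentence
    exceeds the chosen percentile of all distances.
    distances[i] is the dissimilarity between sentence i and sentence i + 1."""
    if not sentences:
        return []
    if not distances:
        return [list(sentences)]
    threshold = percentile(distances, breakpoint_pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


SENTENCES = ["s0", "s1", "s2", "s3", "s4", "s5"]
DISTANCES = [0.1, 0.2, 0.9, 0.1, 0.8]  # hand-picked, not from a real model

print(fixed_size_chunks(list(range(10)), chunk_size=4, overlap_pct=50))
print(buffered_windows(["A.", "B.", "C.", "D."], buffer_size=1))
print(split_at_breakpoints(SENTENCES, DISTANCES, breakpoint_pct=50))
print(split_at_breakpoints(SENTENCES, DISTANCES, breakpoint_pct=80))
```

With these sample distances, raising the percentile from 50 to 80 removes the split at the 0.8 gap, so the same text yields two larger chunks instead of three.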