# efficient-context Documentation

## Overview

`efficient-context` is a Python library that optimizes context handling for Large Language Models (LLMs) in CPU-constrained environments. Large contexts are expensive on limited hardware, so the library provides compression, chunking, retrieval, and memory-management strategies that keep context small while preserving information quality.
## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources
## Installation

```bash
pip install efficient-context
```
## Core Components

### ContextManager

The central class that orchestrates all components of the library.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents at once

# Generate an optimized context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                # Similarity threshold for deduplication
    embedding_model="lightweight", # Use a lightweight embedding model
    min_sentence_length=10,        # Minimum length of sentences to consider
    importance_weight=0.3          # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,          # Target size for chunks in tokens
    chunk_overlap=50,        # Number of tokens to overlap between chunks
    respect_paragraphs=True, # Avoid breaking paragraphs across chunks
    min_chunk_size=100,      # Minimum chunk size in tokens
    max_chunk_size=1024      # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
### Retrieval Optimization

The retrieval module finds the chunks most relevant to a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight", # Use a lightweight embedding model
    similarity_metric="cosine",    # Metric for comparing embeddings
    use_batching=True,             # Batch embedding operations
    batch_size=32,                 # Size of batches for embedding
    max_index_size=5000            # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use the context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)  # placeholder for your own workload

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024 * 1024):.2f} MB")
```
## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query, generate an optimized context
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use the context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but embedding models can still require significant memory on their own.
- **CPU Performance**: Choose an embedding model that matches your CPU capabilities; the `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the retriever's `batch_size` to trade memory usage against processing speed.
- **Context Size**: An appropriate `max_context_size` can significantly affect performance, especially when working with limited resources. A combined starting-point configuration is sketched below.
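As a rough starting point, the sketch below combines these knobs into one low-resource profile. The parameter names come from the examples above; the specific values are illustrative guesses rather than benchmarked recommendations, so tune them against your own workload.

```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Illustrative low-resource profile: small batches and a small index cap
# peak memory, while modest chunk sizes keep embedding work cheap.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",
        use_batching=True,
        batch_size=8,        # smaller batches -> lower peak memory
        max_index_size=2000  # fewer indexed chunks -> smaller footprint
    ),
    max_context_size=2048    # tighter context budget for constrained hardware
)
```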
## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; as a trivial placeholder,
        # truncate to the target size when one is given.
        compressed_content = content if target_size is None else content[:target_size]
        return compressed_content
```
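Assuming `ContextManager` accepts any `BaseCompressor` implementation, as in the customization example above, the custom class drops straight into the pipeline:

```python
from efficient_context import ContextManager

# Use the custom compressor in place of the default one.
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```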
## Troubleshooting

**High Memory Usage**

- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory (see the diagnostic sketch below)
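To confirm that these adjustments are working, compare memory before and after the suspect operation. This diagnostic sketch reuses `get_memory_usage` and the `process_rss_bytes` key from the Memory Management example, and assumes a `retriever` set up as in the Retrieval Optimization section:

```python
from efficient_context.memory import MemoryManager

memory_manager = MemoryManager(target_usage_percent=80.0)

# Sample resident memory around a retrieval call.
before = memory_manager.get_memory_usage()["process_rss_bytes"]
relevant_chunks = retriever.retrieve(query="Your query here...", top_k=5)
after = memory_manager.get_memory_usage()["process_rss_bytes"]
print(f"Retrieval delta: {(after - before) / (1024 * 1024):.2f} MB")
```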
**Slow Processing**

- Increase `batch_size` (balancing it against memory constraints)
- Lower `threshold` in the `SemanticDeduplicator` so deduplication removes more near-duplicate content
- Reduce `chunk_overlap` to minimize redundant processing; a timing sketch for checking the effect follows this list
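When tuning for speed, measure before and after each change. A minimal timing sketch using only the standard library, again assuming a configured `retriever`:

```python
import time

# Time a retrieval call to see whether batch-size or overlap changes help.
start = time.perf_counter()
relevant_chunks = retriever.retrieve(query="Your query here...", top_k=5)
elapsed = time.perf_counter() - start
print(f"Retrieval took {elapsed:.3f} s, returning {len(relevant_chunks)} chunks")
```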
## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources