# efficient-context Documentation

## Overview

`efficient-context` is a Python library that optimizes context handling for Large Language Models (LLMs) in CPU-constrained environments. Large contexts are expensive on limited hardware, so the library provides compression, chunking, retrieval, and memory-management strategies that keep context small while preserving information quality.
## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources
## Installation

```bash
pip install efficient-context
```
## Core Components

### ContextManager

The central class that orchestrates all components of the library.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents at once

# Generate an optimized context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                # Similarity threshold for deduplication
    embedding_model="lightweight", # Use a lightweight embedding model
    min_sentence_length=10,        # Minimum length of sentences to consider
    importance_weight=0.3          # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,          # Target size for chunks in tokens
    chunk_overlap=50,        # Number of tokens to overlap between chunks
    respect_paragraphs=True, # Avoid breaking paragraphs across chunks
    min_chunk_size=100,      # Minimum chunk size in tokens
    max_chunk_size=1024      # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
### Retrieval Optimization

The retrieval module finds the chunks most relevant to a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight", # Use a lightweight embedding model
    similarity_metric="cosine",    # Metric for comparing embeddings
    use_batching=True,             # Batch embedding operations
    batch_size=32,                 # Size of batches for embedding
    max_index_size=5000            # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use the context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)  # placeholder for your own workload

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024 * 1024):.2f} MB")
```
## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query, generate an optimized context
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use the context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but embedding models can still require significant memory on their own.
- **CPU Performance**: Choose an embedding model that matches your CPU capabilities; the `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the retriever's `batch_size` to trade memory usage against processing speed.
- **Context Size**: An appropriate `max_context_size` can significantly affect performance, especially when working with limited resources. A combined starting-point configuration is sketched below.
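As a rough starting point, the sketch below combines these knobs into one low-resource profile. The parameter names come from the examples above; the specific values are illustrative guesses rather than benchmarked recommendations, so tune them against your own workload.

```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Illustrative low-resource profile: small batches and a small index cap
# peak memory, while modest chunk sizes keep embedding work cheap.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",
        use_batching=True,
        batch_size=8,        # smaller batches -> lower peak memory
        max_index_size=2000  # fewer indexed chunks -> smaller footprint
    ),
    max_context_size=2048    # tighter context budget for constrained hardware
)
```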
## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; as a trivial placeholder,
        # truncate to the target size when one is given.
        compressed_content = content if target_size is None else content[:target_size]
        return compressed_content
```
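Assuming `ContextManager` accepts any `BaseCompressor` implementation, as in the customization example above, the custom class drops straight into the pipeline:

```python
from efficient_context import ContextManager

# Use the custom compressor in place of the default one.
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```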
## Troubleshooting

**High Memory Usage**

- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory (see the diagnostic sketch below)
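To confirm that these adjustments are working, compare memory before and after the suspect operation. This diagnostic sketch reuses `get_memory_usage` and the `process_rss_bytes` key from the Memory Management example, and assumes a `retriever` set up as in the Retrieval Optimization section:

```python
from efficient_context.memory import MemoryManager

memory_manager = MemoryManager(target_usage_percent=80.0)

# Sample resident memory around a retrieval call.
before = memory_manager.get_memory_usage()["process_rss_bytes"]
relevant_chunks = retriever.retrieve(query="Your query here...", top_k=5)
after = memory_manager.get_memory_usage()["process_rss_bytes"]
print(f"Retrieval delta: {(after - before) / (1024 * 1024):.2f} MB")
```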
**Slow Processing**

- Increase `batch_size` (balancing it against memory constraints)
- Lower `threshold` in the `SemanticDeduplicator` so deduplication removes more near-duplicate content
- Reduce `chunk_overlap` to minimize redundant processing; a timing sketch for checking the effect follows this list
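When tuning for speed, measure before and after each change. A minimal timing sketch using only the standard library, again assuming a configured `retriever`:

```python
import time

# Time a retrieval call to see whether batch-size or overlap changes help.
start = time.perf_counter()
relevant_chunks = retriever.retrieve(query="Your query here...", top_k=5)
elapsed = time.perf_counter() - start
print(f"Retrieval took {elapsed:.3f} s, returning {len(relevant_chunks)} chunks")
```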
## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources