Krishna Teja Chitty-Venkata

krishnateja95

https://krishnateja95.github.io/

AI & ML interests

LLM Optimization, Neural Architecture Search, Quantization, Pruning

Recent Activity

updated a model 4 days ago

nm-testing/granite-4.0-h-small-FP8-block

authored 4 papers 20 days ago

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models

Paper • 2508.17467 • Published Aug 24

PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference

Paper • 2509.04377 • Published Sep 4

LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference

Paper • 2509.02753 • Published Sep 2

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models

Paper • 2510.01582 • Published Oct 2

authored a paper about 1 year ago

A Survey of Techniques for Optimizing Transformer Inference

Paper • 2307.07982 • Published Jul 16, 2023