<div style="border: 1px solid #e2e8f0; border-radius: 8px; background: white; margin: 1.5rem 0;">
<div style="padding: 1rem; border-bottom: 1px solid #e2e8f0; background: #f8f9fa;">
<h4 style="margin: 0 0 0.5rem 0; color: #495057;">πŸš€ CUDA Warmup Efficiency Benchmark</h4>
<p style="margin: 0; font-size: 0.9em; color: #6c757d;">
Real CUDA warmup benchmarking with actual Transformers models. Measure the performance impact of the caching_allocator_warmup function.
</p>
</div>
<div style="padding: 1rem;">
    <iframe src="https://molbap-cuda-warmup-transformers.hf.space" width="100%" height="800" frameborder="0" style="border-radius: 8px; background: white;"></iframe>
</div>
<div style="padding: 1rem; border-top: 1px solid #e2e8f0; background: #f8f9fa; font-size: 0.9em; color: #6c757d;">
    Real CUDA warmup benchmarking with actual Transformers models: measure the performance impact of the <code>caching_allocator_warmup</code> function at <code>transformers/src/transformers/modeling_utils.py:6186</code>. The tool loads each model twice, once with warmup disabled and once with warmup enabled, and compares the two loading times.
</div>
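  <div style="padding: 1rem; border-top: 1px solid #e2e8f0; font-size: 0.9em; color: #6c757d;">
    <p style="margin: 0 0 0.5rem 0;">
      For reference, a minimal local sketch of the same measurement. The model id and the monkeypatch used to disable warmup are illustrative assumptions, not the Space's actual implementation; it relies on <code>from_pretrained</code> resolving <code>caching_allocator_warmup</code> through the <code>modeling_utils</code> module global. A rigorous benchmark would also run each load in a fresh process, since the CUDA caching allocator keeps its cache between loads.
    </p>
    <pre style="margin: 0; padding: 0.75rem; background: #f8f9fa; border-radius: 6px; overflow-x: auto; font-size: 0.85em;"><code>import time

import torch
import transformers.modeling_utils as mu
from transformers import AutoModelForCausalLM

MODEL_ID = "gpt2"  # illustrative choice; any Hub model works

def timed_load(warmup_enabled: bool) -> float:
    """Time one from_pretrained call on GPU (requires a CUDA device)."""
    original = mu.caching_allocator_warmup
    if not warmup_enabled:
        # Sketch-only way to disable warmup: replace the real
        # function with a no-op for the duration of this load.
        mu.caching_allocator_warmup = lambda *args, **kwargs: None
    torch.cuda.synchronize()
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="cuda")
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    del model
    torch.cuda.empty_cache()
    mu.caching_allocator_warmup = original
    return elapsed

no_warmup = timed_load(warmup_enabled=False)
with_warmup = timed_load(warmup_enabled=True)
print(f"load without warmup: {no_warmup:.2f}s, with warmup: {with_warmup:.2f}s")
</code></pre>
  </div>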
</div>