Commits · Luigi/ZeroGPU-LLM-Inference

Major UI/UX improvements for better user experience

df40b1d

Luigi committed on Oct 12

Fix UI not resetting after cancel - add reset handler

b7e5000

Luigi committed on Oct 12

Fix GeneratorExit handling to prevent runtime error

c49f312

Luigi committed on Oct 12

Fix cancel generation feature by removing GeneratorExit handler

9466288

Luigi committed on Oct 12

Handle GeneratorExit by yielding UI reset and returning cleanly

a430811

Luigi committed on Oct 12

Simplify GeneratorExit handling - let it propagate and rely on finally block

6dc4bdb

Luigi committed on Oct 12

Fix GeneratorExit handling by re-raising after UI reset

6ff484e

Luigi committed on Oct 12

Fix UI state reset after generation cancellation

fcb2aaa

Luigi committed on Oct 12

Fix GeneratorExit exception in cancel generation by adding proper exception handling

4bc617b

Luigi committed on Oct 12

bugfix

fd70375

Luigi committed on Oct 12

Improve cancel generation with robust UI state management and orchestrator pattern

9ac7f36

Luigi committed on Oct 12

Restore cancel generation feature with improved UI integration

a94befb

Luigi committed on Oct 12

Remove cancel generation feature as it didn't work

efa3ae6

Luigi committed on Oct 12

Restore proper cancel functionality

afa1066

Luigi committed on Oct 12

Test stopping criteria effectiveness

f0d2ab7

Luigi committed on Oct 12

Improve cancel generation responsiveness

48e50b8

Luigi committed on Oct 12

Fix cancel generation to gracefully stop ongoing response generation

a73d8f4

Luigi committed on Oct 12

remove qwen3 30b-a3b and qwen3 next 80b-a3b

4af617b

Luigi committed on Oct 12

Remove AOT indication from UI duration estimate

f97cbfc

Luigi committed on Oct 12

Remove AOT compilation completely and enable use_cache

426163f

Luigi committed on Oct 12

Fix dynamic_shapes kwargs to match inputs structure for AOT compilation

273acf8

Luigi committed on Oct 12

Fix AOT compilation dynamic_shapes to match expected arg names for torch.export.export

0a99dfc

Luigi committed on Oct 12

Add dynamic GPU time estimate indicator to UI

fc989b4

Luigi committed on Oct 12

Improve model size detection: replace ad-hoc string parsing with reliable params_b field in MODELS dict

ab92e0d

Luigi committed on Oct 12

Add qwen 80b-a3b

8cdf3e1

Luigi committed on Oct 12

Set better defaults for free-tier users: Qwen3-1.7B model, 1024 max tokens, search disabled

2cae073

Luigi committed on Oct 12

Adjust duration estimation for H200 performance - reduce conservative estimates

de766da

Luigi committed on Oct 12

Use actual parameter count for AOT decision instead of string matching

e3e334f

Luigi committed on Oct 12

Make AOT compilation conditional for models >= 2B parameters to optimize free tier usage

4500f92

Luigi committed on Oct 12

Add AOT compilation optimization for ZeroGPU acceleration

a7866ff

Luigi committed on Oct 12

add 4 20b+ models after enabling dynamic gpu duration

fea2910
verified

Luigi committed on Oct 12

Add dynamic duration calculation for ZeroGPU acceleration

6073cc2

Luigi committed on Oct 12

make qwen-4b default

d3726c6
verified

Luigi committed on Oct 11

disable two models that cannot run or run too slowly on hf spaces with zerogpu

3dc7ced

Luigi committed on Oct 11

Update app.py

f1fa55c
verified

Luigi committed on Oct 11

Update app.py

f07d6ab
verified

Luigi committed on Oct 11

Update app.py

0992852
verified

Luigi committed on Oct 11

add original apriel 15b

2b25033
verified

Luigi committed on Oct 11

use apriel 8bit

a4681bd
verified

Luigi committed on Oct 11

run Apriel on 4bit

2cadf8a
verified

Luigi committed on Oct 11

Add Apriel-1.5-15b-Thinker

3665b54
verified

Luigi committed on Oct 11

Update app.py

7f654b2
verified

Luigi committed on Oct 10

Update app.py

15b78c7
verified

Luigi committed on Oct 10

Update app.py

7356fa6
verified

Luigi committed on Oct 9

add 4 models from qwen3 family

048cfc4
verified

Luigi committed on Oct 9

add qwen3 32b awq

b9efb74
verified

Luigi committed on Oct 9

Update app.py

5e03586
verified

Luigi committed on Oct 9

Update app.py

e5a1663
verified

Luigi committed on Oct 9

Update app.py

de64679
verified

Luigi committed on Oct 9

Update app.py

4418827
verified

Luigi committed on Oct 9
