kas

shing3232

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago

akhaliq/voxel-deepseek-terminus

liked a model about 2 months ago

Aleph-Alpha/llama-tfree-hat-pretrained-7b-dpo

new activity 2 months ago

deepseek-ai/DeepSeek-V3.1:tool call for reasoning mode

Organizations

None yet

liked a Space about 1 month ago

Voxel Deepseek Terminus

🚀

Explore a voxel art pagoda garden

liked a model about 2 months ago

Aleph-Alpha/llama-tfree-hat-pretrained-7b-dpo

7B • Updated 12 days ago • 173 • 9

New activity in deepseek-ai/DeepSeek-V3.1 2 months ago

tool call for reasoning mode

➕ 5

#27 opened 2 months ago by shing3232

updated a collection 6 months ago

sakura

Collection

5 items • Updated May 15

New activity in Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 6 months ago

Why is Int4 slower than unquantized float32 and float16?

#3 opened 8 months ago by hujianmin

upvoted a paper 6 months ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11 • 58

updated a collection 6 months ago

sakura

Collection

5 items • Updated May 15

upvoted an article 7 months ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024 • 271

upvoted 2 papers 7 months ago

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published Apr 8 • 110

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published Apr 7 • 26

liked a model 7 months ago

SakuraLLM/Sakura-GalTransl-7B-v3.7

8B • Updated Aug 15 • 5.72k • 79

liked a model 8 months ago

webbigdata/ALMA-7B-Ja-V2

Text Generation • 7B • Updated Nov 3, 2024 • 678 • 20

New activity in agentica-org/DeepScaleR-1.5B-Preview 9 months ago

I have difficulty to trigger thinking process

#12 opened 9 months ago by shing3232

New activity in tencent/Tencent-Hunyuan-Large 12 months ago

What hardware configuration does this model need to run?

#13 opened 12 months ago by demo001s

updated a model 12 months ago

shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX

2B • Updated Nov 8, 2024 • 319 • 1

upvoted a collection about 1 year ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated Jul 21 • 347

liked a model over 1 year ago

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3

Text Generation • 9B • Updated Jul 1, 2024 • 1.97k • 126

updated a model over 1 year ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

14B • Updated May 31, 2024 • 32 • 3

New activity in SakuraLLM/Sakura-14B-Qwen2beta-v0.9.2-GGUF over 1 year ago

Can't CUDA run BF16 models?

#1 opened over 1 year ago by NeuronAstate

New activity in Qwen/Qwen1.5-7B-Chat-GGUF over 1 year ago

Please post f16 quantization.

🔥 1

#1 opened over 1 year ago by ZeroWw
