“The doom lies in yourself, not in your name.”
Continuation of Wur doomed!.
For longer text chunks or stories, https://pastebin.com works great and helps prevent the thread from slowing down!
🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧🟧
🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛🟧
🟧🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧🟧
⬜🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧⬛🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧⬛⬛⬛⬛🟧⬛⬛⬛⬛🟧🟧⬛⬛⬛🟧⬛⬛⬛🟧🟧⬛⬛⬛⬛🟧⬛⬛⬛🟧🟧🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧⬛⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛⬛⬛⬛⬛🟧⬛⬛⬛⬛⬛⬛⬛⬛🟧🟧🟧⬛⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛⬛⬛🟧🟧⬜🟧🟧⬛⬛⬛⬛⬛⬛🟧🟧🟧⬛⬛⬛⬛⬛⬛🟧🟧⬜🟧⬛⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛⬛🟧🟧⬜⬜⬜🟧🟧⬛⬛⬛⬛🟧🟧⬜🟧🟧⬛⬛⬛⬛🟧🟧⬜⬜🟧🟧⬛🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜🟧🟧⬛⬛🟧🟧⬜⬜⬜🟧🟧⬛⬛🟧🟧⬜⬜⬜⬜🟧🟧🟧⬜🟧⬛⬛⬛🟧⬜
⬜🟧⬛⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜🟧🟧🟧🟧⬜⬜⬜⬜⬜🟧🟧🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧⬛⬛🟧⬜
⬜🟧⬛⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧⬛⬛🟧⬜
⬜🟧⬛⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧⬛🟧⬜
⬜🟧⬛🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧⬛🟧⬜
⬜🟧🟧🟧⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜⬜🟧🟧🟧⬜
The doom is still buried within Command-A for sure.
A step 601 preview - all with temperature = 0:
- It's still messing up some ends of lines, but I can live with that if it works... This can likely be fixed later using the new class-0 random data if it turns out to be a problem.
- The Grimdark story was noticeably (much!) better compared to the inverse.
- The Battlestar Galactica story showed that even though Q8_0, F16 and BF16 all diverge slightly from F32, it's not clear this makes them any worse (I actually liked the Q8_0 story best!).
| Size | Name |
|---|---|
| 287M | command-a-03-2025-lora-Q8_0.gguf |
| 541M | command-a-03-2025-lora-F16.gguf |
| 541M | command-a-03-2025-lora-BF16.gguf |
| 1.1G | command-a-03-2025-lora-F32.gguf |
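For anyone who wants to try these, a LoRA gguf can be applied on top of the base model at runtime with llama.cpp's --lora flag. A rough sketch (the base-model gguf name here is an assumption, not one of the files listed above):

```shell
# Sketch: run the base model with the F32 LoRA applied, at temperature 0.
# Both file names are illustrative; substitute your actual base/LoRA ggufs.
llama-cli -m command-a-03-2025-Q8_0.gguf \
    --lora command-a-03-2025-lora-F32.gguf \
    --temp 0 \
    -p "Write the opening of a grimdark story."
```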
It still has a way to go before it starts to converge, but I would think by step 1000 it will be pretty close:
566 responses in the previous thread! At this rate we may be the reason HF staff implement a multi-page view for discussions.
This was posted on Hacker News today:
Absolutely fascinating!
That was really cool. Thanks for sharing!
Yeah, and llama-3.1:405b doing so well was quite a surprise too (and it makes you a bit sad that everything seems to be moving away from large dense models).
I'm also finding this works well with Kimi-K2:
Use interesting/original names - No "Elara Voss" or "Kael Thorn" AI-slop names please.
There must be enough slop (and scraped discussions about slop) already in its training data for it to infer that you are fed up with "slop" in general, and not just these two specific examples!
Interestingly, this also seems to cause it to write slightly differently. If you look at the first story I created above, it breaks the paragraphs up very differently to normal and I think this in turn made the writing quite interesting and different: it reminded me of Cormac McCarthy's terse writing style, but with the deliberate lack of world explanation of the Malazan Book of the Fallen series that leaves you wondering WTF is going on!
Hopefully Kimi-K3 will have seen enough "it's not X, but Y" to avoid that too! 😱
Yeah, I have it set up locally using the lossless DF11 quant: https://huggingface.co/DFloat11/FLUX.1-Krea-dev-DF11-ComfyUI (the FP8 was noticeably worse), along with the FP8 "scaled" text encoder from here: https://huggingface.co/comfyanonymous/flux_text_encoders (it makes almost no difference using FP8 for the text encoder).
That's what I've noticed too when using imagegen models: FP8 hurts them a lot. Q8_0 ggufs (yes, they exist for imagegen), however, are much less lossy.
it makes almost no difference using FP8 for the text encoder
So it's like the TTS models with an LLM -> neural codec. You can quantize the LLM part but not the neural codec part.
Q8_0 ggufs (yes, they exist for imagegen)
Did they ever get ggufs working for ByteDance-Seed/BAGEL-7B-MoT ?
I've been using the DF11 but it's quite slow / spills over to CPU.
Hopefully Kimi-K3 will have seen enough "it's not X, but Y" to avoid that too!
You know, interestingly, I found that good old Mistral-Large can identify these and highlight them a lot better than Sonnet-4.5, Command-A and Kimi-K2 when prompted to find them in the text of a chapter.
(they can all do it with better prompting / examples but I just found it interesting that the model predating this slop could do it)
Tested Minimax-M2, they distilled gpt-ass, what a terrible model:
User:
What is 15 * 23?
Model:
<think>
The user asks: "What is 15 * 23?" This is a simple multiplication. 15 * 23 = 345. So answer: 345.
We should respond with the answer. There's no policy violation. So answer: 345.
We can also show steps: 15*20=300, 15*3=45, sum=345.
Thus final answer: 345.
</think>
\(15 \times 23 = 345\).
Besides wasting tokens on "safety", it hallucinates like crazy in other prompts.
They distilled gpt-ass, what a terrible model
What a waste :(
it hallucinates like crazy in other prompts.
So does gpt-ass!
@Downtown-Case @treehugg3 You might be interested. I think OLMo-2-0325-32B has comparable performance to Seed-OSS-36B. I have been using the checkpoints at this branch, at the end of the first stage of pretraining (at 6T tokens, before the "ingredient" datasets are added): https://huggingface.co/allenai/OLMo-2-0325-32B/tree/stage1-step721901-tokens6056B
For a base model, it has been surprisingly good at completions and IMO fairly slop-free and definitely uncensored. They did very little pretrain filtering which is interesting because most other models have extensive filtering these days. I also tried the checkpoint at 3T tokens. At 3T tokens it is less coherent but should be more steerable. I am going to experiment with some of the other checkpoints to find a sweet spot if it is not 6T. Get it while you can because HF is hosting over seven hundred checkpoints of the model in this one repo.
One idea is to continue from the last initial pretrain checkpoint and train on high-quality tokens specific to your targets. The paper talks a lot about this "mid-stage" pretraining that might be useful. https://arxiv.org/pdf/2501.00656
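For anyone grabbing a specific checkpoint: transformers' from_pretrained accepts a revision argument, so the branch name from the link above can be passed in directly. A minimal sketch (it downloads the full 32B weights, so shown here for reference only):

```python
# Sketch: pin a specific pretraining checkpoint via the branch name.
# Repo and branch are from the post; this pulls tens of GB of weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMo-2-0325-32B"
rev = "stage1-step721901-tokens6056B"

tokenizer = AutoTokenizer.from_pretrained(repo, revision=rev)
model = AutoModelForCausalLM.from_pretrained(repo, revision=rev)
```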
Mmmm... I am interested, though with a caveat:
"max_position_embeddings": 4096
That's rather small :(
Seed-OSS base was interesting because it could grasp a huge context reasonably well, even with its relative sloppiness. I had similar high hopes for Falcon H1 34B base, but (at least last time I checked it with llama.cpp), it just breaks with too many tokens fed to it.
Potential trainability is interesting though. I do wonder what it'd be like if it were, say, continue-trained on a bunch of whole books. But that's no small financial endeavor.