Upload sd_token_similarity_calculator.ipynb
sd_token_similarity_calculator.ipynb
CHANGED
|
@@ -123,7 +123,7 @@
|
|
| 123 |
},
|
| 124 |
"outputId": "e335f5da-b26d-4eea-f854-fd646444ea14"
|
| 125 |
},
|
| 126 |
-
"execution_count":
|
| 127 |
"outputs": [
|
| 128 |
{
|
| 129 |
"output_type": "stream",
|
|
@@ -279,6 +279,48 @@
|
|
| 279 |
},
|
| 280 |
"execution_count": null,
|
| 281 |
"outputs": []
|
| 282 |
}
|
| 283 |
]
|
| 284 |
}
|
|
|
|
| 123 |
},
|
| 124 |
"outputId": "e335f5da-b26d-4eea-f854-fd646444ea14"
|
| 125 |
},
|
| 126 |
+
"execution_count": null,
|
| 127 |
"outputs": [
|
| 128 |
{
|
| 129 |
"output_type": "stream",
|
|
|
|
| 279 |
},
|
| 280 |
"execution_count": null,
|
| 281 |
"outputs": []
|
| 282 |
+
},
|
| 283 |
+
{
|
| 284 |
+
"cell_type": "markdown",
|
| 285 |
+
"source": [
|
| 286 |
+
"\n",
|
| 287 |
+
"\n",
|
| 288 |
+
"This is how the notebook works:\n",
|
| 289 |
+
"\n",
|
| 290 |
+
"Similar vectors = similar output in the SD 1.5 / SDXL / FLUX models\n",
|
| 291 |
+
"\n",
|
| 292 |
+
"CLIP converts the prompt text to vectors (“tensors”), with float32 values usually ranging from -1 to 1.\n",
|
| 293 |
+
"\n",
|
| 294 |
+
"Dimensions are [ 1x768 ] tensors for SD 1.5, and a [ 1x768 , 1x1024 ] tensor pair for SDXL and FLUX.\n",
|
| 295 |
+
"\n",
|
| 296 |
+
"The SD models and FLUX convert these vectors to an image.\n",
|
| 297 |
+
"\n",
|
| 298 |
+
"This notebook takes an input string, tokenizes it, and matches the first token against the 49407 token vectors in the vocab.json: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main/tokenizer\n",
|
| 299 |
+
"\n",
|
| 300 |
+
"It finds the “most similar tokens” in the list. Similarity is measured by the angle theta between the token vectors.\n",
|
| 301 |
+
"\n",
|
| 302 |
+
"\n",
|
| 303 |
+
"<div>\n",
|
| 304 |
+
"<img src=\"https://huggingface.co/datasets/codeShare/sd_tokens/resolve/main/cosine.jpeg\" width=\"300\"/>\n",
|
| 305 |
+
"</div>\n",
|
| 306 |
+
"\n",
|
| 307 |
+
"The angle is calculated using cosine similarity, where 1 = 100% similarity (parallel vectors), and 0 = 0% similarity (perpendicular vectors).\n",
|
| 308 |
+
"\n",
|
| 309 |
+
"Negative similarity is also possible.\n",
|
| 310 |
+
"\n",
|
| 311 |
+
"So if you are bored of prompting “girl” and want something similar, you can run this notebook and use, for example, the “chick</w>” token at 21.88% similarity.\n",
|
| 312 |
+
"\n",
|
| 313 |
+
"You can also run a mixed search, like “cute+girl”/2, where for example “kpop</w>” has a 16.71% similarity.\n",
|
| 314 |
+
"\n",
|
| 315 |
+
"Sidenote: Prompt weights like (banana:1.2) will scale the magnitude of the corresponding 1x768 tensor(s) by 1.2.\n",
|
| 316 |
+
"\n",
|
| 317 |
+
"Source: https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts*\n",
|
| 318 |
+
"\n",
|
| 319 |
+
"TL;DR: vector direction = “what to generate”, vector magnitude = “prompt weights”"
|
| 320 |
+
],
|
| 321 |
+
"metadata": {
|
| 322 |
+
"id": "njeJx_nSSA8H"
|
| 323 |
+
}
|
| 324 |
}
|
| 325 |
]
|
| 326 |
}
|
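The similarity search described in the added markdown cell can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`cosine_similarity`, `average`, `most_similar`) and the 3-dimensional `toy_vocab` vectors are assumptions made up for this example, whereas the notebook works on the real CLIP token vectors. It shows the three ideas from the cell: ranking tokens by cosine similarity, a mixed query built as an element-wise average, and a prompt weight changing magnitude but not direction.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average(vectors):
    """Element-wise mean of several vectors, as in the mixed search ("cute+girl")/2."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def most_similar(query, vocab, top_k=3):
    """Return the top_k (token, similarity) pairs, most similar first."""
    scored = [(token, cosine_similarity(query, vec)) for token, vec in vocab.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, invented for illustration only.
# Real CLIP token vectors are much longer (1x768 per the text above).
toy_vocab = {
    "girl</w>": [1.0, 0.0, 0.0],
    "chick</w>": [0.8, 0.6, 0.0],
    "banana</w>": [0.0, 0.0, 1.0],
    "cute</w>": [0.0, 1.0, 0.0],
}

# Single-token search: rank every token against "girl</w>".
ranked = most_similar(toy_vocab["girl</w>"], toy_vocab)

# Mixed search: average two token vectors, then rank against the result.
mixed = average([toy_vocab["cute</w>"], toy_vocab["girl</w>"]])
ranked_mixed = most_similar(mixed, toy_vocab)

# Prompt weight (banana:1.2): the vector gets longer, its direction is unchanged,
# so its cosine similarity to the unweighted vector stays at 1.
weighted = [1.2 * x for x in toy_vocab["banana</w>"]]

for token, score in ranked:
    print(f"{token}: {score:.2%}")
```

Because cosine similarity divides by both vector lengths, scaling a vector by a prompt weight leaves every similarity score unchanged, which matches the cell's summary that direction carries "what to generate" and magnitude carries the weight.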