Dark-O-Ether committed
Commit a97920a · 1 parent: 94b2188

Updated README.md and links in the page

Files changed (2):
  1. README.md +2 -0
  2. app.py +7 -3
README.md CHANGED
@@ -13,6 +13,8 @@ short_description: Demonstrating the custom tokeniser library (tokeniser-py)
 
 # tokeniser-py 🔣 - Interactive Tokenization Visualizer
 
+**Imp Links: [PyPI Main Library (tokeniser-py)](https://pypi.org/project/tokeniser-py/) | [PyPI Lite Library (tokeniser-py-lite)](https://pypi.org/project/tokeniser-py-lite/) | [Main Library GitHub (tokeniser-py)](https://github.com/Tasmay-Tibrewal/tokeniser-py) | [Lite Library GitHub (tokeniser-py-lite)](https://github.com/Tasmay-Tibrewal/tokeniser-py-lite) | [Complete repo (unchunked) - HF](https://huggingface.co/datasets/Tasmay-Tib/Tokeniser) | [Complete repo (chunked) - GitHub](https://github.com/Tasmay-Tibrewal/Tokeniser) | [Imp Files Github](https://github.com/Tasmay-Tibrewal/Tokeniser-imp)**
+
 This Hugging Face Space demonstrates **tokeniser-py**, a custom tokenizer built from scratch for language model preprocessing. Unlike traditional tokenizers like BPE (Byte Pair Encoding), tokeniser-py uses a unique algorithm developed independently and trained on over 1 billion tokens from the SlimPajama dataset.
 
 ## 🚀 Features of this Demo
app.py CHANGED
@@ -288,15 +288,19 @@ st.markdown("""
 <div class="header-container">
 <div>
 <h1>tokeniser-py 🔣</h1>
-<a href = "https://github.com/Tasmay-Tibrewal/tokeniser-py" class="link-top-a" style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">Library GitHub</span></a>
+<a href = "https://github.com/Tasmay-Tibrewal/tokeniser-py" class="link-top-a" style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">Library GitHub (tokeniser-py)</span></a>
 <p class="link-top" style="display: inline;"> | </p>
-<a href = "https://huggingface.co/datasets/Tasmay-Tib/Tokeniser" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">HF Dataset</span></a>
+<a href = "https://github.com/Tasmay-Tibrewal/tokeniser-py-lite" class="link-top-a" style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">Library GitHub (tokeniser-py-lite)</span></a>
+<p class="link-top" style="display: inline;"> | </p>
+<a href = "https://huggingface.co/datasets/Tasmay-Tib/Tokeniser" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">HF Dataset (unchunked)</span></a>
 <p class="link-top" style="display: inline;"> | </p>
 <a href = "https://github.com/Tasmay-Tibrewal/Tokeniser" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">GitHub Dataset (chunked)</span></a>
 <p class="link-top" style="display: inline;"> | </p>
 <a href = "https://github.com/Tasmay-Tibrewal/Tokeniser-imp" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">GitHub Imp Files</span></a>
 <p class="link-top" style="display: inline;"> | </p>
-<a href = "https://pypi.org/project/tokeniser-py/" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">PyPI Package</span></a>
+<a href = "https://pypi.org/project/tokeniser-py/" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">PyPI Package (Main Lib)</span></a>
+<p class="link-top" style="display: inline;"> | </p>
+<a href = "https://pypi.org/project/tokeniser-py-lite/" class="link-top-a"style="display: inline;"><span style="background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;">PyPI Package (Lite Lib)</span></a>
 <p></p>
 <p style="font-size: 20px;"><strong>Learn about language model tokenization</strong></p>
 <p style="font-size: 17px; margin-bottom: 5px;">
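The header markup above repeats the same `<a>`/`<span>` pattern (with identical inline styling) once per link, which is why this commit touches seven hand-edited HTML lines just to add two links. A small helper could generate the whole bar from a list of (label, url) pairs instead; this is only a sketch of that idea, not code from the repo, and the function name and `LINKS` list are illustrative (the URLs and labels are taken from the diff above):

```python
# Sketch: build the repeated link markup from a data structure.
# link_bar() and LINKS are hypothetical names, not part of app.py.

LINKS = [
    ("Library GitHub (tokeniser-py)", "https://github.com/Tasmay-Tibrewal/tokeniser-py"),
    ("Library GitHub (tokeniser-py-lite)", "https://github.com/Tasmay-Tibrewal/tokeniser-py-lite"),
    ("HF Dataset (unchunked)", "https://huggingface.co/datasets/Tasmay-Tib/Tokeniser"),
    ("GitHub Dataset (chunked)", "https://github.com/Tasmay-Tibrewal/Tokeniser"),
    ("GitHub Imp Files", "https://github.com/Tasmay-Tibrewal/Tokeniser-imp"),
    ("PyPI Package (Main Lib)", "https://pypi.org/project/tokeniser-py/"),
    ("PyPI Package (Lite Lib)", "https://pypi.org/project/tokeniser-py-lite/"),
]

# The span styling from the header, kept in one place.
SPAN_STYLE = "background-color:rgba(100,146,154,0.17); padding:2px 4px; border-radius:3px;"


def link_bar(links):
    """Return the header's anchor markup, joined by ' | ' separators."""
    anchors = [
        f'<a href="{url}" class="link-top-a" style="display: inline;">'
        f'<span style="{SPAN_STYLE}">{label}</span></a>'
        for label, url in links
    ]
    sep = '<p class="link-top" style="display: inline;"> | </p>'
    return sep.join(anchors)


html = link_bar(LINKS)
print(html.count("<a href"))  # one anchor per entry in LINKS
```

The resulting string could be rendered with Streamlit's real `st.markdown(html, unsafe_allow_html=True)`, so adding a link becomes a one-line change to the data list rather than an edit to three HTML lines.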