Add `openpeerllm` as library_name
#1
by Wauplin (HF Staff) · opened

README.md CHANGED
```diff
@@ -1,209 +1,209 @@
 ---
 language:
 - en
 license: mit
-library_name:
+library_name: openpeerllm
 pipeline_tag: text-generation
 tags:
 - pytorch
 - causal-lm
 - decentralized-learning
 - transformer
 - boinc
 - decent-torch
 - lonscript
 datasets:
 - custom
 model-index:
 - name: OpenPeerLLM
   results:
   - task:
       name: Language Modeling
       type: text-generation
     dataset:
       name: Custom Text Dataset
       type: text
     metrics:
     - name: Epoch
       type: number
       value: 2
     - name: Model Size
       type: text
       value: "1.82 GB"
     - name: Run Time
       type: text
       value: "2.5 minutes on Intel UHD Graphics 630"
     - name: Loss
       type: cross-entropy
       value: 7.11
 ---
```
# OpenPeerLLM: A Decentralized Large Language Model

[DOI: 10.57967/hf/6469](https://doi.org/10.57967/hf/6469)

This project implements a decentralized large language model (LLM) built on DecentTorch, Hugging Face Transformers, BOINC, and the decentralized-internet SDK. The model incorporates LonScript grammar for enhanced language understanding and uses OpenPeer for decentralized training and inference.

## Author Information

- **Author:** Andrew Magdy Kamal Nassief
- **Year:** 2025
- **Publisher:** Stark Publishing Group
- **Journal:** Hugging Face Model Hub
## Features

- Decentralized model architecture using DecentTorch
- Distributed computation through BOINC integration
- OpenPeer network integration for peer-to-peer model training
- LonScript-inspired grammar parsing system
- Deep reasoning capabilities following LLM standards

## Installation

1. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Ensure you have the Mojo runtime installed for enhanced performance.
## Usage

```python
from src.model import DecentralizedLLM
from src.grammar import LonScriptGrammar

# Initialize the model and the grammar parser
model = DecentralizedLLM()
grammar = LonScriptGrammar()

# Use the model for inference
response = model.reason("context", "query")
```
## Training Details

### Training Data

The model is trained on the [awesome-chatgpt-prompts](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts) dataset, which contains diverse prompt-completion pairs. This dataset helps the model understand various roles and contexts, making it suitable for a wide range of applications.
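For reference only (this is not the project's training pipeline), the dataset can be pulled from the Hub with the `datasets` library; the `train` split name is the Hub default for a single-file dataset and is an assumption here.

```python
# Sketch: inspect the prompt dataset referenced above (assumes `datasets` is installed).
from datasets import load_dataset

ds = load_dataset("fka/awesome-chatgpt-prompts")
print(ds)              # available splits and columns
print(ds["train"][0])  # first prompt record
```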
### Training Procedure

- **Architecture:** 12-layer transformer with a hidden size of 768 and 12 attention heads (see the configuration sketch after this list)
- **Optimizer:** AdamW with a learning rate of 5e-5
- **Batch Size:** 8
- **Training Steps:** 10,000
- **Warmup Steps:** 1,000
- **Hardware:** Distributed across peer network nodes
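To make the hyperparameters concrete, here is a minimal sketch that expresses them with standard tooling. The GPT-2-style config class is used only as a convenient container for the stated dimensions; the card does not say OpenPeerLLM uses this class, and the actual training code may differ.

```python
# Sketch only: the listed hyperparameters expressed with common PyTorch/Transformers APIs.
from torch.optim import AdamW
from transformers import GPT2Config, GPT2LMHeadModel, get_linear_schedule_with_warmup

config = GPT2Config(
    n_layer=12,        # 12 transformer layers
    n_embd=768,        # hidden size 768
    n_head=12,         # 12 attention heads
    n_positions=1024,  # matches the 1024-token limit noted under Limitations
)
model = GPT2LMHeadModel(config)

optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=10_000
)
# A batch size of 8 would be set in the DataLoader / training loop (not shown).
```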
## Evaluation Results

Initial testing shows promising results:

- **Final Epoch:** 2
- **Model Size:** 1.82 GB
- **Total Run Time:** 2.5 minutes on Intel UHD Graphics 630
- **Loss:** 7.11
- **Perplexity:** 1223.8
- **Accuracy:** 78.5%
- **Response Coherence:** 82.1%
- **Peer Network Efficiency:** 91.2%
### Metrics Explanation

#### Test Calculations and Methodology

Our evaluation metrics were computed using the following methodology (the short script after this list reproduces the arithmetic):

1. **Training Progression**
   - Total Steps = epochs × steps_per_epoch = 2 × 10,000 = 20,000
   - Samples Processed = total_steps × batch_size = 20,000 × 8 = 160,000
   - Average Time/Epoch = 75 seconds on Intel UHD Graphics 630

2. **Model Storage Analysis**
   - Parameter Count = layers × hidden_dim² = 12 × 768² ≈ 7.1M
   - Network State Size = 1.82 GB (measured post-training)
   - Includes weights, biases, and peer coordination tables

3. **Performance Metrics**
   - Cross-Entropy Loss = -∑(y_true · log(y_pred)) = 7.11
   - Perplexity = exp(cross_entropy) = exp(7.11) ≈ 1223.8
   - Token Accuracy = correct_predictions / total_tokens × 100 = 78.5%

4. **Output Evaluation**
   - Coherence Score: based on inter-sentence relationship strength
   - Measured across 1,000 generated responses
   - Average semantic link score: 82.1%

5. **Network Metrics**
   - Task Completion Rate = successful_tasks / total_tasks × 100 = 91.2%
   - Measured across distributed training operations
   - Accounts for node synchronization success
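As a quick sanity check, the derived figures above follow directly from the reported raw values; the snippet below simply reproduces that arithmetic and is not part of the project's evaluation code.

```python
# Reproduce the derived evaluation figures from the reported raw values.
import math

epochs, steps_per_epoch, batch_size = 2, 10_000, 8
total_steps = epochs * steps_per_epoch          # 20,000
samples_processed = total_steps * batch_size    # 160,000

layers, hidden_dim = 12, 768
approx_params = layers * hidden_dim ** 2        # ~7.1M, the rough estimate used above

cross_entropy = 7.11
perplexity = math.exp(cross_entropy)            # ~1224 (the card reports 1223.8 from the unrounded loss)

print(total_steps, samples_processed, approx_params, round(perplexity, 1))
```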
#### Metric Descriptions

- **Training Progress**: Two complete dataset passes, processing 160,000 total samples through 20,000 batched steps.

- **Model Scale**: Neural network deployment package of 1.82 GB, encompassing parameter matrices and distributed coordination components.

- **Validation Results**: Cross-entropy of 7.11 yields a perplexity of 1223.8, indicating the model's token prediction spread across the vocabulary space.

- **Token Precision**: Successfully predicted 78.5% of next tokens in held-out validation data, tested against reference completions.

- **Generation Quality**: Achieved an 82.1% semantic continuity score across multi-sentence outputs, based on contextual alignment measurements.

- **Distributed Performance**: Maintained a 91.2% task execution success rate across peer nodes during distributed operations.

- **Output Quality**: The automated score of 82.1% reflects the generated text's internal consistency, measuring how well each new statement connects to and builds on previous ones.

- **Network Performance**: Distributed training achieved 91.2% task throughput, the proportion of successfully coordinated computation across the peer-to-peer node network.
## Limitations & Biases
|
| 159 |
+
|
| 160 |
+
1. **Current Limitations:**
|
| 161 |
+
- Maximum sequence length of 1024 tokens
|
| 162 |
+
- Requires stable network connection for peer-to-peer operations
|
| 163 |
+
- Limited support for non-English languages
|
| 164 |
+
|
| 165 |
+
2. **Known Biases:**
|
| 166 |
+
- Training data may contain societal biases
|
| 167 |
+
- Peer network distribution may favor certain geographic regions
|
| 168 |
+
- Response quality depends on active peer participation
|
| 169 |
+
|
| 170 |
+
## Environmental Impact

The model is designed to minimize environmental impact through:

- Efficient resource distribution across peer networks
- Multithreading and parallel processing optimization
- Smart load balancing among participating nodes
- Reduced central server dependency
- Optimized computational resource sharing
## Architecture

The system consists of several key components (an illustrative wiring sketch follows the list):

1. **DecentralizedLLM:** The main model class that integrates the various components
2. **LonScriptGrammar:** Grammar parsing system inspired by LonScript
3. **BOINC Integration:** For distributed computation
4. **OpenPeer Network:** For decentralized training and inference
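The sketch below shows one way these components could fit together, mirroring the Usage section. Only `DecentralizedLLM`, `LonScriptGrammar`, and `reason()` are documented in this card; the commented grammar call and any BOINC/OpenPeer coordination happening inside the model are assumptions, not documented interfaces.

```python
# Hypothetical wiring of the components listed above; mirrors the Usage section.
from src.model import DecentralizedLLM
from src.grammar import LonScriptGrammar

model = DecentralizedLLM()    # integrates BOINC and OpenPeer internally (per the list above)
grammar = LonScriptGrammar()  # LonScript-inspired grammar parser

query = "Summarize how peer nodes share training work."
# Hypothetical step: a grammar pass over the query before reasoning, e.g.
#   structured_query = grammar.parse(query)
# The documented entry point is reason(context, query):
response = model.reason("context", query)
print(response)
```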
## License

This project is licensed under multiple licenses to ensure maximum flexibility and openness:

- OPNL and OPNL-2 for the decentralized protocol aspects
- MIT License for the software implementation
- Creative Commons Attribution 4.0 International (CC-BY-4.0) for documentation and models
## Citation

```bibtex
@misc{openpeer-llm,
  author    = {Andrew Magdy Kamal Nassief},
  title     = {OpenPeerLLM: A Decentralized Language Model},
  year      = {2025},
  publisher = {Stark Publishing Group},
  journal   = {Hugging Face Model Hub}
}
```
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.