# AI-Driven Polymer Aging Prediction and Classification System
[License: MIT](https://opensource.org/licenses/MIT)
A research project developed as part of AIRE 2025. This system applies deep learning to Raman spectral data to classify polymer aging, a critical proxy for recyclability, using a fully reproducible and modular ML pipeline.
---
## Project Objective
- Build a validated machine learning system for classifying polymer spectra (predict degradation levels as a proxy for recyclability)
- Compare literature-based and modern CNN architectures (Figure2CNN vs. ResNet1D) on Raman spectral data
- Ensure scientific reproducibility through structured diagnostics and artifact control
- Support sustainability and circular materials research through spectrum-based classification
---
## Model Architectures
| Model | Description |
|------|-------------|
| `Figure2CNN` | Baseline model from literature |
| `ResNet1D` | Deeper candidate model with skip connections |
> Both models support flexible input lengths; Figure2CNN relies on reshape logic, while ResNet1D uses native global pooling.
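As an illustration of the second approach, here is a minimal sketch (not the repository's `Figure2CNN` or `ResNet1D` code) of how global average pooling lets a 1D convolutional classifier accept spectra of arbitrary length:
```python
import torch
import torch.nn as nn

class GlobalPoolSpectrumNet(nn.Module):
    """Toy 1D CNN: conv features + global average pooling, so any spectrum
    length maps to a fixed-size classifier input. Illustrative only."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)      # collapses the length axis
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, length) with arbitrary length
        return self.classifier(self.pool(self.features(x)).squeeze(-1))

model = GlobalPoolSpectrumNet()
print(model(torch.randn(4, 1, 4000)).shape)  # torch.Size([4, 2])
print(model(torch.randn(4, 1, 1800)).shape)  # torch.Size([4, 2])
```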
---
## Project Structure (Cleaned and Current)
```text
ml-polymer-recycling/
├── datasets/
├── models/            # Model architectures
├── scripts/           # Training, inference, utilities
├── outputs/           # Artifacts: models, logs, plots
├── docs/              # Documentation & reports
└── environment.yml    # (local) Conda execution environment
```
<img width="1773" height="848" alt="ml-polymer-gitdiagram-0" src="https://github.com/user-attachments/assets/bb5d93dc-7ab9-4259-8513-fb680ae59d64" />
---
## Current Status
| Track | Status | Test Accuracy |
|-------|--------|---------------|
| **Raman** | Active & validated | **87.81% ± 7.59%** |
| **FTIR** | Deferred (modeling only) | N/A |
**Note:** FTIR preprocessing scripts are preserved but inactive. Modeling work is deferred until a suitable architecture is identified.
**Artifacts:**
- `outputs/figure2_model.pth`
- `outputs/resnet_model.pth`
- `outputs/logs/raman_{model}_diagnostics.json`
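A quick way to inspect one of these diagnostics files is shown below; the `resnet` path is just one instance of the naming pattern, and the snippet assumes only that the top level is a JSON object rather than a specific schema:
```python
import json
from pathlib import Path

# One instance of the naming pattern above; swap "resnet" for "figure2" as needed.
diag_path = Path("outputs/logs/raman_resnet_diagnostics.json")

with diag_path.open() as f:
    diagnostics = json.load(f)

# Print each top-level key with a short preview so the structure can be
# inspected without assuming particular field names.
for key, value in diagnostics.items():
    preview = str(value)
    print(f"{key}: {preview[:80]}{'...' if len(preview) > 80 else ''}")
```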
---
## Key Features
- 10-Fold Stratified Cross-Validation
- CLI training: `train_model.py`
- CLI inference: `run_inference.py`
- Output artifact naming per model
- Raman-only preprocessing with baseline correction, smoothing, and normalization (sketched after this list)
- Structured diagnostics JSON (accuracies, confusion matrices)
- Canonical validation script (`validate_pipeline.sh`) confirms reproducibility of all core components
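The preprocessing steps could be composed roughly as follows. This is a hedged NumPy/SciPy sketch, not the implementation in `scripts/`; the resampling strategy, window size, and polynomial orders are illustrative assumptions.
```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_raman(intensities: np.ndarray, target_len: int = 4000,
                     baseline_deg: int = 2) -> np.ndarray:
    """Illustrative Raman preprocessing: resample to a fixed length,
    subtract a polynomial baseline, smooth, and min-max normalize."""
    x = np.asarray(intensities, dtype=float)

    # Resample to target_len so spectra of different lengths align.
    x = np.interp(np.linspace(0, len(x) - 1, target_len), np.arange(len(x)), x)

    # Baseline correction: fit a low-order polynomial and subtract it.
    idx = np.arange(target_len)
    baseline = np.polyval(np.polyfit(idx, x, baseline_deg), idx)
    x = x - baseline

    # Savitzky-Golay smoothing to suppress high-frequency noise.
    x = savgol_filter(x, window_length=11, polyorder=3)

    # Min-max normalization to [0, 1].
    return (x - x.min()) / (x.max() - x.min() + 1e-8)
```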
---
## Environment Setup
```bash
# Local
git checkout main
conda env create -f environment.yml
conda activate polymer_env
# HPC
git checkout hpc-main
conda env create -f environment_hpc.yml
conda activate polymer_env
```
## Sample Training & Inference
### Training (10-Fold CV)
```bash
python scripts/train_model.py --model resnet --target-len 4000 --baseline --smooth --normalize
```
### Inference (Raman)
```bash
python scripts/run_inference.py --target-len 4000 \
  --input datasets/rdwp/sample123.txt \
  --model outputs/resnet_model.pth \
  --output outputs/inference/prediction.txt
```
### Inference Output Example
```text
Predicted Label: 1 True Label: 1
Raw Logits: [[-569.544, 427.996]]
```
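If a probability-style confidence is needed, a softmax can be applied to the raw logits after the fact. This is a standalone sketch, not part of `run_inference.py`:
```python
import torch
import torch.nn.functional as F

# Raw logits copied from the example output above (shape: [1, 2]).
logits = torch.tensor([[-569.544, 427.996]])

probs = F.softmax(logits, dim=1)           # convert scores to class probabilities
pred = int(torch.argmax(probs, dim=1))     # index of the most likely class

print(f"Predicted Label: {pred}, Confidence: {probs[0, pred].item():.4f}")
```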
### Validation Script (Raman Pipeline)
```bash
./validate_pipeline.sh
# Runs preprocessing, training, inference, and plotting checks
# Confirms artifact integrity and logs test results
```
---
## Dataset Resources
| Type | Dataset | Source |
|-------|---------|--------|
| Raman | RDWP | [A Raman database of microplastics weathered under natural environments](https://data.mendeley.com/datasets/kpygrf9fg6/1) |
Datasets should be downloaded separately and placed here:
```text
datasets/
└── rdwp/
    ├── sample1.txt
    ├── sample2.txt
    └── ...
```
These files are intentionally excluded from version control via `.gitignore`.
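For a quick sanity check of a downloaded spectrum, something like the following works, assuming the RDWP `.txt` files use a simple two-column (wavenumber, intensity) layout; adjust the parsing if the format differs:
```python
import numpy as np

# Assumption: whitespace-delimited two-column text (wavenumber, intensity).
spectrum = np.loadtxt("datasets/rdwp/sample1.txt")
wavenumbers, intensities = spectrum[:, 0], spectrum[:, 1]
print(f"{len(wavenumbers)} points, "
      f"range {wavenumbers.min():.1f}-{wavenumbers.max():.1f} cm^-1")
```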
---
## Dependencies
- Python 3.10+
- Conda, Git
- PyTorch (CPU & CUDA)
- NumPy, SciPy, Pandas
- scikit-learn
- Matplotlib, Seaborn
- `argparse`, `json` (Python standard library)
---
## Contributors
- **Dr. Sanmukh Kuppannagari** – Research Mentor
- **Dr. Metin Karailyan** – Research Mentor
- **Jaser H.** – AIRE 2025 Intern, Developer
---
## Strategic Expansion Objectives
> Following Dr. Kuppannagari's updated guidance, the project scope now extends beyond the Raman-only validated baseline. The roadmap defines three major expansion paths designed to broaden the system's capabilities and impact:
1. **Model Expansion: Multi-Model Dashboard**
> The dashboard will evolve into a hub for multiple model architectures rather than being tied to a single baseline. Planned work includes:
- **Retraining & Fine-Tuning**: Incorporating publicly available vision models and retraining them with the polymer dataset.
- **Model Registry**: Automatically detecting available `.pth` weights and exposing them in the dashboard for easy selection (see the sketch below).
- **Side-by-Side Reporting**: Running comparative experiments and reporting each model's accuracy and diagnostics in a standardized format.
- **Reproducible Integration**: Maintaining modular scripts and pipelines so each model's results can be replicated without conflict.
This ensures flexibility for future research and transparency in performance comparisons.
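As a rough illustration of the registry idea, the dashboard could discover weights by scanning the `outputs/` directory. The `*_model.pth` naming follows the artifact list above; the display-name logic and everything else here are assumptions, not an existing script.
```python
from pathlib import Path

def discover_model_weights(outputs_dir: str = "outputs") -> dict[str, Path]:
    """Map a human-readable name to every .pth file found under outputs_dir."""
    registry: dict[str, Path] = {}
    for weights in sorted(Path(outputs_dir).glob("**/*.pth")):
        # e.g. "resnet_model.pth" -> "Resnet"
        display_name = weights.stem.replace("_model", "").replace("_", " ").title()
        registry[display_name] = weights
    return registry

if __name__ == "__main__":
    for name, path in discover_model_weights().items():
        print(f"{name}: {path}")
```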
2. **Image Input Modality**
> The system will support classification on images as an additional modality, extending beyond spectra. Key features will include:
- **Upload Support**: Users can upload single images or batches directly through the dashboard.
- **Multi-Model Execution**: Selected models from the registry can be applied to all uploaded images simultaneously.
- **Batch Results**: Output will be returned in a structured, accessible way, showing both individual predictions and aggregate statistics (one possible shape is sketched after this list).
- **Enhanced Feedback**: Outputs will include predicted class, model confidence, and potentially annotated image previews.
This expands the system toward a multi-modal framework, supporting broader research workflows.
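One possible shape for those batch results is sketched below; the class and field names are hypothetical and intended only to show the individual-plus-aggregate structure.
```python
from dataclasses import dataclass, field

@dataclass
class ImagePrediction:
    """One model's prediction for one uploaded image (illustrative shape only)."""
    filename: str
    model_name: str
    predicted_class: int
    confidence: float

@dataclass
class BatchReport:
    """Aggregate view over a batch of predictions."""
    predictions: list[ImagePrediction] = field(default_factory=list)

    def class_counts(self) -> dict[int, int]:
        counts: dict[int, int] = {}
        for p in self.predictions:
            counts[p.predicted_class] = counts.get(p.predicted_class, 0) + 1
        return counts

    def mean_confidence(self) -> float:
        if not self.predictions:
            return 0.0
        return sum(p.confidence for p in self.predictions) / len(self.predictions)
```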
3. **FTIR Dataset Integration**
> Although previously deferred, FTIR support will be added back in a modular, distinct fashion. Planned steps are:
- **Dedicated Preprocessing**: Tailored scripts to handle FTIR-specific signal characteristics (multi-layer handling, baseline correction, normalization).
- **Architecture Compatibility**: Ensuring existing and retrained models can process FTIR data without mixing it with Raman workflows.
- **UI Integration**: Introducing FTIR as a separate option in the modality selector, keeping Raman, Image, and FTIR workflows clearly delineated.
- **Phased Development**: Implementation details to be refined during meetings to ensure scientific rigor.
This guarantees FTIR becomes a supported modality without undermining the validated Raman foundation.
## Guiding Principles
- **Preserve the Raman baseline** as the reproducible ground truth
- **Additive modularity**: Models, images, and FTIR added as clean, distinct layers rather than overwriting core functionality
- **Transparency & reproducibility**: All expansions documented, tested, and logged with clear outputs.
- **Future-oriented design**: Workflows structured to support ongoing collaboration and successor-safe research.