# 🔬 AI-Driven Polymer Aging Prediction and Classification System

[License: MIT](https://opensource.org/licenses/MIT)

A research project developed as part of AIRE 2025. This system applies deep learning to Raman spectral data to classify polymer aging (a critical proxy for recyclability) using a fully reproducible and modular ML pipeline.

---

## 🎯 Project Objective

- Build a validated machine learning system for classifying polymer spectra (predicting degradation levels as a proxy for recyclability)
- Compare literature-based and modern CNN architectures (Figure2CNN vs. ResNet1D) on Raman spectral data
- Ensure scientific reproducibility through structured diagnostics and artifact control
- Support sustainability and circular-materials research through spectrum-based classification

---

## 🧠 Model Architectures

| Model | Description |
|-------|-------------|
| `Figure2CNN` | Baseline model from the literature |
| `ResNet1D` | Deeper candidate model with skip connections |

> Both models support flexible input lengths: Figure2CNN relies on reshape logic, while ResNet1D uses native global pooling.
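
For readers unfamiliar with the pooling trick, the minimal sketch below (not the repository's actual `ResNet1D` code; layer sizes and names are illustrative) shows why global average pooling makes a 1D CNN's classifier head independent of the input spectrum length.

```python
import torch
import torch.nn as nn

class TinyResNet1D(nn.Module):
    """Illustrative 1D CNN with one residual block and global average pooling.

    Not the repository's ResNet1D; it only demonstrates how global pooling
    decouples the classifier head from the input spectrum length.
    """

    def __init__(self, num_classes: int = 2, channels: int = 32):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=7, padding=3)
        self.block = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)   # collapses any length L to 1
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        x = torch.relu(x + self.block(x))     # residual (skip) connection
        x = self.pool(x).squeeze(-1)          # (batch, channels)
        return self.fc(x)

# The same model handles spectra resampled to different lengths:
model = TinyResNet1D()
for length in (500, 4000):
    logits = model(torch.randn(2, 1, length))
    print(length, logits.shape)               # -> torch.Size([2, 2])
```
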
---

## 📁 Project Structure (Cleaned and Current)

```text
ml-polymer-recycling/
├── datasets/
├── models/            # Model architectures
├── scripts/           # Training, inference, utilities
├── outputs/           # Artifacts: models, logs, plots
├── docs/              # Documentation & reports
└── environment.yml    # (local) Conda execution environment
```

<img width="1773" height="848" alt="ml-polymer-gitdiagram-0" src="https://github.com/user-attachments/assets/bb5d93dc-7ab9-4259-8513-fb680ae59d64" />

---

## ✅ Current Status

| Track | Status | Test Accuracy |
|-----------|----------------------|----------------|
| **Raman** | ✅ Active & validated | **87.81% ± 7.59%** |
| **FTIR** | ⏸️ Deferred (modeling only) | N/A |

**Note:** FTIR preprocessing scripts are preserved but inactive. Modeling work is deferred until a suitable architecture is identified.

**Artifacts:**

- `outputs/figure2_model.pth`
- `outputs/resnet_model.pth`
- `outputs/logs/raman_{model}_diagnostics.json`
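
The diagnostics JSON is described as holding accuracies and confusion matrices, but its exact schema is not documented here. Assuming it stores a list of per-fold accuracies under a `fold_accuracies` key (an assumption to verify against what `train_model.py` actually writes), a small helper could summarize a run:

```python
import json
from pathlib import Path
from statistics import mean, stdev

def summarize_diagnostics(path: str) -> None:
    """Print a mean ± std summary of per-fold accuracies.

    Assumes the diagnostics JSON contains a "fold_accuracies" list; adjust
    the key name to match the schema written by train_model.py.
    """
    data = json.loads(Path(path).read_text())
    accs = data["fold_accuracies"]  # assumed key name
    print(f"{len(accs)} folds: {mean(accs):.2f}% ± {stdev(accs):.2f}%")

summarize_diagnostics("outputs/logs/raman_resnet_diagnostics.json")
```
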
---

## 🔬 Key Features

- ✅ 10-Fold Stratified Cross-Validation
- ✅ CLI Training: `train_model.py`
- ✅ CLI Inference: `run_inference.py`
- ✅ Output artifact naming per model
- ✅ Raman-only preprocessing with baseline correction, smoothing, and normalization (sketched below)
- ✅ Structured diagnostics JSON (accuracies, confusion matrices)
- ✅ Canonical validation script (`validate_pipeline.sh`) confirms reproducibility of all core components
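
The project's preprocessing is implemented in its own scripts and driven by the `--baseline`, `--smooth`, and `--normalize` flags; the sketch below only illustrates the three named steps with NumPy/SciPy, using placeholder parameters (polynomial degree, Savitzky-Golay window) rather than the project's exact settings.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_spectrum(intensities: np.ndarray,
                        target_len: int = 4000) -> np.ndarray:
    """Illustrative Raman preprocessing: resample, baseline-correct,
    smooth, and min-max normalize. Parameter choices are placeholders."""
    # Resample to a fixed length so models see a consistent input size
    x_old = np.linspace(0.0, 1.0, len(intensities))
    x_new = np.linspace(0.0, 1.0, target_len)
    y = np.interp(x_new, x_old, intensities)

    # Baseline correction: subtract a low-order polynomial fit
    coeffs = np.polyfit(x_new, y, deg=3)
    y = y - np.polyval(coeffs, x_new)

    # Smoothing: Savitzky-Golay filter
    y = savgol_filter(y, window_length=11, polyorder=3)

    # Min-max normalization to [0, 1]
    return (y - y.min()) / (y.max() - y.min() + 1e-8)
```
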
---

**Environments:**

```bash
# Local
git checkout main
conda env create -f environment.yml
conda activate polymer_env

# HPC
git checkout hpc-main
conda env create -f environment_hpc.yml
conda activate polymer_env
```

## 🚀 Sample Training & Inference

### Training (10-Fold CV)

```bash
python scripts/train_model.py --model resnet --target-len 4000 --baseline --smooth --normalize
```
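
`train_model.py` handles the cross-validation internally; for reference, a 10-fold stratified loop with scikit-learn typically follows the pattern sketched below, where the random data and the logistic-regression stand-in merely take the place of the real spectra and CNN training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Placeholder data standing in for the preprocessed Raman spectra and labels
# loaded by train_model.py; the feature length follows --target-len 4000.
X = np.random.rand(120, 4000)
y = np.random.randint(0, 2, size=120)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_accuracies = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    # A linear classifier stands in for training Figure2CNN / ResNet1D.
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    acc = 100.0 * clf.score(X[val_idx], y[val_idx])
    fold_accuracies.append(acc)
    print(f"Fold {fold}: {acc:.2f}%")

print(f"Mean ± std: {np.mean(fold_accuracies):.2f}% ± {np.std(fold_accuracies):.2f}%")
```
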
### Inference (Raman)

```bash
python scripts/run_inference.py --target-len 4000 \
  --input datasets/rdwp/sample123.txt --model outputs/resnet_model.pth \
  --output outputs/inference/prediction.txt
```

### Inference Output Example

```text
Predicted Label: 1 True Label: 1
Raw Logits: [[-569.544, 427.996]]
```
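
The raw logits are unnormalized class scores: the predicted label is their argmax, and a softmax converts them into probabilities when a confidence estimate is wanted. This mirrors standard PyTorch post-processing and may differ in detail from what `run_inference.py` writes out.

```python
import torch

# Logits copied from the example output above (class 0 vs. class 1)
logits = torch.tensor([[-569.544, 427.996]])

predicted = logits.argmax(dim=1)            # -> tensor([1])
confidence = torch.softmax(logits, dim=1)   # per-class probabilities

print(f"Predicted Label: {predicted.item()}")
print(f"Class probabilities: {confidence.squeeze().tolist()}")
```
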
### Validation Script (Raman Pipeline)

```bash
./validate_pipeline.sh
# Runs preprocessing, training, inference, and plotting checks
# Confirms artifact integrity and logs test results
```

---

## 📊 Dataset Resources

| Type | Dataset | Source |
|-------|---------|--------|
| Raman | RDWP | [A Raman database of microplastics weathered under natural environments](https://data.mendeley.com/datasets/kpygrf9fg6/1) |

Datasets should be downloaded separately and placed here:

```text
datasets/
└── rdwp/
    ├── sample1.txt
    ├── sample2.txt
    └── ...
```

These files are intentionally excluded from version control via `.gitignore`.
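
For quick inspection of a downloaded file, the snippet below assumes each RDWP `.txt` file contains two numeric columns (Raman shift and intensity); confirm the column layout and delimiter against the actual data before relying on it.

```python
import numpy as np

def load_spectrum(path: str) -> tuple[np.ndarray, np.ndarray]:
    """Load one RDWP spectrum file.

    Assumes two whitespace-separated columns (Raman shift, intensity);
    adjust the delimiter if the downloaded files are comma-separated.
    """
    data = np.loadtxt(path, delimiter=None)   # None splits on whitespace
    return data[:, 0], data[:, 1]

shifts, intensities = load_spectrum("datasets/rdwp/sample123.txt")
print(shifts.shape, intensities.shape)
```
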
---

## 📦 Dependencies

- `Python 3.10+`
- `Conda, Git`
- `PyTorch (CPU & CUDA)`
- `NumPy, SciPy, Pandas`
- `scikit-learn`
- `Matplotlib, Seaborn`
- `argparse, json` (standard library)

---

## 🧑‍🤝‍🧑 Contributors

- **Dr. Sanmukh Kuppannagari** - Research Mentor
- **Dr. Metin Karailyan** - Research Mentor
- **Jaser H.** - AIRE 2025 Intern, Developer

---

## 🎯 Strategic Expansion Objectives

> Following Dr. Kuppannagari's updated guidance, the project scope now extends beyond the Raman-only validated baseline. The roadmap defines three major expansion paths designed to broaden the system's capabilities and impact:

1. **Model Expansion: Multi-Model Dashboard**

   > The dashboard will evolve into a hub for multiple model architectures rather than being tied to a single baseline. Planned work includes:

   - **Retraining & Fine-Tuning**: Incorporating publicly available vision models and retraining them on the polymer dataset.
   - **Model Registry**: Automatically detecting available `.pth` weights and exposing them in the dashboard for easy selection (see the sketch after this list).
   - **Side-by-Side Reporting**: Running comparative experiments and reporting each model's accuracy and diagnostics in a standardized format.
   - **Reproducible Integration**: Maintaining modular scripts and pipelines so each model's results can be replicated without conflict.

   This ensures flexibility for future research and transparency in performance comparisons.
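
As a rough sketch of the planned registry behavior (file layout and naming are assumptions, and the real dashboard may attach richer metadata such as architecture and training metrics), model discovery could be as simple as globbing for checkpoints:

```python
from pathlib import Path

def discover_models(weights_dir: str = "outputs") -> dict[str, Path]:
    """Map a display name to each .pth checkpoint found under outputs/.

    A sketch of the planned registry; the dashboard would present these
    names in its model selector.
    """
    return {p.stem: p for p in sorted(Path(weights_dir).glob("**/*.pth"))}

registry = discover_models()
for name, path in registry.items():
    print(f"{name}: {path}")   # e.g. resnet_model: outputs/resnet_model.pth
```
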
2. **Image Input Modality**

   > The system will support classification on images as an additional modality, extending beyond spectra. Key features will include:

   - **Upload Support**: Users can upload single images or batches directly through the dashboard.
   - **Multi-Model Execution**: Selected models from the registry can be applied to all uploaded images simultaneously.
   - **Batch Results**: Output will be returned in a structured, accessible way, showing both individual predictions and aggregate statistics (see the sketch below).
   - **Enhanced Feedback**: Outputs will include predicted class, model confidence, and potentially annotated image previews.

   This expands the system toward a multi-modal framework, supporting broader research workflows.
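
A framework-agnostic sketch of the planned batch flow is shown below; it assumes torchvision-style image classifiers and a placeholder preprocessing pipeline, so the input size, normalization, and the `models` registry argument are illustrative rather than the eventual dashboard implementation.

```python
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

# Placeholder preprocessing; the real pipeline would match whatever the
# retrained vision models expect (input size, normalization statistics).
to_tensor = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_batch(image_paths: list[Path],
                   models: dict[str, torch.nn.Module]) -> list[dict]:
    """Run every registered model on every uploaded image and collect
    per-image predictions plus softmax confidences."""
    results = []
    batch = torch.stack([to_tensor(Image.open(p).convert("RGB"))
                         for p in image_paths])
    for name, model in models.items():
        model.eval()
        with torch.no_grad():
            probs = torch.softmax(model(batch), dim=1)
        conf, pred = probs.max(dim=1)
        for path, c, p in zip(image_paths, conf, pred):
            results.append({"image": path.name, "model": name,
                            "prediction": int(p), "confidence": float(c)})
    return results
```

In practice the `models` argument would be built from the registry sketched above, with each checkpoint loaded into its corresponding architecture before inference.
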
3. **FTIR Dataset Integration**

   > Although previously deferred, FTIR support will be added back in a modular, distinct fashion. Planned steps are:

   - **Dedicated Preprocessing**: Tailored scripts to handle FTIR-specific signal characteristics (multi-layer handling, baseline correction, normalization).
   - **Architecture Compatibility**: Ensuring existing and retrained models can process FTIR data without mixing it with Raman workflows.
   - **UI Integration**: Introducing FTIR as a separate option in the modality selector, keeping Raman, Image, and FTIR workflows clearly delineated.
   - **Phased Development**: Implementation details to be refined during meetings to ensure scientific rigor.

   This guarantees FTIR becomes a supported modality without undermining the validated Raman foundation.

## 📌 Guiding Principles

- **Preserve the Raman baseline** as the reproducible ground truth
- **Additive modularity**: Models, images, and FTIR added as clean, distinct layers rather than overwriting core functionality
- **Transparency & reproducibility**: All expansions documented, tested, and logged with clear outputs.
- **Future-oriented design**: Workflows structured to support ongoing collaboration and successor-safe research.