devjas1 committed on
Commit df6f1ab · 1 Parent(s): 2906ad6

(DEPLOY/CHORE): space-deploy branch cleaned up

- Slimmed to avoid unnecessary rebuild triggers
- Rebuilt automatically by Spaces on push

MANIFEST.git DELETED
@@ -1,26 +0,0 @@
1
- 100644 cfc04f24571aecd66e900d29bd94a311bb2e1111 0 .gitignore
2
- 100644 261eeb9e9f8b2b4b0d119366dda99c6fd7d35c64 0 LICENSE
3
- 100644 1bab8241aa1dd7325c43ed937510646c9de50759 0 README.md
4
- 100644 2e5679a509eed8c11a522df1d5e7fc89f2a95da6 0 app/ui_app.py
5
- 100644 c18dd8d83ceed1806b50b0aaa46beb7e335fff13 0 backend/.gitignore
6
- 100644 a23d89493ce1a6557368b8424e00d2b0a564deeb 0 backend/inference_utils.py
7
- 100644 b1eb4466d4ef220de29438d4b32de8fea1950687 0 backend/main.py
8
- 100644 df16ea87b7dfe3601ed0aa15fa8a563549b71502 0 dashboard/app.py
9
- 100644 08c771b09d2833a50c101294e7e56f24068f1fed 0 docs/BACKEND_MIGRATION_LOG.md
10
- 100644 ededbc2c9d95767d04c0d6a63d6e9b2cd77432d9 0 docs/ENVIRONMENT_GUIDE.md
11
- 100644 9838325b68b604409bb463a449fd850b57674342 0 docs/HPC_REMOTE_SETUP.md
12
- 100644 02e4813140e43533cabe01bc2a816f3740260f76 0 docs/LICENSE
13
- 100644 56b984636170bd2178c77c4933f753e2afb8a65f 0 docs/PROJECT_TIMELINE.md
14
- 100644 f6376d986813734264e70c6cbf45fbf2c40f82c1 0 docs/REPRODUCIBILITY.md
15
- 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 models/__init__.py
16
- 100644 704266bdfe413b4b0a77879a1c66878ce6eab0dd 0 models/figure2_cnn.py
17
- 100644 9104b59d82ccd5d36ad4ec47f57e3b5ca0fc80aa 0 models/resnet_cnn.py
18
- 100644 ce43e850e5f7890d9e324e8c41e3b8f9fdc3a832 0 outputs/resnet_model.pth
19
- 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 scripts/__init__.py
20
- 100644 c1f269413e6c89dbef52221db02b5490e7e8d95d 0 scripts/discover_raman_files.py
21
- 100644 6cc73d3ce4e95109ec51583c9831e4f72b4c8a82 0 scripts/list_spectra.py
22
- 100644 136ca3d29e3406a9bed641e19c54da966068e95e 0 scripts/plot_spectrum.py
23
- 100644 c59c21dfd5359e6a3a34199088ff15199c0649f5 0 scripts/preprocess_dataset.py
24
- 100644 77267ae19fcd0fb0e11c664d451d7b7395cd3f30 0 scripts/run_inference.py
25
- 100644 a33fc333522a302cfad5e4139202f3e9cf416921 0 scripts/train_model.py
26
- 100644 5aa6bedb24c303e2fed8554e40a2bbf965616200 0 validate_pipeline.sh
 
docs/BACKEND_MIGRATION_LOG.md DELETED
@@ -1,60 +0,0 @@
1
- # BACKEND_MIGRATION_LOG.md
2
-
3
- ## 📌 Overview
4
-
5
- This document tracks the migration of the inference logic from a monolithic Streamlit app to a modular, testable FastAPI backend for the Polymer AI Aging Prediction System.
6
-
7
- ---
8
-
9
- ## ✅ Completed Work
10
-
11
- ### 1. Initial Setup
12
-
13
- - Installed `fastapi`, `uvicorn`, and set up basic FastAPI app in `main.py`.
14
-
15
- ### 2. Modular Inference Utilities
16
-
17
- - Moved `load_model()` and `run_inference()` into `backend/inference_utils.py`.
18
- - Separated model configuration for Figure2CNN and ResNet1D.
19
- - Applied proper preprocessing (resampling, normalization) inside `run_inference()`.
20
-
21
- ### 3. API Endpoint
22
-
23
- - `/infer` route accepts JSON payloads with `model_name` and `spectrum`.
24
- - Returns: full prediction dictionary with class index, logits, and label map.
25
-
26
- ### 4. Validation + Testing
27
-
28
- - Tested manually in Python REPL.
29
- - Tested via `curl`:
30
-
31
- ```bash
32
- curl -X POST -H "Content-Type: application/json" -d @backend/test_payload.json http://127.0.0.1:8000/infer  # endpoint URL assumes the uvicorn default host/port
33
- ```
34
-
35
- ---
36
-
37
- ## 🛠 Fixes & Breakpoints Resolved
38
-
39
- - ✅ Fixed incorrect model path ("models/" → "outputs/")
40
- - ✅ Corrected unpacking bug in `main.py` → now returns full result dict
41
- - ✅ Replaced invalid `tolist()` call on string-typed logits
42
- - ✅ Manually verified output from CLI and curl
43
-
44
- ---
45
-
46
- ## 🧪 Next Focus: Robustness Testing
47
-
48
- - Invalid `model_name` handling
49
- - Short/empty spectrum validation
50
- - ResNet model loading test
51
- - JSON schema validation for input
52
- - Unit tests via `pytest` or integration test runner
53
-
54
- ---
55
-
56
- ## 🔄 Future Enhancements
57
-
58
- - Modular model registry (for adding more model classes easily)
59
- - Add OpenAPI schema and example payloads for documentation
60
- - Enable batch inference or upload support
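For reference, the JSON payload shape accepted by `/infer` (per the API endpoint section above) can be sketched as a small test file; the field names come from this log, while the example values are assumptions:

```python
import json

# Hypothetical contents for backend/test_payload.json; the field names
# (model_name, spectrum) come from the endpoint description above, the
# values here are illustrative only.
payload = {
    "model_name": "figure2",
    "spectrum": [0.0, 0.12, 0.37, 0.21, 0.05],  # toy spectrum; real ones are far longer
}

# Round-trip exactly what `curl -d @backend/test_payload.json` would send.
body = json.dumps(payload)
decoded = json.loads(body)
print(body)
```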
 
docs/ENVIRONMENT_GUIDE.md DELETED
@@ -1,119 +0,0 @@
1
- # 🔧 Environment Management Guide
2
-
3
- ## AI-Driven Polymer Aging Prediction and Classification System
4
-
5
- **Maintainer:** Jaser Hasan
6
- **Snapshot:** `@artifact-isolation-complete`
7
- **Last Updated:** 2025-06-26
8
- **Environments:** Conda (local) + venv on `/scratch` (HPC)
9
-
10
- ---
11
-
12
- ## 🧠 Overview
13
-
14
- This guide describes how to set up and activate the Python environments required to run the Raman pipeline on both:
15
-
16
- - **Local Systems:** (Mac/Windows/Linux)
17
- - **CWRU Pioneer HPC:** (GPU nodes, venv based)
18
-
19
- This guide documents the environment structure and the divergence between the **local Conda environment (`polymer_env`)** and the **HPC Python virtual environment (`polymer_venv`)**.
20
-
21
- ---
22
-
23
- ## 📁 Environment Overview
24
-
25
- | Platform | Environment | Manager | Path | Notes |
26
- |----------|-------------|---------|------|-------|
27
- | Local (dev) | `polymer_env` | **Conda** | `~/miniconda3/envs/polymer_env` | Primary for day-to-day development |
28
- | HPC (Pioneer) | `polymer_venv` | **venv** (Python stdlib) | `/scratch/users/<case_id>/polymer_project/polymer_venv` | Created under `/scratch` to avoid `/home` quota limits |
29
-
30
- ---
31
-
32
- ## 💻 Local Installation (Conda)
33
-
34
- ```bash
35
-
36
- git clone https://github.com/dev-jaser/ai-ml-polymer-aging-prediction.git
37
- cd polymer_project
38
- conda env create -f environment.yml
39
- conda activate polymer_env
40
- python -c "import torch, sys; print('PyTorch:', torch.__version__, 'Python:', sys.version)"
41
- ```
42
-
43
- > **Tip:** Keep Conda updated (`conda update conda`) to reduce solver errors.
44
-
45
- ---
46
-
47
- ## 🚀 CWRU Pioneer HPC Setup (venv + pip)
48
-
49
- > Conda is intentionally **not** used on Pioneer due to prior codec and disk-quota issues.
50
-
51
- ### 1. Load Python Module
52
-
53
- ```bash
54
-
55
- module purge
56
- module load Python/3.12.3-GCCcore-13.2.0
57
- ```
58
-
59
- ### 2. Create Working Directory in `/scratch`
60
-
61
- ```bash
62
-
63
- mkdir -p /scratch/users/<case_id>/polymer_project_runtime
64
- cd /scratch/users/<case_id>/polymer_project_runtime
65
- git clone https://github.com/dev-jaser/ai-ml-polymer-aging-prediction.git
66
- ```
67
-
68
- ### 3. Create & Activate Virtual Environment
69
-
70
- ```bash
71
-
72
- python3 -m venv polymer_venv
73
- source polymer_venv/bin/activate
74
- ```
75
-
76
- ### 4. Install Dependencies
77
-
78
- ```bash
79
-
80
- pip install --upgrade pip
81
- pip install -r environment_hpc.yml # Optimized dependencies list for Pioneer
82
- ```
83
-
84
- (Optional) Save a reproducible freeze:
85
-
86
- ```bash
87
-
88
- pip freeze > requirements_hpc.txt
89
- ```
90
-
91
- ---
92
-
93
- ## ✅ Supported CLI Workflows (Raman-only)
94
-
95
- | Script | Purpose |
96
- |--------|---------|
97
| `scripts/train_model.py` | 10-fold CV training (`--model figure2` or `--model resnet`) |
98
- | `scripts/run_inference.py` | Predict single Raman spectrum |
99
- | `scripts/preprocess_dataset.py` | Apply full preprocessing chain |
100
- | `scripts/plot_spectrum.py` | Quick spectrum visualization (.png) |
101
-
102
- > FTIR-related scripts are archived and *not installed* into the active environments.
103
-
104
- ---
105
-
106
- ## 🔁 Cross-Environment Parity
107
-
108
- - Package sets in `environment.yml` and `environment_hpc.yml` are aligned.
109
- - Diagnostics JSON structure and checkpoint filenames are identical on both systems.
110
- - Training commands are copy-paste compatible between local shell and HPC login shell.
111
-
112
- ---
113
-
114
- ## 📦 Best Practices
115
-
116
- - **Local:** use Conda for rapid iteration, notebook work, small CPU inference.
117
- - **HPC:** use venv in `/scratch` for GPU training; never install large packages into `/home` (`~/`).
118
- - Keep environments lightweight; remove unused libraries to minimize rebuild time.
119
- - Update this guide if either environment definition changes.
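The cross-environment parity claim above can be spot-checked mechanically. A minimal sketch, assuming `pip freeze` output captured on each system (the freeze-based approach and the pinned versions shown are assumptions, not project tooling):

```python
# Minimal parity check between two pip-freeze style dependency lists,
# e.g. one captured locally in polymer_env and one on Pioneer.
def parse_freeze(text: str) -> dict[str, str]:
    """Map package name -> pinned version from 'pkg==ver' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, _, ver = line.partition("==")
            pins[name.lower()] = ver
    return pins

# Toy inputs standing in for the real freeze files.
local = parse_freeze("torch==2.2.0\nnumpy==1.26.4\nscipy==1.11.4")
hpc = parse_freeze("torch==2.2.0\nnumpy==1.26.4")

missing = sorted(set(local) - set(hpc))
mismatched = sorted(k for k in local.keys() & hpc.keys() if local[k] != hpc[k])
print("missing on HPC:", missing)
print("version drift:", mismatched)
```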
 
docs/HPC_REMOTE_SETUP.md DELETED
@@ -1,111 +0,0 @@
1
- # Accessing CWRU Pioneer HPC System Remotely via SSH (PuTTY)
2
-
3
- ## Step 1: Set up DUO Authentication for VPN Access
4
-
5
- ### 1. Enroll in DUO (if not already done):
6
-
7
- > - Go to [case.edu/utech/duo](https://case.edu/utech/duo) and follow the instructions to register your device (phone/tablet/hardware token)
8
- > - This is required for FortiClient VPN authentication.
9
-
10
- ---
11
-
12
- ## Step 2: Install and Configure FortiClient VPN
13
-
14
- ### 1. Download FortiClient VPN:
15
-
16
- - Visit [case.edu/utech/help/forticlient-vpn](https://case.edu/utech/help/forticlient-vpn)
17
- - Download the **FortiClient VPN** software for your specific device.
18
-
19
- ### 2. Install & Configure VPN
20
-
21
- - Run the installer and complete setup
22
- - Open FortiClient and configure new connection:
23
- - **Connection Name**: `CWRU VPN` (or any name)
24
- - **Remote Gateway**: `vpn.case.edu`
25
- - **Customize Port**: `443`
26
- - Enable "**Save Credentials**" (optional)
27
- - Click **Save**
28
-
29
- ### 3. Connect to VPN:
30
-
31
- - Enter your **CWRU Network ID** (e.g., `jxh369`) and password.
32
- - Complete **DUO two-factor authentication** when prompted (approve via phone/device)
33
- - Once connected, you'll see a confirmation message.
34
-
35
- ---
36
-
37
- ## Step 3: Install PuTTY (SSH Client)
38
-
39
- ### 1. Download PuTTY:
40
-
41
- - If not installed, download from [https://www.putty.org](https://www.putty.org)
42
- - Run the installer (or use the portable version).
43
-
44
- ### 2. Open PuTTY:
45
-
46
- - Launch PuTTY from the Start Menu
47
-
48
- ---
49
-
50
- ## Step 4: Configure PuTTY for Pioneer HPC
51
-
52
- ### 1. Enter Connection Details:
53
-
54
- - **Host Name (or IP address)**: `pioneer.case.edu`
55
- - **Port**: `22`
56
- - **Connection Type**: SSH
57
-
58
- ### 2. Optional: Save Session (for future use):
59
-
60
- - Under "**Saved Sessions**", type `Pioneer HPC` and click **Save**
61
-
62
- ### 3. Click "Open" to initiate the connection
63
-
64
- ---
65
-
66
- ## Step 5: Log In via SSH
67
-
68
- ### 1. Enter Credentials:
69
-
70
- - When prompted, enter your **CWRU Network ID** (e.g., `jxh369`)
71
- - Enter your password (same as VPN/CWRU login)
72
- - Complete DUO authentication again if required
73
-
74
- ### 2. Successful Login:
75
-
76
- - You should now see the **Pioneer HPC command-line interface**
77
-
78
- ---
79
-
80
- ## Step 6: Disconnecting
81
-
82
- ### 1. Exit SSH Session:
83
-
84
- - Type `exit` or `logout` in the terminal
85
-
86
- ### 2. Disconnect VPN:
87
-
88
- - Close PuTTY and disconnect FortiClient VPN when done.
89
-
90
- ---
91
-
92
- ## Troubleshooting Tips
93
-
94
- ### VPN Fails?
95
-
96
- - Ensure DUO is set up correctly
97
- - Try reconnecting or restarting FortiClient VPN
98
-
99
- ### PuTTY Connection Refused?
100
-
101
- - Verify VPN is active (`vpn.case.edu` shows "**Connected**")
102
- - Check `pioneer.case.edu` and port `22` are correct
103
-
104
- ### DUO Not Prompting?
105
-
106
- - Ensure your device is registered in DUO
107
-
108
-
109
- ## Extra Help on CWRU HPC Systems
110
-
111
- [https://sites.google.com/a/case.edu/hpcc/](https://sites.google.com/a/case.edu/hpcc/)
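For users connecting from macOS/Linux (or Windows OpenSSH) rather than PuTTY, the same connection details map onto a standard `~/.ssh/config` entry; the Network ID below is the example ID used in this guide:

```
Host pioneer
    HostName pioneer.case.edu
    User jxh369
    Port 22
```

With the VPN connected, `ssh pioneer` then behaves like Steps 4–5 above.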
 
docs/LICENSE DELETED
@@ -1,21 +0,0 @@
1
- MIT License
2
-
3
- Copyright (c) 2025 dev-jaser
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
 
docs/PROJECT_TIMELINE.md DELETED
@@ -1,156 +0,0 @@
1
- # 📅 PROJECT_TIMELINE.md
2
-
3
- ## AI-Driven Polymer Aging Prediction and Classification System
4
-
5
- **Intern:** Jaser Hasan
6
-
7
- ### ✅ PHASE 1 – Project Kickoff and Faculty Guidance
8
-
9
- **Tag:** `@project-init-complete`
10
-
11
- Received first set of research tasks from Prof. Kuppannagari
12
-
13
- Received research plan
14
- - Objectives defined: download datasets, analyze spectra, implement CNN, run initial inference
15
-
16
- ---
17
-
18
- ### ✅ PHASE 2 – Dataset Acquisition (Local System)
19
-
20
- **Tag:** `@data-downloaded`
21
-
22
- - Downloaded Raman `.txt` (RDWP) and FTIR `.csv` data (polymer packaging)
23
- - Structured into:
24
- - `datasets/rdwp`
25
- - `datasets/ftir`
26
-
27
- ---
28
-
29
- ### ✅ PHASE 3 – Data Exploration & Spectral Validation
30
-
31
- **Tag:** `@data-exploration-complete`
32
-
33
- - Built plotting tools for Raman and FTIR
34
- - Validated spectrum structure, removed malformed samples
35
- - Observed structural inconsistencies in FTIR multi-layer grouping
36
-
37
- ---
38
-
39
- ### ✅ PHASE 4 – Preprocessing Pipeline Implementation
40
-
41
- **Tag:** `@data-prep`
42
-
43
- - Implemented `preprocess_dataset.py` for Raman
44
- - Applied: Resampling -> Baseline correction -> Smoothing -> Normalization
45
- - Confirmed reproducible input/output behavior and dynamic CLI control
46
-
47
- ### ✅ PHASE 5 – Figure2CNN Architecture Build
48
-
49
- **Tag:** `@figure2cnn-complete`
50
-
51
- Constructed `Figure2CNN`, modeled after the "Figure 2" CNN from the reference paper
52
- - `Figure2CNN`: 4 conv layers + 3 FC layers
53
- - Verified dynamic input length handling (e.g., 500, 1000, 4000)
54
-
55
- ---
56
-
57
- ### ✅ PHASE 6 – Local Training and Inference
58
-
59
- **Tag:** `@figure2cnn-training-local`
60
-
61
- - Trained Raman models locally (FTIR now deferred)
62
- - Canonical Raman accuracy: **87.29% ± 6.30%**
63
- - FTIR accuracy results archived and excluded from current validation
64
- - CLI tools for training, inference, plotting implemented
65
-
66
- ---
67
-
68
- ### ✅ PHASE 7 – Reproducibility and Documentation Setup
69
-
70
- **Tag:** `@project-docs-started`
71
-
72
- - Authored `README.md`, `PROJECT_REPORT.md`, and `ENVIRONMENT_GUIDE.md`
73
- - Defined reproducibility guidelines
74
- - Standardized project structure and versioning
75
-
76
- ---
77
-
78
- ### ✅ PHASE 8 – HPC Access and Venv Strategy
79
-
80
- **Tag:** `@hpc-login-successful`
81
-
82
- - Logged into CWRU Pioneer (SSH via PuTTY)
83
- Set up FortiClient VPN, which is required to access Pioneer remotely
84
- - Explored module system; selected venv over Conda for compatibility
85
- - Loaded Python 3.12.3 + created `polymer_env`
86
-
87
- ---
88
-
89
- ### ✅ PHASE 9 – HPC Environment Sync
90
-
91
- **Tag:** `@venv-alignment-complete`
92
-
93
- - Created `environment_hpc.yml`
94
- - Installed dependencies into `polymer_env`
95
- - Validated imports, PyTorch installation, and CLI script execution
96
-
97
- ---
98
-
99
- ### ✅ PHASE 10 – Full Instruction Validation on HPC
100
-
101
- **Tag:** `@prof-k-instruction-validation-complete`
102
-
103
- - Ran Raman preprocessing and plotting scripts
104
- - Executed `run_inference.py` with CLI on raw Raman `.txt` file
105
- - Verified consistent predictions and output logging across local and HPC
106
-
107
- ---
108
-
109
- ### ✅ PHASE 11 – FTIR Path Paused, Raman Declared Primary
110
-
111
- **Tag:** `@raman-pipeline-focus-milestone`
112
-
113
- - FTIR modeling formally deferred
114
- - FTIR preprocessing scripts preserved and archived for future use
115
- - All resources directed toward Raman pipeline finalization
116
- - Saliency, FTIR ingestion, and `train_ftir_model.py` archived
117
-
118
- ---
119
-
120
- ### ✅ PHASE 12 – ResNet1D Prototyping & Benchmark Setup
121
-
122
- **Tag:** `@resnet-prototype-complete`
123
-
124
- - Built `ResNet1D` architecture in `models/resnet_cnn.py`
125
- - Integrated `train_model.py` via `--model resnet`
126
- - Ran initial CV training with successful results
127
-
128
- ---
129
-
130
- ### ✅ PHASE 13 – Output Artifact Isolation
131
-
132
- **Tag:** `@artifact-isolation-complete`
133
-
134
- - Patched `train_model.py` to save:
135
- - `figure2_model.pth`, `resnet_model.pth`
136
- `raman_figure2_diagnostics.json`, `raman_resnet_diagnostics.json`
137
- - Prevented all overwrites by tying output filenames to `args.model`
138
- - Snapshotted as reproducibility milestone. Enabled downstream validation harness.
139
-
140
- ### ✅ PHASE 14 – Canonical Validation Achieved
141
-
142
- **Tag:** `@validation-loop-complete`
143
-
144
- Created `validate_pipeline.sh` to verify preprocessing, training, inference, and plotting
145
- - Ran full validation using `Figure2CNN` with reproducible CLI config
146
- All outputs verified: logs, artifacts, predictions, plots
147
- - Declared Raman pipeline scientifically validated and stable
148
-
149
- ---
150
-
151
- ### ⏭️ NEXT - Results Analysis & Finalization
152
-
153
- - Analyze logged diagnostics for both models
154
- - Conduct optional hyperparameter tuning (batch size, LR)
155
- - Begin deliverable prep: visuals, posters, cards
156
- Resume FTIR work only after the Raman path is fully stabilized and documented and the open FTIR conceptual error is resolved
 
docs/REPRODUCIBILITY.md DELETED
@@ -1,132 +0,0 @@
1
- # 📚 REPRODUCIBILITY.md
2
-
3
- *AI-Driven Polymer Aging Prediction & Classification System*
4
- *(Canonical Raman-only Pipeline)*
5
-
6
- > **Purpose**
7
- > A single document that lets any new user clone the repo, acquire the dataset, recreate the Conda environment, and generate the validated Raman pipeline artifacts.
8
-
9
- ---
10
-
11
- ## 1. System Requirements
12
-
13
- | Component | Minimum Version | Notes |
14
- |-----------|-----------------|-------|
15
- | Python | 3.10+ | Conda recommended |
16
- | Git | 2.30+ | Any modern version |
17
- | Conda | 23.1+ | Mamba also fine |
18
- | OS | Linux / MacOS / Windows | CPU run (no GPU needed) |
19
- | Disk | ~1 GB | Dataset + artifacts |
20
-
21
- ---
22
-
23
- ## 2. Clone Repository
24
-
25
- ```bash
26
- git clone https://github.com/dev-jaser/ai-ml-polymer-aging-prediction.git
27
- cd ai-ml-polymer-aging-prediction
28
- git checkout main
29
- ```
30
-
31
- ---
32
-
33
- ## 3. Create & Activate Conda Environment
34
-
35
- ```bash
36
- conda env create -f environment.yml
37
- conda activate polymer_env
38
- ```
39
-
40
- > **Tip:** If you already created `polymer_env` just run `conda activate polymer_env`
41
-
42
- ---
43
-
44
- ## 4. Download RDWP Raman Dataset
45
-
46
- 1. Visit https://data.mendeley.com/datasets/kpygrf9fg6/1
47
- 2. Download the archive (**RDWP.zip** or similar) by clicking **Download All (10.3 MB)**
48
- 3. Extract all `*.txt` Raman files into:
49
-
50
- ```bash
51
- ai-ml-polymer-aging-prediction/datasets/rdwp
52
- ```
53
-
54
- 4. Quick sanity check:
55
-
56
- ```bash
57
- ls datasets/rdwp | grep ".txt" | wc -l  # -> 170+ files expected
58
- ```
59
-
60
- ---
61
-
62
- ## 5. Validate the Entire Pipeline
63
-
64
- Run the canonical smoke-test harness:
65
-
66
- ```bash
67
- ./validate_pipeline.sh
68
- ```
69
-
70
- Successful run prints:
71
-
72
- ```bash
73
- [PASS] Preprocessing
74
- [PASS] Training & artifacts
75
- [PASS] Inference
76
- [PASS] Plotting
77
- All validation checks passed!
78
- ```
79
-
80
- Artifacts created:
81
-
82
- ```bash
83
- outputs/figure2_model.pth
84
- outputs/logs/raman_figure2_diagnostics.json
85
- outputs/inference/test_prediction.json
86
- outputs/plots/validation_plot.png
87
- ```
88
-
89
- ---
90
-
91
- ## 6. Optional: Train ResNet Variant
92
-
93
- ```bash
94
- python scripts/train_model.py --model resnet --target-len 4000 --baseline --smooth --normalize
95
- ```
96
-
97
- Check that these exist now:
98
-
99
- ```bash
100
- outputs/resnet_model.pth
101
- outputs/logs/raman_resnet_diagnostics.json
102
- ```
103
-
104
- ---
105
-
106
- ## 7. Clean-up & Re-Run
107
-
108
- To re-run from a clean state:
109
-
110
- ```bash
111
- rm -rf outputs/*
112
- ./validate_pipeline.sh
113
- ```
114
-
115
- All artifacts will be regenerated.
116
-
117
- ---
118
-
119
- ## 8. Troubleshooting
120
-
121
- | Symptom | Likely Cause | Fix |
122
- |---------|--------------|-----|
123
- | `ModuleNotFoundError` during scripts | `conda activate polymer_env` not done | Activate env |
124
- | `CUDA not available` warning | Running on CPU | Safe to ignore |
125
- | Fewer than 170 files in `datasets/rdwp` | Incomplete extract | Re-download archive |
126
- | `validate_pipeline.sh: Permission denied` | Missing executable bit | `chmod +x validate_pipeline.sh` |
127
-
128
- ---
129
-
130
- ## 9. Contact
131
-
132
- For issues or questions, open an issue in the GitHub repo or contact @dev-jaser.
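The dataset sanity check in Section 4 can also be done from Python; a small sketch (the 170-file expectation comes from this guide, and the demo below uses a temporary stand-in directory rather than the real `datasets/rdwp`):

```python
import tempfile
from pathlib import Path

def count_raman_files(root: str) -> int:
    """Count RDWP .txt spectra directly under the dataset directory."""
    return len(list(Path(root).glob("*.txt")))

# Demo against a temporary stand-in directory; real runs would pass
# "datasets/rdwp" and expect 170+ files after a full extract.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"wea-{i}.txt").write_text("100.0 1234.5\n")
    n = count_raman_files(tmp)
    print(f"{n} spectra found")
```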
 
docs/sprint_log.md DELETED
@@ -1,31 +0,0 @@
1
- # Sprint Log
2
-
3
- ## @model-expansion-preflight-2025-08-21
4
- **Goal:** Reinforce training script contracts and registry hook without behavior changes.
5
- **Changes:**
6
- - Reproducibility seeds (python/numpy/torch/cuda).
7
- - Optional cuDNN deterministic settings.
8
- Typo fix: "Reseample" -> "Resample".
9
- - Diagnostics fix: per-fold accuracy logs use correct variable.
10
- - Explicit dtypes in TensorDataset (float32/long).
11
- **Tests:**
12
- - Preprocess: ✅
13
- - Train (figure2, 1 epoch): ✅
14
- - Inference smoke: ✅
15
- **Notes:** Baseline intact; high CV variance due to class imbalance recorded for later mitigation.
16
-
17
- ## @model-expansion-registry-2025-08-21
18
- **Goal:** Make model lookup a single source of truth and expose dynamic choices for CLI/infra.
19
- **Changes:**
20
- - Added `models/registry.py` with `choices()` and `build()` helpers.
21
- `scripts/train_model.py` imports the registry, using `choices()` for argparse and `build()` for construction.
22
- - Removed direct model selection logic from training script.
23
- **Tests:**
24
- - Train (figure2) via registry: ✅
25
- - Inference unchanged paths: ✅
26
- **Notes:** Artifacts remain `outputs/{model}_model.pth` to avoid breaking validator; inference arch flag to be added next.
27
- ## @model-expansion-resnet18vision-2025-08-21
28
- **Goal:** Introduce a second architecture and prove multi-model training/inference via shared registry.
29
- **Changes:** `models/resnet18_vision.py` (1D), registry entry, `run_inference.py --arch`.
30
- **Tests:** Train (1 epoch) -> `outputs/resnet18vision_model.pth`; Inference JSON ✅
31
- **Notes:** Backward compatibility preserved (`--arch` defaults to figure2).
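A minimal sketch of the `models/registry.py` contract described in the entries above (`choices()` for argparse, `build()` for construction); the stand-in model classes and their constructor signatures are assumptions:

```python
# Hypothetical single-source-of-truth model registry, mirroring the
# choices()/build() helpers described in the sprint entries above.

class Figure2CNN:  # stand-in for the real model class
    def __init__(self, input_len: int = 500):
        self.input_len = input_len

class ResNet1D:  # stand-in for the real model class
    def __init__(self, input_len: int = 500):
        self.input_len = input_len

_REGISTRY = {
    "figure2": Figure2CNN,
    "resnet": ResNet1D,
}

def choices() -> list[str]:
    """Valid --model / --arch values, suitable for argparse choices=."""
    return sorted(_REGISTRY)

def build(name: str, **kwargs):
    """Construct a model by registry key; raises on unknown names."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; expected one of {choices()}") from None
    return cls(**kwargs)

model = build("figure2", input_len=500)
print(type(model).__name__)
```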
 
validate_pipeline.sh DELETED
@@ -1,67 +0,0 @@
1
- #!/usr/bin/env bash
2
- # ===========================================
3
- # validate_pipeline.sh — Canonical Smoke Test
4
- # AI-Driven Polymer Aging Prediction System
5
- # Requires: conda (or venv) already installed
6
- # ===========================================
7
-
8
- set -euo pipefail
9
- RED='\033[0;31m'
10
- GRN='\033[0;32m'
11
- YLW='\033[1;33m'
12
- NC='\033[0m'
13
-
14
- die() {
15
- echo -e "${RED}[FAIL] $1${NC}"
16
- exit 1
17
- }
18
- pass() { echo -e "${GRN}[PASS] $1${NC}"; }
19
-
20
- echo -e "${YLW}>>> Activating environment...${NC}"
21
- source "$(conda info --base)/etc/profile.d/conda.sh"
22
- conda activate polymer_env || die "conda env 'polymer_env' not found"
23
-
24
- root_dir="$(dirname "$(readlink -f "$0")")"
25
- cd "$root_dir" || die "repo root not found"
26
-
27
- # ---------- Step 1: Preprocessing ----------
28
- echo -e "${YLW}>>> Step 1: Preprocessing${NC}"
29
- python scripts/preprocess_dataset.py datasets/rdwp \
30
- --target-len 500 --baseline --smooth --normalize |
31
- grep -q "X shape:" || die "preprocess_dataset.py failed"
32
- pass "Preprocessing"
33
-
34
- # ---------- Step 2: CV Training (Figure2) ----------
35
- mkdir -p outputs outputs/logs || true
36
- # Optional: skip gracefully if dataset is not present
37
- if [ ! -d "datasets/rdwp" ] || [ -z "$(find datasets/rdwp -maxdepth 1 -name '*.txt' 2>/dev/null)" ]; then
38
- echo -e "${YLW}[SKIP] Training (no datasets/rdwp/*.txt found)${NC}"
39
- else
40
- echo -e "${YLW}>>> Step 2: 10-Fold CV Training${NC}"
41
- python scripts/train_model.py \
42
- --target-len 500 --baseline --smooth --normalize \
43
- --model figure2
44
- [[ -f outputs/figure2_model.pth ]] || die "model .pth not found"
45
- [[ -f outputs/logs/raman_figure2_diagnostics.json ]] || die "diagnostics JSON not found"
46
- pass "Training & artifacts"
47
- fi
48
-
49
- # ---------- Step 3: Inference ----------
50
- echo -e "${YLW}>>> Step 3: Inference${NC}"
51
- python scripts/run_inference.py \
52
- --target-len 500 \
53
- --input datasets/rdwp/wea-100.txt \
54
- --model outputs/figure2_model.pth \
55
- --output outputs/inference/test_prediction.json
56
- [[ -f outputs/inference/test_prediction.json ]] || die "inference output missing"
57
- pass "Inference"
58
-
59
- # ---------- Step 4: Spectrum Plot ----------
60
- echo -e "${YLW}>>> Step 4: Plot Spectrum${NC}"
61
- mkdir -p outputs/inference || true
62
- python scripts/plot_spectrum.py --input datasets/rdwp/sta-10.txt || die "plot_spectrum.py failed"
64
- pass "Plotting"
65
-
66
-
67
- echo -e "${GRN}All validation checks passed!${NC}"