Richard Young committed on
Commit
c43a81f
·
0 Parent(s):

Initial commit for Hugging Face Space

Files changed (8)
  1. .gitattributes +7 -0
  2. .gitignore +22 -0
  3. README.md +106 -0
  4. app.py +563 -0
  5. find_bad_images.py +1670 -0
  6. rat_finder.py +1223 -0
  7. requirements.txt +8 -0
  8. steg_embedder.py +337 -0
.gitattributes ADDED
@@ -0,0 +1,7 @@
1
+ *.jpg filter=lfs diff=lfs merge=lfs -text
2
+ *.png filter=lfs diff=lfs merge=lfs -text
3
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
4
+ *.gif filter=lfs diff=lfs merge=lfs -text
5
+ *.pdf filter=lfs diff=lfs merge=lfs -text
6
+ *.zip filter=lfs diff=lfs merge=lfs -text
7
+ docs/*.jpg filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,22 @@
1
+ # Image files
2
+ *.jpg
3
+ *.jpeg
4
+ *.JPG
5
+ *.JPEG
6
+
7
+ # System files
8
+ .DS_Store
9
+ Thumbs.db
10
+
11
+ # Python
12
+ __pycache__/
13
+ *.py[cod]
14
+ *.class
15
+ .env
16
+ .venv
17
+ env/
18
+ venv/
19
+ ENV/
20
+ env.bak/
21
+ venv.bak/
22
+
README.md ADDED
@@ -0,0 +1,106 @@
1
+ ---
2
+ title: 2PAC Picture Analyzer & Corruption Killer
3
+ emoji: 🔫
4
+ colorFrom: purple
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ ---
12
+
13
+ # 🔫 2PAC: Picture Analyzer & Corruption Killer
14
+
15
+ **Advanced image security and steganography toolkit**
16
+
17
+ ## Features
18
+
19
+ ### 🔒 Hide Secret Data
20
+ Invisibly hide text messages inside images using **LSB (Least Significant Bit) steganography**:
21
+ - Hide text up to the image's capacity (which scales with image size; see the rough estimate below)
22
+ - Optional password encryption for added security
23
+ - Adjustable LSB depth (1-4 bits per channel)
24
+ - PNG output preserves hidden data perfectly
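+ - Rough capacity estimate (illustrative): a 1024×768 RGB image at 1 bit per channel holds about 1024 × 768 × 3 / 8 ≈ 295 KB of raw payload, before any header overhead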
25
+
26
+ ### 🔍 Detect & Extract Hidden Data
27
+ Advanced steganography detection using **RAT Finder** technology:
28
+ - **ELA (Error Level Analysis)** - Highlights compression artifacts
29
+ - **LSB Analysis** - Detects randomness in least significant bits
30
+ - **Histogram Analysis** - Finds statistical anomalies
31
+ - **Metadata Inspection** - Checks EXIF data for suspicious tools
32
+ - **Extract Data** - Recover messages hidden with this tool
33
+
34
+ ### 🛡️ Check Image Integrity
35
+ Comprehensive image validation and corruption detection:
36
+ - File format validation (JPEG, PNG, GIF, TIFF, BMP, WebP, HEIC)
37
+ - Header integrity checks (see the magic-byte sketch below)
38
+ - Data completeness verification
39
+ - Visual corruption detection (black/gray regions)
40
+ - Structure validation
41
+
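+ As a flavor of the header check, here is a minimal magic-byte sketch (illustrative only; `looks_like_valid_header` is not the project's API, and the full validation logic lives in `find_bad_images.py`):
+
+ ```python
+ def looks_like_valid_header(path):
+     """Cheap magic-byte check for JPEG and PNG before attempting a full decode."""
+     with open(path, "rb") as f:
+         head = f.read(8)
+     if head.startswith(b"\xff\xd8\xff"):         # JPEG SOI marker
+         return True
+     if head.startswith(b"\x89PNG\r\n\x1a\n"):    # PNG signature
+         return True
+     return False
+ ```
+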
42
+ ## How It Works
43
+
44
+ ### LSB Steganography
45
+ The tool hides data in the **least significant bits** of pixel values. Since changing the last 1-2 bits of a pixel value (e.g., changing 200 to 201) is imperceptible to the human eye, we can encode arbitrary data without visible changes to the image.
46
+
47
+ **Example:**
48
+ - Original pixel: RGB(156, 89, 201) = `10011100, 01011001, 11001001`
49
+ - After hiding bit '1': RGB(156, 89, 201) = `10011100, 01011001, 11001001` (last bit already 1)
50
+ - After hiding bit '0': RGB(156, 88, 201) = `10011100, 01011000, 11001001` (89→88)
51
+
52
+ This allows hiding hundreds to thousands of bytes in a typical photo!
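+
+ A minimal sketch of the embedding step (illustrative only; `embed_bits` and its signature are assumptions, not the routine used by `steg_embedder.py`; it assumes a Pillow RGB image at 1 bit per channel):
+
+ ```python
+ from PIL import Image
+
+ def embed_bits(png_path, bits, out_path):
+     """Write each payload bit into the LSB of successive R, G, B values."""
+     img = Image.open(png_path).convert("RGB")
+     flat = [c for px in img.getdata() for c in px]   # R, G, B, R, G, B, ...
+     for i, bit in enumerate(bits):                   # bits: iterable of 0/1, shorter than flat
+         flat[i] = (flat[i] & ~1) | bit               # clear the LSB, then set the payload bit
+     img.putdata([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
+     img.save(out_path, "PNG")                        # PNG is lossless, so the LSBs survive
+ ```
+
+ Following the example above, hiding a '1' in the green value 89 (binary 01011001) leaves it at 89, while hiding a '0' clears the last bit and gives 88.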
53
+
54
+ ### Steganography Detection
55
+ The RAT Finder uses multiple forensic techniques (a rough ELA sketch follows the list):
56
+
57
+ 1. **ELA (Error Level Analysis)**: Re-saves the image at a known quality and compares compression artifacts. Hidden data or manipulation shows as bright areas.
58
+
59
+ 2. **LSB Analysis**: Statistical tests check if the least significant bits are too random (hidden data) or too uniform (natural image).
60
+
61
+ 3. **Histogram Analysis**: Analyzes color distribution for anomalies typical of steganography.
62
+
63
+ 4. **Metadata Forensics**: Checks EXIF data for steganography tools or suspicious editing history.
64
+
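+ A minimal ELA sketch with Pillow (illustrative only; `quick_ela`, the quality of 90, and the brightness scale are assumptions, not the parameters `rat_finder.py` uses):
+
+ ```python
+ import io
+ from PIL import Image, ImageChops, ImageEnhance
+
+ def quick_ela(path, quality=90, scale=15):
+     """Recompress at a known JPEG quality and amplify the per-pixel difference."""
+     original = Image.open(path).convert("RGB")
+     buf = io.BytesIO()
+     original.save(buf, "JPEG", quality=quality)          # re-save at a fixed, known quality
+     buf.seek(0)
+     resaved = Image.open(buf)
+     diff = ImageChops.difference(original, resaved)      # bright pixels = high error level
+     return ImageEnhance.Brightness(diff).enhance(scale)  # boost so anomalies stand out
+ ```
+
+ Regions that recompress very differently from their surroundings show up as bright patches in the returned image.
+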
65
+ ## Usage Tips
66
+
67
+ ### For Hiding Data:
68
+ - ✅ Use **PNG** images (JPEG compression destroys hidden data)
69
+ - ✅ Larger images = more capacity
70
+ - ✅ Use 1-2 bits per channel for undetectable hiding
71
+ - ✅ Add password encryption for sensitive data
72
+ - ⚠️ Don't re-save or edit the output image!
73
+
74
+ ### For Detection:
75
+ - 🔍 Higher sensitivity = more thorough but more false positives
76
+ - 📊 Check the ELA image for bright spots (potential hiding)
77
+ - 💡 High confidence doesn't guarantee hidden data (could be compression artifacts)
78
+ - 🔓 Use "Extract Data" tab if you suspect LSB steganography
79
+
80
+ ### For Corruption Checking:
81
+ - 🛡️ Enable visual corruption check for damaged photos
82
+ - ⚙️ Higher sensitivity for stricter validation
83
+ - 📁 Useful before archiving important photo collections
84
+
85
+ ## About
86
+
87
+ **2PAC** combines three powerful tools:
88
+ - **LSB Steganography** engine (new!)
89
+ - **RAT Finder** - Advanced steg detection
90
+ - **Image Validator** - Corruption checker
91
+
92
+ Created by [Richard Young](https://github.com/ricyoung) | Part of [DeepNeuro.AI](https://deepneuro.ai)
93
+
94
+ 🔗 **GitHub Repository:** [github.com/ricyoung/2pac](https://github.com/ricyoung/2pac)
95
+ 🌐 **More Tools:** [demo.deepneuro.ai](https://demo.deepneuro.ai)
96
+
97
+ ## Security & Privacy
98
+
99
+ - ✅ All processing happens inside a temporary Hugging Face Space session
100
+ - ✅ Images are not stored or logged
101
+ - ✅ Temporary files are deleted after processing
102
+ - ✅ Your hidden data and passwords are never saved
103
+
104
+ ---
105
+
106
+ *"All Eyez On Your Images" 👁️*
app.py ADDED
@@ -0,0 +1,563 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ 2PAC: Picture Analyzer & Corruption Killer - Gradio Web Interface
4
+ Steganography, image corruption detection, and security analysis
5
+ """
6
+
7
+ import os
8
+ import tempfile
9
+ import gradio as gr
10
+ from PIL import Image
11
+ import matplotlib.pyplot as plt
12
+ import io
13
+ import base64
14
+
15
+ # Import 2PAC modules
16
+ from steg_embedder import StegEmbedder
17
+ import rat_finder
18
+ import find_bad_images
19
+
20
+
21
+ # Initialize embedder
22
+ embedder = StegEmbedder()
23
+
24
+
25
+ def hide_data_in_image(image, secret_text, password, bits_per_channel):
26
+ """
27
+ Tab 1: Hide data in an image using LSB steganography
28
+ """
29
+ if image is None:
30
+ return None, "⚠️ Please upload an image first"
31
+
32
+ if not secret_text or len(secret_text.strip()) == 0:
33
+ return None, "⚠️ Please enter text to hide"
34
+
35
+ try:
36
+ # Save uploaded image to temp file
37
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmp_input:
38
+ img = Image.fromarray(image)
39
+ img.save(tmp_input.name, 'PNG')
40
+ input_path = tmp_input.name
41
+
42
+ # Create output file
43
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmp_output:
44
+ output_path = tmp_output.name
45
+
46
+ # Calculate capacity first
47
+ img = Image.open(input_path)
48
+ capacity = embedder.calculate_capacity(img, bits_per_channel)
49
+
50
+ # Check if data fits
51
+ data_size = len(secret_text.encode('utf-8'))
52
+ if data_size > capacity:
53
+ os.unlink(input_path)
54
+ return None, f"❌ **Error:** Data too large!\n\n" \
55
+ f"- **Data size:** {data_size:,} bytes\n" \
56
+ f"- **Maximum capacity:** {capacity:,} bytes\n" \
57
+ f"- **Overflow:** {data_size - capacity:,} bytes\n\n" \
58
+ f"💡 Try: Shorter text, larger image, or more bits per channel"
59
+
60
+ # Embed data
61
+ pwd = password if password and len(password) > 0 else None
62
+ success, message, stats = embedder.embed_data(
63
+ input_path,
64
+ secret_text,
65
+ output_path,
66
+ password=pwd,
67
+ bits_per_channel=bits_per_channel
68
+ )
69
+
70
+ # Clean up input
71
+ os.unlink(input_path)
72
+
73
+ if not success:
74
+ if os.path.exists(output_path):
75
+ os.unlink(output_path)
76
+ return None, f"❌ **Error:** {message}"
77
+
78
+ # Load result image
79
+ result_img = Image.open(output_path)
80
+
81
+ # Format success message
82
+ result_message = f"""
83
+ ✅ **Successfully Hidden!**
84
+
85
+ 📊 **Statistics:**
86
+ - **Data hidden:** {stats['data_size']:,} bytes ({len(secret_text):,} characters)
87
+ - **Image capacity:** {stats['capacity']:,} bytes
88
+ - **Utilization:** {stats['utilization']}
89
+ - **Encryption:** {"🔒 Yes" if stats['encrypted'] else "🔓 No"}
90
+ - **LSB depth:** {stats['bits_per_channel']} bit(s) per channel
91
+ - **Image dimensions:** {stats['image_size']}
92
+
93
+ 💾 **Download the image below** - your data is invisible to the naked eye!
94
+
95
+ ⚠️ **Important:**
96
+ - Save as PNG (JPEG re-encoding will destroy the hidden data)
97
+ - Keep your password safe if you used encryption
98
+ """
99
+
100
+ return result_img, result_message
101
+
102
+ except Exception as e:
103
+ if 'input_path' in locals() and os.path.exists(input_path):
104
+ os.unlink(input_path)
105
+ if 'output_path' in locals() and os.path.exists(output_path):
106
+ os.unlink(output_path)
107
+ return None, f"❌ **Error:** {str(e)}"
108
+
109
+
110
+ def detect_hidden_data(image, sensitivity):
111
+ """
112
+ Tab 2: Detect steganography using RAT Finder analysis
113
+ """
114
+ if image is None:
115
+ return None, "⚠️ Please upload an image to analyze"
116
+
117
+ try:
118
+ # Save uploaded image to temp file
119
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmp:
120
+ img = Image.fromarray(image)
121
+ img.save(tmp.name, 'PNG')
122
+ image_path = tmp.name
123
+
124
+ # Map slider to sensitivity
125
+ sens_map = {1: 'low', 2: 'low', 3: 'low', 4: 'medium', 5: 'medium',
126
+ 6: 'medium', 7: 'high', 8: 'high', 9: 'high', 10: 'high'}
127
+ sensitivity_str = sens_map.get(sensitivity, 'medium')
128
+
129
+ # Perform analysis
130
+ confidence, details = rat_finder.analyze_image(image_path, sensitivity=sensitivity_str)
131
+
132
+ # Generate ELA visualization
133
+ ela_result = rat_finder.perform_ela_analysis(image_path)
134
+
135
+ # Clean up
136
+ os.unlink(image_path)
137
+
138
+ # Create confidence indicator
139
+ if confidence >= 70:
140
+ confidence_emoji = "🚨"
141
+ confidence_label = "HIGH SUSPICION"
142
+ elif confidence >= 40:
143
+ confidence_emoji = "⚠️"
144
+ confidence_label = "MODERATE SUSPICION"
145
+ else:
146
+ confidence_emoji = "✅"
147
+ confidence_label = "LOW SUSPICION"
148
+
149
+ # Format results
150
+ result_text = f"""
151
+ {confidence_emoji} **{confidence_label}**
152
+
153
+ 📊 **Confidence Score:** {confidence:.1f}%
154
+
155
+ 🔍 **Analysis Details:**
156
+ """
157
+
158
+ for detail in details:
159
+ result_text += f"\n• {detail}"
160
+
161
+ result_text += f"""
162
+
163
+ ---
164
+
165
+ **What does this mean?**
166
+
167
+ - **ELA (Error Level Analysis):** Highlights areas with different compression levels
168
+ - Bright areas = potential manipulation or hidden data
169
+ - Uniform appearance = likely unmodified
170
+
171
+ - **LSB Analysis:** Checks randomness in least significant bits
172
+ - **Histogram Analysis:** Looks for statistical anomalies
173
+ - **Metadata:** Examines EXIF data for suspicious tools
174
+ - **File Structure:** Checks for trailing data
175
+
176
+ 💡 **High confidence doesn't mean data is hidden** - just that anomalies exist.
177
+ Use the "Extract Data" tab if you suspect LSB steganography!
178
+ """
179
+
180
+ # Return ELA plot if available
181
+ if ela_result['success'] and ela_result['ela_image']:
182
+ return ela_result['ela_image'], result_text
183
+
184
+ return None, result_text
185
+
186
+ except Exception as e:
187
+ if 'image_path' in locals() and os.path.exists(image_path):
188
+ os.unlink(image_path)
189
+ return None, f"❌ **Error:** {str(e)}"
190
+
191
+
192
+ def extract_hidden_data(image, password, bits_per_channel):
193
+ """
194
+ Tab 2b: Extract data hidden with LSB steganography
195
+ """
196
+ if image is None:
197
+ return "⚠️ Please upload an image"
198
+
199
+ try:
200
+ # Save uploaded image to temp file
201
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmp:
202
+ img = Image.fromarray(image)
203
+ img.save(tmp.name, 'PNG')
204
+ image_path = tmp.name
205
+
206
+ # Attempt extraction
207
+ pwd = password if password and len(password) > 0 else None
208
+ success, message, extracted_data = embedder.extract_data(
209
+ image_path,
210
+ password=pwd,
211
+ bits_per_channel=bits_per_channel
212
+ )
213
+
214
+ # Clean up
215
+ os.unlink(image_path)
216
+
217
+ if not success:
218
+ return f"❌ **{message}**\n\nPossible reasons:\n" \
219
+ f"• No data hidden in this image\n" \
220
+ f"• Wrong password (if encrypted)\n" \
221
+ f"• Wrong bits-per-channel setting\n" \
222
+ f"• Image was modified/re-saved"
223
+
224
+ result = f"""
225
+ ✅ **Data Successfully Extracted!**
226
+
227
+ 📝 **Hidden Message:**
228
+
229
+ ---
230
+ {extracted_data}
231
+ ---
232
+
233
+ 📊 **Extraction Info:**
234
+ - **Data size:** {len(extracted_data)} characters
235
+ - **Decryption:** {"🔒 Used" if pwd else "🔓 Not needed"}
236
+ - **LSB depth:** {bits_per_channel} bit(s) per channel
237
+
238
+ 💡 Copy the message above - it has been successfully recovered from the image!
239
+ """
240
+ return result
241
+
242
+ except Exception as e:
243
+ if 'image_path' in locals() and os.path.exists(image_path):
244
+ os.unlink(image_path)
245
+ return f"❌ **Error:** {str(e)}"
246
+
247
+
248
+ def check_image_corruption(image, sensitivity, check_visual):
249
+ """
250
+ Tab 3: Check for image corruption and validate integrity
251
+ """
252
+ if image is None:
253
+ return "⚠️ Please upload an image to check"
254
+
255
+ try:
256
+ # Save uploaded image to temp file
257
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as tmp:
258
+ img = Image.fromarray(image)
259
+ img.save(tmp.name, 'PNG')
260
+ image_path = tmp.name
261
+
262
+ # Map slider to sensitivity
263
+ sens_map = {1: 'low', 2: 'low', 3: 'low', 4: 'medium', 5: 'medium',
264
+ 6: 'medium', 7: 'high', 8: 'high', 9: 'high', 10: 'high'}
265
+ sensitivity_str = sens_map.get(sensitivity, 'medium')
266
+
267
+ # Validate image
268
+ is_valid = find_bad_images.is_valid_image(
269
+ image_path,
270
+ thorough=True,
271
+ sensitivity=sensitivity_str,
272
+ check_visual=check_visual
273
+ )
274
+
275
+ # Get diagnostic details
276
+ issue_type, issue_desc = find_bad_images.diagnose_image_issue(image_path)
277
+
278
+ # Clean up
279
+ os.unlink(image_path)
280
+
281
+ # Format results
282
+ if is_valid:
283
+ result = f"""
284
+ ✅ **IMAGE IS VALID**
285
+
286
+ The image passed all validation checks:
287
+ - ✅ File structure is intact
288
+ - ✅ Headers are valid
289
+ - ✅ No truncation detected
290
+ - ✅ Metadata is consistent
291
+ """
292
+ if check_visual:
293
+ result += "- ✅ No visual corruption detected\n"
294
+
295
+ result += "\n💚 **This image is safe to use!**"
296
+
297
+ else:
298
+ result = f"""
299
+ ⚠️ **ISSUES DETECTED**
300
+
301
+ The image has validation problems:
302
+
303
+ """
304
+ if issue_desc:
305
+ # diagnose_image_issue returns a single (issue_type, description) tuple
306
+ result += f"**{issue_type}:**\n{issue_desc}\n\n"
307
+ else:
308
+ result += "❌ Image failed validation but no specific issues identified.\n\n"
309
+
310
+ result += """
311
+ ---
312
+
313
+ **What to do:**
314
+ - Image may be corrupted or incomplete
315
+ - Try re-downloading the original file
316
+ - Check if the file was properly transferred
317
+ - Use image repair tools if needed
318
+ """
319
+
320
+ return result
321
+
322
+ except Exception as e:
323
+ if 'image_path' in locals() and os.path.exists(image_path):
324
+ os.unlink(image_path)
325
+ return f"❌ **Error:** {str(e)}"
326
+
327
+
328
+ # Create Gradio interface
329
+ with gr.Blocks(
330
+ title="2PAC: Picture Analyzer & Corruption Killer",
331
+ theme=gr.themes.Soft(
332
+ primary_hue="violet",
333
+ secondary_hue="blue",
334
+ )
335
+ ) as demo:
336
+
337
+ gr.Markdown("""
338
+ # 🔫 2PAC: Picture Analyzer & Corruption Killer
339
+
340
+ **Advanced image security and steganography toolkit**
341
+
342
+ Hide secret messages in images, detect hidden data, and validate image integrity.
343
+ """)
344
+
345
+ with gr.Tabs():
346
+
347
+ # TAB 1: Hide Data
348
+ with gr.Tab("🔒 Hide Secret Data"):
349
+ gr.Markdown("""
350
+ ## Hide Data in Image (LSB Steganography)
351
+
352
+ Invisibly hide text inside an image using Least Significant Bit encoding.
353
+ The image will look identical to the naked eye, but contains your secret message!
354
+ """)
355
+
356
+ with gr.Row():
357
+ with gr.Column(scale=1):
358
+ hide_input_image = gr.Image(
359
+ label="Upload Image",
360
+ type="numpy",
361
+ height=300
362
+ )
363
+ hide_secret_text = gr.Textbox(
364
+ label="Secret Text to Hide",
365
+ placeholder="Enter your secret message here...",
366
+ lines=5,
367
+ max_lines=10
368
+ )
369
+ with gr.Row():
370
+ hide_password = gr.Textbox(
371
+ label="Password (Optional - for encryption)",
372
+ placeholder="Leave empty for no encryption",
373
+ type="password"
374
+ )
375
+ hide_bits = gr.Slider(
376
+ minimum=1,
377
+ maximum=4,
378
+ value=1,
379
+ step=1,
380
+ label="LSB Depth (higher = more capacity, less subtle)",
381
+ info="1=subtle, 4=maximum capacity"
382
+ )
383
+
384
+ hide_button = gr.Button("🔒 Hide Data in Image", variant="primary", size="lg")
385
+
386
+ with gr.Column(scale=1):
387
+ hide_output_image = gr.Image(label="Result Image (Download This!)", height=300)
388
+ hide_output_text = gr.Markdown(label="Status")
389
+
390
+ hide_button.click(
391
+ fn=hide_data_in_image,
392
+ inputs=[hide_input_image, hide_secret_text, hide_password, hide_bits],
393
+ outputs=[hide_output_image, hide_output_text]
394
+ )
395
+
396
+ gr.Markdown("""
397
+ ---
398
+ **💡 Tips:**
399
+ - Use PNG images for best results (JPEG will destroy hidden data!)
400
+ - Larger images can hold more data
401
+ - Password encryption adds extra security layer
402
+ - LSB depth: 1-2 bits is undetectable, 3-4 bits provides more capacity
403
+ """)
404
+
405
+ # TAB 2: Detect & Extract
406
+ with gr.Tab("🔍 Detect & Extract Hidden Data"):
407
+ gr.Markdown("""
408
+ ## Detect Steganography & Extract Hidden Data
409
+
410
+ Use advanced analysis techniques to detect hidden data in images, or extract data hidden with this tool.
411
+ """)
412
+
413
+ with gr.Tabs():
414
+
415
+ # Sub-tab: Detection
416
+ with gr.Tab("🔎 Detect (Analysis)"):
417
+ gr.Markdown("""
418
+ ### Steganography Detection (RAT Finder)
419
+
420
+ Analyzes images for signs of hidden data using multiple techniques:
421
+ ELA, LSB analysis, histogram analysis, metadata inspection, and more.
422
+ """)
423
+
424
+ with gr.Row():
425
+ with gr.Column(scale=1):
426
+ detect_input_image = gr.Image(
427
+ label="Upload Image to Analyze",
428
+ type="numpy",
429
+ height=300
430
+ )
431
+ detect_sensitivity = gr.Slider(
432
+ minimum=1,
433
+ maximum=10,
434
+ value=5,
435
+ step=1,
436
+ label="Detection Sensitivity",
437
+ info="Higher = more thorough but more false positives"
438
+ )
439
+ detect_button = gr.Button("🔍 Analyze for Hidden Data", variant="primary", size="lg")
440
+
441
+ with gr.Column(scale=1):
442
+ detect_output_image = gr.Image(label="ELA Visualization", height=300)
443
+ detect_output_text = gr.Markdown(label="Analysis Results")
444
+
445
+ detect_button.click(
446
+ fn=detect_hidden_data,
447
+ inputs=[detect_input_image, detect_sensitivity],
448
+ outputs=[detect_output_image, detect_output_text]
449
+ )
450
+
451
+ # Sub-tab: Extraction
452
+ with gr.Tab("📤 Extract Data"):
453
+ gr.Markdown("""
454
+ ### Extract Hidden Data (LSB Extraction)
455
+
456
+ If you have an image created with the "Hide Data" tool, extract the hidden message here.
457
+ """)
458
+
459
+ with gr.Row():
460
+ with gr.Column(scale=1):
461
+ extract_input_image = gr.Image(
462
+ label="Upload Image with Hidden Data",
463
+ type="numpy",
464
+ height=300
465
+ )
466
+ with gr.Row():
467
+ extract_password = gr.Textbox(
468
+ label="Password (if encrypted)",
469
+ placeholder="Leave empty if not encrypted",
470
+ type="password"
471
+ )
472
+ extract_bits = gr.Slider(
473
+ minimum=1,
474
+ maximum=4,
475
+ value=1,
476
+ step=1,
477
+ label="LSB Depth (must match encoding)",
478
+ info="Use same value as when hiding"
479
+ )
480
+ extract_button = gr.Button("📤 Extract Hidden Data", variant="primary", size="lg")
481
+
482
+ with gr.Column(scale=1):
483
+ extract_output_text = gr.Markdown(label="Extracted Data")
484
+
485
+ extract_button.click(
486
+ fn=extract_hidden_data,
487
+ inputs=[extract_input_image, extract_password, extract_bits],
488
+ outputs=[extract_output_text]
489
+ )
490
+
491
+ # TAB 3: Check Corruption
492
+ with gr.Tab("🛡️ Check Image Integrity"):
493
+ gr.Markdown("""
494
+ ## Image Corruption & Validation
495
+
496
+ Thoroughly validate image files for corruption, truncation, and structural issues.
497
+ Detects damaged headers, incomplete data, and visual artifacts.
498
+ """)
499
+
500
+ with gr.Row():
501
+ with gr.Column(scale=1):
502
+ check_input_image = gr.Image(
503
+ label="Upload Image to Validate",
504
+ type="numpy",
505
+ height=300
506
+ )
507
+ with gr.Row():
508
+ check_sensitivity = gr.Slider(
509
+ minimum=1,
510
+ maximum=10,
511
+ value=5,
512
+ step=1,
513
+ label="Validation Sensitivity",
514
+ info="Higher = more strict validation"
515
+ )
516
+ check_visual = gr.Checkbox(
517
+ label="Check for Visual Corruption",
518
+ value=True,
519
+ info="Slower but detects visual artifacts"
520
+ )
521
+ check_button = gr.Button("🛡️ Validate Image", variant="primary", size="lg")
522
+
523
+ with gr.Column(scale=1):
524
+ check_output_text = gr.Markdown(label="Validation Results")
525
+
526
+ check_button.click(
527
+ fn=check_image_corruption,
528
+ inputs=[check_input_image, check_sensitivity, check_visual],
529
+ outputs=[check_output_text]
530
+ )
531
+
532
+ gr.Markdown("""
533
+ ---
534
+ **🔍 Checks Performed:**
535
+ - ✅ File format validation (JPEG, PNG, GIF, etc.)
536
+ - ✅ Header integrity
537
+ - ✅ Data completeness
538
+ - ✅ Metadata consistency
539
+ - ✅ Visual corruption detection (black/gray regions)
540
+ - ✅ Structure validation
541
+ """)
542
+
543
+ gr.Markdown("""
544
+ ---
545
+
546
+ ## About 2PAC
547
+
548
+ **2PAC** (Picture Analyzer & Corruption Killer) is a comprehensive image security toolkit combining:
549
+ - **LSB Steganography**: Hide and extract secret messages in images
550
+ - **RAT Finder**: Advanced steganography detection using 7+ analysis techniques
551
+ - **Image Validation**: Detect corruption and structural issues
552
+
553
+ 🔗 **GitHub:** [github.com/ricyoung/2pac](https://github.com/ricyoung/2pac)
554
+ 🌐 **More Tools:** [demo.deepneuro.ai](https://demo.deepneuro.ai)
555
+
556
+ ---
557
+
558
+ *Built with ❤️ by DeepNeuro.AI | Powered by Gradio & Hugging Face Spaces*
559
+ """)
560
+
561
+
562
+ if __name__ == "__main__":
563
+ demo.launch()
find_bad_images.py ADDED
@@ -0,0 +1,1670 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ 2PAC: The Picture Analyzer & Corruption Killer
4
+ Author: Richard Young
5
+ License: MIT
6
+
7
+ In memory of Jeff Young, who loved Tupac's music and lived by his values of helping others.
8
+ Like Tupac, Jeff believed in bringing people together and always lending a hand to those in need.
9
+ May your photos always be as clear as the memories they capture, and may we all strive to help others as Jeff did.
10
+ """
11
+
12
+ import os
13
+ import argparse
14
+ import concurrent.futures
15
+ import sys
16
+ import time
17
+ import io
18
+ import json
19
+ import shutil
20
+ import hashlib
21
+ import struct
22
+ import tempfile
23
+ import subprocess
24
+ import random
25
+ from datetime import datetime
26
+ from pathlib import Path
27
+ from PIL import Image, ImageFile, UnidentifiedImageError
28
+ from tqdm import tqdm
29
+ import tqdm.auto as tqdm_auto
30
+ import colorama
31
+ import humanize
32
+ import logging
33
+
34
+ # Import 2PAC quotes
35
+ try:
36
+ from quotes import QUOTES
37
+ except ImportError:
38
+ # Default quotes if file is missing
39
+ QUOTES = ["All Eyez On Your Images."]
40
+
41
+ # Initialize colorama (required for Windows)
42
+ colorama.init()
43
+
44
+ # Allow loading of truncated images for repair attempts
45
+ ImageFile.LOAD_TRUNCATED_IMAGES = True
46
+
47
+ # Dictionary of supported image formats with their extensions
48
+ SUPPORTED_FORMATS = {
49
+ 'JPEG': ('.jpg', '.jpeg', '.jpe', '.jif', '.jfif', '.jfi'),
50
+ 'PNG': ('.png',),
51
+ 'GIF': ('.gif',),
52
+ 'TIFF': ('.tiff', '.tif'),
53
+ 'BMP': ('.bmp', '.dib'),
54
+ 'WEBP': ('.webp',),
55
+ 'ICO': ('.ico',),
56
+ 'HEIC': ('.heic',),
57
+ }
58
+
59
+ # Default formats (all supported formats)
60
+ DEFAULT_FORMATS = list(SUPPORTED_FORMATS.keys())
61
+
62
+ # List of formats that can potentially be repaired
63
+ REPAIRABLE_FORMATS = ['JPEG', 'PNG', 'GIF']
64
+
65
+ # Default progress directory
66
+ DEFAULT_PROGRESS_DIR = os.path.expanduser("~/.bad_image_finder/progress")
67
+
68
+ # Current version
69
+ VERSION = "1.5.1"
70
+
71
+ # Security: Maximum file size to process (100MB) to prevent DoS
72
+ MAX_FILE_SIZE = 100 * 1024 * 1024
73
+
74
+ # Security: Maximum image dimensions (50 megapixels) to prevent decompression bombs
75
+ MAX_IMAGE_PIXELS = 50000 * 50000
76
+
77
+ def setup_logging(verbose, no_color=False):
78
+ level = logging.DEBUG if verbose else logging.INFO
79
+
80
+ # Define color codes
81
+ if not no_color:
82
+ # Color scheme
83
+ COLORS = {
84
+ 'DEBUG': colorama.Fore.CYAN,
85
+ 'INFO': colorama.Fore.GREEN,
86
+ 'WARNING': colorama.Fore.YELLOW,
87
+ 'ERROR': colorama.Fore.RED,
88
+ 'CRITICAL': colorama.Fore.MAGENTA + colorama.Style.BRIGHT,
89
+ 'RESET': colorama.Style.RESET_ALL
90
+ }
91
+
92
+ # Custom formatter with colors
93
+ class ColoredFormatter(logging.Formatter):
94
+ def format(self, record):
95
+ levelname = record.levelname
96
+ if levelname in COLORS:
97
+ record.levelname = f"{COLORS[levelname]}{levelname}{COLORS['RESET']}"
98
+ record.msg = f"{COLORS[levelname]}{record.msg}{COLORS['RESET']}"
99
+ return super().format(record)
100
+
101
+ formatter = ColoredFormatter('%(asctime)s - %(levelname)s - %(message)s')
102
+ else:
103
+ formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
104
+
105
+ handler = logging.StreamHandler()
106
+ handler.setFormatter(formatter)
107
+
108
+ logging.basicConfig(
109
+ level=level,
110
+ handlers=[handler]
111
+ )
112
+
113
+ def diagnose_image_issue(file_path):
114
+ """
115
+ Attempts to diagnose what's wrong with the image.
116
+ Returns: (error_type, details)
117
+ """
118
+ try:
119
+ with open(file_path, 'rb') as f:
120
+ header = f.read(16) # Read first 16 bytes
121
+
122
+ # Check for zero-byte file
123
+ if len(header) == 0:
124
+ return "empty_file", "File is empty (0 bytes)"
125
+
126
+ # Check for correct JPEG header
127
+ if file_path.lower().endswith(SUPPORTED_FORMATS['JPEG']):
128
+ if not (header.startswith(b'\xff\xd8\xff')):
129
+ return "invalid_header", "Invalid JPEG header"
130
+
131
+ # Check for correct PNG header
132
+ elif file_path.lower().endswith(SUPPORTED_FORMATS['PNG']):
133
+ if not header.startswith(b'\x89PNG\r\n\x1a\n'):
134
+ return "invalid_header", "Invalid PNG header"
135
+
136
+ # Try to open with PIL for more detailed diagnosis
137
+ try:
138
+ with Image.open(file_path) as img:
139
+ img.verify()
140
+ except Exception as e:
141
+ error_str = str(e).lower()
142
+
143
+ if "truncated" in error_str:
144
+ return "truncated", "File is truncated"
145
+ elif "corrupt" in error_str:
146
+ return "corrupt_data", "Data corruption detected"
147
+ elif "incorrect mode" in error_str or "decoder" in error_str:
148
+ return "decoder_issue", "Image decoder issue"
149
+ else:
150
+ return "unknown", f"Unknown issue: {str(e)}"
151
+
152
+ # Now try to load the data
153
+ try:
154
+ with Image.open(file_path) as img:
155
+ img.load()
156
+ except Exception as e:
157
+ return "data_load_failed", f"Image data couldn't be loaded: {str(e)}"
158
+
159
+ # If we got here, there's some other issue
160
+ return "unknown", "Unknown issue"
161
+
162
+ except Exception as e:
163
+ return "access_error", f"Error accessing file: {str(e)}"
164
+
165
+ def check_jpeg_structure(file_path):
166
+ """
167
+ Performs a deep check of JPEG file structure to find corruption that PIL might miss.
168
+ Returns (is_valid, error_message)
169
+ """
170
+ try:
171
+ with open(file_path, 'rb') as f:
172
+ data = f.read()
173
+
174
+ # Check for correct JPEG header (SOI marker)
175
+ if not data.startswith(b'\xFF\xD8'):
176
+ return False, "Invalid JPEG header (missing SOI marker)"
177
+
178
+ # Check for proper EOI marker at the end
179
+ if not data.endswith(b'\xFF\xD9'):
180
+ return False, "Missing EOI marker at end of file"
181
+
182
+ # Check for key JPEG segments
183
+ # SOF marker (Start of Frame) - At least one should be present
184
+ sof_markers = [b'\xFF\xC0', b'\xFF\xC1', b'\xFF\xC2', b'\xFF\xC3']
185
+ has_sof = any(marker in data for marker in sof_markers)
186
+ if not has_sof:
187
+ return False, "No Start of Frame (SOF) marker found"
188
+
189
+ # Check for SOS marker (Start of Scan)
190
+ if b'\xFF\xDA' not in data:
191
+ return False, "No Start of Scan (SOS) marker found"
192
+
193
+ # Scan through the file to check marker structure
194
+ i = 2 # Skip SOI marker
195
+ while i < len(data) - 1:
196
+ if data[i] == 0xFF and data[i+1] != 0x00 and data[i+1] != 0xFF:
197
+ # Found a marker
198
+ marker = data[i:i+2]
199
+
200
+ # For markers with length fields, validate length
201
+ if (0xC0 <= data[i+1] <= 0xCF and data[i+1] != 0xC4 and data[i+1] != 0xC8) or \
202
+ (0xDB <= data[i+1] <= 0xFE):
203
+ if i + 4 >= len(data):
204
+ return False, f"Truncated marker {data[i+1]:02X} at position {i}"
205
+ length = struct.unpack('>H', data[i+2:i+4])[0]
206
+ if i + 2 + length > len(data):
207
+ return False, f"Invalid segment length for marker {data[i+1]:02X}"
208
+ i += 2 + length
209
+ continue
210
+
211
+ # Move to next byte
212
+ i += 1
213
+
214
+ return True, "JPEG structure appears valid"
215
+ except Exception as e:
216
+ return False, f"Error during JPEG structure check: {str(e)}"
217
+
218
+ def check_png_structure(file_path):
219
+ """
220
+ Performs a deep check of PNG file structure to find corruption.
221
+ Returns (is_valid, error_message)
222
+ """
223
+ try:
224
+ with open(file_path, 'rb') as f:
225
+ data = f.read()
226
+
227
+ # Check for PNG signature
228
+ png_signature = b'\x89PNG\r\n\x1a\n'
229
+ if not data.startswith(png_signature):
230
+ return False, "Invalid PNG signature"
231
+
232
+ # Check minimum viable PNG (signature + IHDR chunk)
233
+ if len(data) < 8 + 12: # 8 bytes signature + 12 bytes min IHDR chunk
234
+ return False, "PNG file too small to contain valid header"
235
+
236
+ # Check for IEND chunk at the end
237
+ if not data.endswith(b'IEND\xaeB`\x82'):
238
+ return False, "Missing IEND chunk at end of file"
239
+
240
+ # Parse chunks
241
+ pos = 8 # Skip signature
242
+ required_chunks = {'IHDR': False}
243
+
244
+ while pos < len(data):
245
+ if pos + 8 > len(data):
246
+ return False, "Truncated chunk header"
247
+
248
+ # Read chunk length and type
249
+ chunk_len = struct.unpack('>I', data[pos:pos+4])[0]
250
+ chunk_type = data[pos+4:pos+8].decode('ascii', errors='replace')
251
+
252
+ # Validate chunk length
253
+ if pos + chunk_len + 12 > len(data):
254
+ return False, f"Truncated {chunk_type} chunk"
255
+
256
+ # Track required chunks
257
+ if chunk_type in required_chunks:
258
+ required_chunks[chunk_type] = True
259
+
260
+ # Special validation for IHDR chunk
261
+ if chunk_type == 'IHDR' and chunk_len != 13:
262
+ return False, "Invalid IHDR chunk length"
263
+
264
+ # Mandatory IHDR must be first chunk
265
+ if pos == 8 and chunk_type != 'IHDR':
266
+ return False, "First chunk must be IHDR"
267
+
268
+ # IEND must be the last chunk
269
+ if chunk_type == 'IEND' and pos + chunk_len + 12 != len(data):
270
+ return False, "Data after IEND chunk"
271
+
272
+ # Move to next chunk
273
+ pos += chunk_len + 12 # Length (4) + Type (4) + Data (chunk_len) + CRC (4)
274
+
275
+ # Verify required chunks
276
+ for chunk, present in required_chunks.items():
277
+ if not present:
278
+ return False, f"Missing required {chunk} chunk"
279
+
280
+ return True, "PNG structure appears valid"
281
+ except Exception as e:
282
+ return False, f"Error during PNG structure check: {str(e)}"
283
+
284
+ def validate_subprocess_path(file_path):
285
+ """
286
+ Validate file path before passing to subprocess to prevent command injection.
287
+
288
+ Args:
289
+ file_path: Path to validate
290
+
291
+ Returns:
292
+ True if path is safe
293
+
294
+ Raises:
295
+ ValueError: If path contains dangerous characters or patterns
296
+ """
297
+ import re
298
+
299
+ # Must be an absolute path
300
+ if not os.path.isabs(file_path):
301
+ raise ValueError(f"Path must be absolute: {file_path}")
302
+
303
+ # File must exist
304
+ if not os.path.exists(file_path):
305
+ raise ValueError(f"File does not exist: {file_path}")
306
+
307
+ # Check for shell metacharacters and dangerous patterns
308
+ # Allow: alphanumeric, spaces, dots, dashes, underscores, forward slashes
309
+ # Block: semicolons, pipes, backticks, $, &, >, <, etc.
310
+ dangerous_chars = ['`', '$', '&', '|', ';', '>', '<', '\n', '\r', '(', ')']
311
+ for char in dangerous_chars:
312
+ if char in file_path:
313
+ raise ValueError(f"Dangerous character '{char}' found in path: {file_path}")
314
+
315
+ # Block path traversal attempts
316
+ if '..' in file_path:
317
+ raise ValueError(f"Path traversal pattern '..' detected: {file_path}")
318
+
319
+ # Block null bytes
320
+ if '\x00' in file_path:
321
+ raise ValueError("Null byte detected in path")
322
+
323
+ return True
324
+
325
+
326
+ def try_external_tools(file_path):
327
+ """
328
+ Try using external tools to validate the image if they're available.
329
+ Returns (is_valid, message)
330
+
331
+ Security: Validates file path before passing to subprocess to prevent
332
+ command injection attacks.
333
+ """
334
+ # Validate path before passing to subprocess
335
+ try:
336
+ validate_subprocess_path(file_path)
337
+ except ValueError as e:
338
+ logging.warning(f"Skipping external tool validation due to security check: {e}")
339
+ return True, "External tools check skipped (security)"
340
+
341
+ # Try using exiftool if available
342
+ try:
343
+ result = subprocess.run(['exiftool', '-m', '-p', '$Error', file_path],
344
+ capture_output=True, text=True, timeout=5)
345
+ if result.returncode == 0 and result.stdout.strip():
346
+ return False, f"Exiftool error: {result.stdout.strip()}"
347
+
348
+ # Check with identify (ImageMagick) if available
349
+ result = subprocess.run(['identify', '-verbose', file_path],
350
+ capture_output=True, text=True, timeout=5)
351
+ if result.returncode != 0:
352
+ return False, "ImageMagick identify failed to read the image"
353
+
354
+ return True, "Passed external tool validation"
355
+ except (subprocess.SubprocessError, FileNotFoundError):
356
+ # External tools not available or failed
357
+ return True, "External tools check skipped"
358
+
359
+ def try_full_decode_check(file_path):
360
+ """
361
+ Try to fully decode the image to a temporary file.
362
+ This catches more subtle corruption that might otherwise be missed.
363
+ """
364
+ try:
365
+ # For JPEGs, try to decode and re-encode the image
366
+ with Image.open(file_path) as img:
367
+ # Create a temporary file for testing
368
+ with tempfile.NamedTemporaryFile(delete=True) as tmp:
369
+ # Try to save a decoded copy
370
+ img.save(tmp.name, format="BMP")
371
+
372
+ # If we get here, the image data could be fully decoded
373
+ return True, "Full decode test passed"
374
+ except Exception as e:
375
+ return False, f"Full decode test failed: {str(e)}"
376
+
377
+ def check_visual_corruption(file_path, block_threshold=0.20, uniform_threshold=10, strict_mode=False):
378
+ """
379
+ Analyze image content to detect visual corruption like large uniform areas.
380
+
381
+ Args:
382
+ file_path: Path to the image file
383
+ block_threshold: Percentage of image that must be uniform to be considered corrupt (0.0-1.0)
384
+ uniform_threshold: Color variation threshold for considering pixels "uniform"
385
+ strict_mode: If True, only detect gray/black areas as corruption indicators
386
+
387
+ Returns:
388
+ (is_visually_corrupt, details)
389
+ """
390
+ try:
391
+ with Image.open(file_path) as img:
392
+ # Get image dimensions
393
+ width, height = img.size
394
+ total_pixels = width * height
395
+
396
+ # Convert to RGB to ensure consistent analysis
397
+ if img.mode != "RGB":
398
+ img = img.convert("RGB")
399
+
400
+ # Sample the image (analyzing every pixel would be too slow)
401
+ # We'll create a grid of sample points - we'll use more samples for more accuracy
402
+ sample_step = max(1, min(width, height) // 150) # Adjust based on image size
403
+
404
+ # Track unique colors and their counts
405
+ color_counts = {}
406
+ total_samples = 0
407
+
408
+ # Sample the image
409
+ for y in range(0, height, sample_step):
410
+ for x in range(0, width, sample_step):
411
+ total_samples += 1
412
+ pixel = img.getpixel((x, y))
413
+
414
+ # Round pixel values to reduce sensitivity to minor variations
415
+ rounded_pixel = (
416
+ pixel[0] // uniform_threshold * uniform_threshold,
417
+ pixel[1] // uniform_threshold * uniform_threshold,
418
+ pixel[2] // uniform_threshold * uniform_threshold
419
+ )
420
+
421
+ if rounded_pixel in color_counts:
422
+ color_counts[rounded_pixel] += 1
423
+ else:
424
+ color_counts[rounded_pixel] = 1
425
+
426
+ # Find the most common color
427
+ most_common_color = max(color_counts.items(), key=lambda x: x[1])
428
+ most_common_percentage = most_common_color[1] / total_samples
429
+
430
+ # Check for large blocks of uniform color (potential corruption)
431
+ if most_common_percentage > block_threshold:
432
+ # Calculate approximate percentage of the image affected
433
+ affected_pct = most_common_percentage * 100
434
+ color_value = most_common_color[0]
435
+
436
+ # Determine if this is likely corruption
437
+ # Gray/black areas are common in corruption
438
+ is_dark = sum(color_value) < 3 * uniform_threshold # Very dark areas
439
+
440
+ # Check if it's a gray area (equal R,G,B values)
441
+ is_gray = abs(color_value[0] - color_value[1]) < uniform_threshold and \
442
+ abs(color_value[1] - color_value[2]) < uniform_threshold and \
443
+ abs(color_value[0] - color_value[2]) < uniform_threshold
444
+
445
+ # Only consider mid-range grays as corruption indicators (not white/black)
446
+ is_mid_gray = is_gray and 30 < sum(color_value)/3 < 220
447
+
448
+ # Special case: almost pure white is often legitimate content
449
+ is_white = color_value[0] > 240 and color_value[1] > 240 and color_value[2] > 240
450
+
451
+ # Determine likelihood of corruption based on color and percentage
452
+ if (is_dark or is_mid_gray) and not is_white:
453
+ # Higher threshold for white areas since they're common in legitimate images
454
+ white_threshold = 0.4 # 40% of image
455
+ if is_white and most_common_percentage < white_threshold:
456
+ return False, f"Large white area ({affected_pct:.1f}%) but likely not corruption"
457
+
458
+ # More likely to be corruption
459
+ return True, f"Visual corruption detected: {affected_pct:.1f}% of image is uniform {color_value}"
460
+ else:
461
+ # Could be a legitimate image with a uniform background
462
+ return False, f"Large uniform area ({affected_pct:.1f}%) but likely not corruption"
463
+
464
+ # Check for other telltale signs of corruption - but only in strict mode
465
+ if strict_mode:
466
+ # 1. Excessive color blocks (fragmentation) - this works well for detecting noise
467
+ if len(color_counts) > total_samples * 0.85 and total_samples > 200:
468
+ return True, f"Excessive color fragmentation detected ({len(color_counts)} colors in {total_samples} samples)"
469
+
470
+ # 2. Check for very specific corruption patterns
471
+ # Analyze distribution of colors to look for unusual patterns
472
+ if total_samples > 500: # Only for larger images with enough samples
473
+ # Check if there's an unnatural color distribution
474
+ # Normal photos have a more gradual distribution rather than spikes
475
+ sorted_counts = sorted(color_counts.values(), reverse=True)
476
+
477
+ # Calculate the color distribution ratio
478
+ if len(sorted_counts) > 5:
479
+ top5_ratio = sum(sorted_counts[:5]) / sum(sorted_counts)
480
+ # Usually, the top 5 colors shouldn't dominate more than 80% of the image
481
+ # unless it's a graphic or very simple image
482
+ if top5_ratio < 0.2 and most_common_percentage < 0.1:
483
+ return True, f"Unusual color distribution (possible noise/corruption)"
484
+
485
+ return False, "No visual corruption detected"
486
+
487
+ except Exception as e:
488
+ return False, f"Error during visual analysis: {str(e)}"
489
+
490
+ def is_valid_image(file_path, thorough=True, sensitivity='medium', ignore_eof=False, check_visual=False, visual_strictness='medium'):
491
+ """
492
+ Validate image file integrity using multiple methods.
493
+
494
+ Args:
495
+ file_path: Path to the image file
496
+ thorough: Whether to perform deep structure validation
497
+ sensitivity: 'low', 'medium', or 'high'
498
+ ignore_eof: Whether to ignore missing end-of-file markers
499
+ check_visual: Whether to perform visual content analysis to detect corruption
500
+ visual_strictness: 'low', 'medium', or 'high' strictness for visual corruption detection
501
+
502
+ Returns:
503
+ True if valid, False if corrupt.
504
+ """
505
+ # Basic PIL validation first (fast check)
506
+ try:
507
+ with Image.open(file_path) as img:
508
+ # verify() checks the file header
509
+ img.verify()
510
+
511
+ # Additional step: try to load the image data
512
+ # This catches more corruption issues
513
+ with Image.open(file_path) as img2:
514
+ img2.load()
515
+
516
+ # If check_visual is enabled, analyze the image content
517
+ if check_visual:
518
+ # Set thresholds based on strictness level
519
+ if visual_strictness == 'low':
520
+ # More permissive - only detect very obvious corruption
521
+ block_threshold = 0.3 # 30% of the image must be uniform
522
+ uniform_threshold = 5 # Smaller color variations are allowed
523
+ elif visual_strictness == 'high':
524
+ # Most strict - catches subtle corruption but may have false positives
525
+ block_threshold = 0.15 # Only 15% of the image needs to be uniform
526
+ uniform_threshold = 15 # Larger color variations are considered uniform
527
+ else: # medium (default)
528
+ block_threshold = 0.20 # 20% threshold
529
+ uniform_threshold = 10
530
+
531
+ # Check for visual corruption with appropriate thresholds
532
+ is_visually_corrupt, msg = check_visual_corruption(
533
+ file_path,
534
+ block_threshold=block_threshold,
535
+ uniform_threshold=uniform_threshold,
536
+ # Only use additional detection methods in high strictness mode
537
+ strict_mode=(visual_strictness == 'high')
538
+ )
539
+
540
+ if is_visually_corrupt:
541
+ logging.debug(f"Visual corruption detected in {file_path}: {msg}")
542
+ return False
543
+
544
+ # If thorough checking is disabled, return after basic check
545
+ if not thorough or sensitivity == 'low':
546
+ return True
547
+
548
+ # For JPEG files, do additional structure checking
549
+ if file_path.lower().endswith(tuple(SUPPORTED_FORMATS['JPEG'])):
550
+ # Check JPEG structure
551
+ is_valid, error_msg = check_jpeg_structure(file_path)
552
+ if not is_valid:
553
+ # If ignore_eof is enabled and the only issue is missing EOI marker, consider it valid
554
+ if ignore_eof and error_msg == "Missing EOI marker at end of file":
555
+ logging.debug(f"Ignoring missing EOI marker for {file_path} as requested")
556
+ else:
557
+ logging.debug(f"JPEG structure invalid for {file_path}: {error_msg}")
558
+ return False
559
+
560
+ # Try full decode test (catches subtle corruption)
561
+ is_valid, error_msg = try_full_decode_check(file_path)
562
+ if not is_valid:
563
+ logging.debug(f"Full decode test failed for {file_path}: {error_msg}")
564
+ return False
565
+
566
+ # Try external tools if applicable
567
+ is_valid, error_msg = try_external_tools(file_path)
568
+ if not is_valid:
569
+ logging.debug(f"External tool validation failed for {file_path}: {error_msg}")
570
+ return False
571
+
572
+ # For PNG files, do additional structure checking
573
+ elif file_path.lower().endswith(tuple(SUPPORTED_FORMATS['PNG'])):
574
+ # Check PNG structure
575
+ is_valid, error_msg = check_png_structure(file_path)
576
+ if not is_valid:
577
+ logging.debug(f"PNG structure invalid for {file_path}: {error_msg}")
578
+ return False
579
+
580
+ # Try full decode test (catches subtle corruption)
581
+ is_valid, error_msg = try_full_decode_check(file_path)
582
+ if not is_valid:
583
+ logging.debug(f"Full decode test failed for {file_path}: {error_msg}")
584
+ return False
585
+
586
+ return True
587
+ except Exception as e:
588
+ logging.debug(f"Invalid image {file_path}: {str(e)}")
589
+ return False
590
+
591
+ def attempt_repair(file_path, backup_dir=None):
592
+ """
593
+ Attempts to repair corrupt image files.
594
+ Returns: (success, message, fixed_width, fixed_height)
595
+ """
596
+ # Create backup if requested
597
+ if backup_dir:
598
+ backup_path = os.path.join(backup_dir, os.path.basename(file_path) + ".bak")
599
+ try:
600
+ shutil.copy2(file_path, backup_path)
601
+ logging.debug(f"Created backup at {backup_path}")
602
+ except Exception as e:
603
+ logging.warning(f"Could not create backup: {str(e)}")
604
+
605
+ try:
606
+ # First, diagnose the issue
607
+ issue_type, details = diagnose_image_issue(file_path)
608
+ logging.debug(f"Diagnosis for {file_path}: {issue_type} - {details}")
609
+
610
+ file_ext = os.path.splitext(file_path)[1].lower()
611
+
612
+ # Check if file format is supported for repair
613
+ format_supported = False
614
+ for fmt in REPAIRABLE_FORMATS:
615
+ if file_ext in SUPPORTED_FORMATS[fmt]:
616
+ format_supported = True
617
+ break
618
+
619
+ if not format_supported:
620
+ return False, f"Format not supported for repair ({file_ext})", None, None
621
+
622
+ # Try to open and resave the image with PIL's error forgiveness
623
+ # This works for many truncated files
624
+ try:
625
+ with Image.open(file_path) as img:
626
+ width, height = img.size
627
+ format = img.format
628
+
629
+ # Create a buffer for the fixed image
630
+ buffer = io.BytesIO()
631
+ img.save(buffer, format=format)
632
+
633
+ # Write the repaired image back to the original file
634
+ with open(file_path, 'wb') as f:
635
+ f.write(buffer.getvalue())
636
+
637
+ # Verify the repaired image
638
+ if is_valid_image(file_path):
639
+ return True, f"Repaired {issue_type} issue", width, height
640
+ else:
641
+ # If verification fails, try again with JPEG specific options for JPEG files
642
+ if format == 'JPEG':
643
+ with Image.open(file_path) as img:
644
+ buffer = io.BytesIO()
645
+ # Use optimize=True and quality=85 for better repair chances
646
+ img.save(buffer, format='JPEG', optimize=True, quality=85)
647
+ with open(file_path, 'wb') as f:
648
+ f.write(buffer.getvalue())
649
+
650
+ if is_valid_image(file_path):
651
+ return True, f"Repaired {issue_type} issue with JPEG optimization", width, height
652
+
653
+ return False, f"Failed to repair {issue_type} issue", None, None
654
+
655
+ except Exception as e:
656
+ logging.debug(f"Repair attempt failed for {file_path}: {str(e)}")
657
+ return False, f"Repair failed: {str(e)}", None, None
658
+
659
+ except Exception as e:
660
+ logging.debug(f"Error during repair of {file_path}: {str(e)}")
661
+ return False, f"Repair error: {str(e)}", None, None
662
+
663
+ def process_file(args):
664
+ """Process a single image file."""
665
+ file_path, repair_mode, repair_dir, thorough_check, sensitivity, ignore_eof, check_visual, visual_strictness, enable_security_checks = args
666
+
667
+ # Security validation (if enabled)
668
+ if enable_security_checks:
669
+ try:
670
+ is_safe, warnings = validate_file_security(file_path, check_size=True, check_dimensions=True)
671
+
672
+ # Log security warnings
673
+ for warning in warnings:
674
+ logging.warning(f"Security warning for {file_path}: {warning}")
675
+
676
+ if not is_safe:
677
+ # File failed security checks - treat as invalid
678
+ size = os.path.getsize(file_path)
679
+ return file_path, False, size, "security_failed", "Failed security validation", None
680
+
681
+ except ValueError as e:
682
+ # Critical security failure (file too large, dimensions too big, etc.)
683
+ logging.error(f"Security check failed for {file_path}: {e}")
684
+ size = os.path.getsize(file_path) if os.path.exists(file_path) else 0
685
+ return file_path, False, size, "security_failed", str(e), None
686
+ except Exception as e:
687
+ # Unexpected error during security validation
688
+ logging.debug(f"Security validation error for {file_path}: {e}")
689
+ # Continue processing anyway for this case
690
+
691
+ # Check if the image is valid
692
+ is_valid = is_valid_image(file_path, thorough=thorough_check, sensitivity=sensitivity,
693
+ ignore_eof=ignore_eof, check_visual=check_visual, visual_strictness=visual_strictness)
694
+
695
+ if not is_valid and repair_mode:
696
+ # Try to repair the file
697
+ repair_success, repair_msg, width, height = attempt_repair(file_path, repair_dir)
698
+
699
+ if repair_success:
700
+ # File was repaired
701
+ return file_path, True, 0, "repaired", repair_msg, (width, height)
702
+ else:
703
+ # File is still corrupt
704
+ size = os.path.getsize(file_path)
705
+ return file_path, False, size, "repair_failed", repair_msg, None
706
+ else:
707
+ # No repair attempted or file is valid
708
+ size = os.path.getsize(file_path) if not is_valid else 0
709
+ return file_path, is_valid, size, "not_repaired", None, None
710
+
711
+ def get_session_id(directory, formats, recursive):
712
+ """Generate a unique session ID based on scan parameters."""
713
+ # Create a unique identifier for this scan session
714
+ dir_path = str(directory).encode('utf-8')
715
+ formats_str = ",".join(sorted(formats)).encode('utf-8')
716
+ recursive_str = str(recursive).encode('utf-8')
717
+
718
+ # Use SHA256 instead of MD5 for better security
719
+ # MD5 is cryptographically broken and should not be used
720
+ hash_obj = hashlib.sha256()
721
+ hash_obj.update(dir_path)
722
+ hash_obj.update(formats_str)
723
+ hash_obj.update(recursive_str)
724
+
725
+ return hash_obj.hexdigest()[:16] # Use first 16 chars of hash for uniqueness
726
+
727
+ def _deduplicate(seq):
728
+ """Return a list with duplicates removed while preserving order."""
729
+ seen = set()
730
+ deduped = []
731
+ for item in seq:
732
+ if item not in seen:
733
+ deduped.append(item)
734
+ seen.add(item)
735
+ return deduped
736
+
737
+
738
+ def validate_file_security(file_path, check_size=True, check_dimensions=True):
739
+ """
740
+ Perform security validation on a file before processing.
741
+
742
+ Args:
743
+ file_path: Path to the file
744
+ check_size: Whether to check file size limits
745
+ check_dimensions: Whether to check image dimension limits
746
+
747
+ Returns:
748
+ (is_safe, warnings) - tuple of boolean and list of warning messages
749
+
750
+ Raises:
751
+ ValueError: If file fails critical security checks
752
+ """
753
+ warnings = []
754
+
755
+ # Check if file exists
756
+ if not os.path.exists(file_path):
757
+ raise ValueError(f"File does not exist: {file_path}")
758
+
759
+ # Check file size to prevent DoS via huge files
760
+ if check_size:
761
+ file_size = os.path.getsize(file_path)
762
+ if file_size > MAX_FILE_SIZE:
763
+ raise ValueError(f"File too large ({file_size} bytes, max {MAX_FILE_SIZE}). "
764
+ f"This could indicate a malicious file or decompression bomb.")
765
+
766
+ # Warn about suspiciously large files (over 10MB for images is unusual)
767
+ if file_size > 10 * 1024 * 1024:
768
+ warnings.append(f"Large file size: {humanize.naturalsize(file_size)}")
769
+
770
+ # Check image dimensions to prevent decompression bombs
771
+ if check_dimensions:
772
+ try:
773
+ with Image.open(file_path) as img:
774
+ width, height = img.size
775
+ total_pixels = width * height
776
+
777
+ if total_pixels > MAX_IMAGE_PIXELS:
778
+ raise ValueError(f"Image dimensions too large ({width}x{height} = {total_pixels} pixels, "
779
+ f"max {MAX_IMAGE_PIXELS}). This could be a decompression bomb attack.")
780
+
781
+ # Warn about very large images
782
+ if total_pixels > 10000 * 10000:
783
+ warnings.append(f"Large image dimensions: {width}x{height}")
784
+
785
+ # Check for format mismatch (file extension vs actual format)
786
+ actual_format = img.format
787
+ expected_formats = []
788
+ for fmt, extensions in SUPPORTED_FORMATS.items():
789
+ if file_path.lower().endswith(tuple(extensions)):  # str.endswith needs a str or tuple, not a list
790
+ expected_formats.append(fmt)
791
+
792
+ if actual_format and expected_formats and actual_format not in expected_formats:
793
+ warnings.append(f"Format mismatch: file has '{file_path.split('.')[-1]}' extension "
794
+ f"but is actually '{actual_format}' format")
795
+
796
+ except UnidentifiedImageError:
797
+ raise ValueError(f"Cannot identify image format - file may be corrupted or malicious")
798
+ except Exception as e:
799
+ raise ValueError(f"Error validating image: {str(e)}")
800
+
801
+ return True, warnings
802
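As a complementary illustration (not part of this file), Pillow can enforce its own pixel ceiling so that merely opening an oversized image fails early; the limit and filename below are assumptions for the sketch:

```python
from PIL import Image

# Assumed ceiling, similar in spirit to the MAX_IMAGE_PIXELS check above.
Image.MAX_IMAGE_PIXELS = 50_000_000  # Pillow raises DecompressionBombError past ~2x this value

try:
    with Image.open("suspect.png") as img:  # hypothetical file
        img.load()
except Image.DecompressionBombError as exc:
    print(f"Rejected oversized image: {exc}")
```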
+
803
+
804
+ def calculate_file_hash(file_path, algorithm='sha256'):
805
+ """
806
+ Calculate cryptographic hash of a file.
807
+
808
+ Args:
809
+ file_path: Path to the file
810
+ algorithm: Hash algorithm to use (sha256, sha512, etc.)
811
+
812
+ Returns:
813
+ Hexadecimal hash string
814
+ """
815
+ hash_obj = hashlib.new(algorithm)
816
+
817
+ # Read file in chunks to handle large files
818
+ with open(file_path, 'rb') as f:
819
+ for chunk in iter(lambda: f.read(4096), b''):
820
+ hash_obj.update(chunk)
821
+
822
+ return hash_obj.hexdigest()
823
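For what it's worth, on Python 3.11+ the chunked read can be replaced by a standard-library helper; a small, hedged equivalent (the filename is illustrative):

```python
import hashlib

# Python 3.11+ equivalent of the chunked loop above.
with open("photo.jpg", "rb") as f:  # hypothetical file
    digest = hashlib.file_digest(f, "sha256").hexdigest()
print(digest)
```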
+
824
+
825
+ def safe_join_path(base_dir, user_path):
826
+ """
827
+ Safely join paths and prevent path traversal attacks.
828
+
829
+ Args:
830
+ base_dir: Base directory (trusted)
831
+ user_path: User-provided path component (untrusted)
832
+
833
+ Returns:
834
+ Safe absolute path within base_dir
835
+
836
+ Raises:
837
+ ValueError: If path traversal is detected
838
+ """
839
+ # Normalize base directory
840
+ base_dir = os.path.abspath(base_dir)
841
+
842
+ # Join paths
843
+ full_path = os.path.normpath(os.path.join(base_dir, user_path))
844
+
845
+ # Resolve any symlinks
846
+ full_path = os.path.abspath(full_path)
847
+
848
+ # Ensure the result is within base_dir
849
+ if not full_path.startswith(base_dir + os.sep) and full_path != base_dir:
850
+ raise ValueError(f"Path traversal detected: '{user_path}' resolves outside base directory")
851
+
852
+ return full_path
853
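A quick sketch of the intended behaviour, assuming safe_join_path is importable from this module; the paths are illustrative:

```python
from find_bad_images import safe_join_path  # assumes this module is importable

# A benign relative path stays inside the base directory...
print(safe_join_path("/quarantine", "vacation/IMG_0001.jpg"))  # -> /quarantine/vacation/IMG_0001.jpg

# ...while a traversal attempt is rejected.
try:
    safe_join_path("/quarantine", "../../etc/passwd")
except ValueError as exc:
    print(f"blocked: {exc}")
```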
+
854
+
855
+ def save_progress(session_id, directory, formats, recursive, processed_files,
856
+ bad_files, repaired_files, progress_dir=DEFAULT_PROGRESS_DIR):
857
+ """Save the current progress to a file."""
858
+ # Create progress directory if it doesn't exist
859
+ if not os.path.exists(progress_dir):
860
+ os.makedirs(progress_dir, exist_ok=True)
861
+
862
+ # Create a progress state object
863
+ progress_state = {
864
+ 'version': VERSION,
865
+ 'timestamp': datetime.now().isoformat(),
866
+ 'directory': str(directory),
867
+ 'formats': formats,
868
+ 'recursive': recursive,
869
+ 'processed_files': _deduplicate(processed_files),
870
+ 'bad_files': _deduplicate(bad_files),
871
+ 'repaired_files': _deduplicate(repaired_files)
872
+ }
873
+
874
+ # Save to file using JSON instead of pickle for security
875
+ # This prevents arbitrary code execution via malicious progress files
876
+ progress_file = os.path.join(progress_dir, f"session_{session_id}.progress.json")
877
+ with open(progress_file, 'w') as f:
878
+ json.dump(progress_state, f, indent=2)
879
+
880
+ logging.debug(f"Progress saved to {progress_file}")
881
+ return progress_file
882
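Because the state is plain JSON, a saved session can be inspected without the tool; a hedged sketch (the progress directory and session ID are made up):

```python
import json

# Hypothetical session file following the session_<id>.progress.json naming used above.
with open("progress/session_0123456789abcdef.progress.json") as f:
    state = json.load(f)

print(state["timestamp"])
print(len(state["processed_files"]), "processed,", len(state["bad_files"]), "corrupt")
```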
+
883
+ def load_progress(session_id, progress_dir=DEFAULT_PROGRESS_DIR):
884
+ """Load progress from a saved session."""
885
+ # Try new JSON format first (more secure)
886
+ progress_file_json = os.path.join(progress_dir, f"session_{session_id}.progress.json")
887
+ progress_file_legacy = os.path.join(progress_dir, f"session_{session_id}.progress")
888
+
889
+ # Prefer JSON format for security
890
+ if os.path.exists(progress_file_json):
891
+ progress_file = progress_file_json
892
+ use_json = True
893
+ elif os.path.exists(progress_file_legacy):
894
+ progress_file = progress_file_legacy
895
+ use_json = False
896
+ logging.warning("Loading legacy pickle format. This format is deprecated for security reasons.")
897
+ else:
898
+ return None
899
+
900
+ try:
901
+ if use_json:
902
+ # Secure JSON deserialization
903
+ with open(progress_file, 'r') as f:
904
+ progress_state = json.load(f)
905
+ else:
906
+ # Legacy pickle support (with warning)
907
+ # TODO: Remove pickle support in future versions
908
+ import pickle
909
+ with open(progress_file, 'rb') as f:
910
+ progress_state = pickle.load(f)
911
+ logging.warning("SECURITY WARNING: Loaded progress file using unsafe pickle format. "
912
+ "Please delete old .progress files and use new .progress.json format.")
913
+
914
+ # Remove any duplicate entries from lists
915
+ for key in ('processed_files', 'bad_files', 'repaired_files'):
916
+ if key in progress_state:
917
+ progress_state[key] = _deduplicate(progress_state[key])
918
+
919
+ # Check version compatibility
920
+ if progress_state.get('version', '0.0.0') != VERSION:
921
+ logging.warning("Progress file was created with a different version. Some incompatibilities may exist.")
922
+
923
+ logging.info(f"Loaded progress from {progress_file}")
924
+ return progress_state
925
+ except Exception as e:
926
+ logging.error(f"Failed to load progress: {str(e)}")
927
+ return None
928
+
929
+ def list_saved_sessions(progress_dir=DEFAULT_PROGRESS_DIR):
930
+ """List all saved sessions with their details."""
931
+ if not os.path.exists(progress_dir):
932
+ return []
933
+
934
+ sessions = []
935
+ for filename in os.listdir(progress_dir):
936
+ # Support both new JSON format and legacy pickle format
937
+ if filename.endswith('.progress.json') or filename.endswith('.progress'):
938
+ try:
939
+ filepath = os.path.join(progress_dir, filename)
940
+ use_json = filename.endswith('.progress.json')
941
+
942
+ if use_json:
943
+ with open(filepath, 'r') as f:
944
+ progress_state = json.load(f)
945
+ else:
946
+ # Legacy pickle format
947
+ import pickle
948
+ with open(filepath, 'rb') as f:
949
+ progress_state = pickle.load(f)
950
+
951
+ # Extract session ID from filename
952
+ if filename.endswith('.progress.json'):
953
+ session_id = filename.replace('session_', '').replace('.progress.json', '')
954
+ else:
955
+ session_id = filename.replace('session_', '').replace('.progress', '')
956
+
957
+ session_info = {
958
+ 'id': session_id,
959
+ 'timestamp': progress_state.get('timestamp', 'Unknown'),
960
+ 'directory': progress_state.get('directory', 'Unknown'),
961
+ 'formats': progress_state.get('formats', []),
962
+ 'processed_count': len(progress_state.get('processed_files', [])),
963
+ 'bad_count': len(progress_state.get('bad_files', [])),
964
+ 'repaired_count': len(progress_state.get('repaired_files', [])),
965
+ 'filepath': filepath,
966
+ 'format': 'JSON' if use_json else 'Pickle (Legacy)'
967
+ }
968
+ sessions.append(session_info)
969
+ except Exception as e:
970
+ logging.debug(f"Failed to load session from {filename}: {str(e)}")
971
+
972
+ # Sort by timestamp, newest first
973
+ sessions.sort(key=lambda x: x['timestamp'], reverse=True)
974
+ return sessions
975
+
976
+ def get_extensions_for_formats(formats):
977
+ """Get all file extensions for the specified formats."""
978
+ extensions = []
979
+ for fmt in formats:
980
+ if fmt in SUPPORTED_FORMATS:
981
+ extensions.extend(SUPPORTED_FORMATS[fmt])
982
+ return tuple(extensions)
983
+
984
+ def find_image_files(directory, formats, recursive=True):
985
+ """Find all image files of specified formats in a directory."""
986
+ image_files = []
987
+ extensions = get_extensions_for_formats(formats)
988
+
989
+ if not extensions:
990
+ logging.warning("No valid image formats specified!")
991
+ return []
992
+
993
+ format_names = ", ".join(formats)
994
+ if recursive:
995
+ logging.info(f"Recursively scanning for {format_names} files...")
996
+ for root, _, files in os.walk(directory):
997
+ for file in files:
998
+ if file.lower().endswith(extensions):
999
+ image_files.append(os.path.join(root, file))
1000
+ else:
1001
+ logging.info(f"Scanning for {format_names} files in {directory} (non-recursive)...")
1002
+ for file in os.listdir(directory):
1003
+ if os.path.isfile(os.path.join(directory, file)) and file.lower().endswith(extensions):
1004
+ image_files.append(os.path.join(directory, file))
1005
+
1006
+ logging.info(f"Found {len(image_files)} image files")
1007
+ return image_files
1008
+
1009
+ def process_images(directory, formats, dry_run=True, repair=False,
1010
+ max_workers=None, recursive=True, move_to=None, repair_dir=None,
1011
+ save_progress_interval=5, resume_session=None, progress_dir=DEFAULT_PROGRESS_DIR,
1012
+ thorough_check=False, sensitivity='medium', ignore_eof=False, check_visual=False,
1013
+ visual_strictness='medium', enable_security_checks=False):
1014
+ """Find corrupt image files and optionally repair, delete, or move them."""
1015
+ start_time = time.time()
1016
+
1017
+ # Generate session ID for this scan
1018
+ session_id = get_session_id(directory, formats, recursive)
1019
+ processed_files = []
1020
+ bad_files = []
1021
+ repaired_files = []
1022
+ total_size_saved = 0
1023
+ last_progress_save = time.time()
1024
+
1025
+ # If resuming, load previous progress
1026
+ if resume_session:
1027
+ try:
1028
+ progress = load_progress(resume_session, progress_dir)
1029
+ if progress and progress['directory'] == str(directory) and progress['formats'] == formats:
1030
+ processed_files = progress['processed_files']
1031
+ bad_files = progress['bad_files']
1032
+ repaired_files = progress['repaired_files']
1033
+ logging.info(f"Resuming session: {len(processed_files)} files already processed")
1034
+ else:
1035
+ if progress:
1036
+ logging.warning("Session parameters don't match current parameters. Starting fresh scan.")
1037
+ else:
1038
+ logging.warning(f"Couldn't find session {resume_session}. Starting fresh scan.")
1039
+ except Exception as e:
1040
+ logging.error(f"Error loading session: {str(e)}. Starting fresh scan.")
1041
+
1042
+ # Find all image files
1043
+ image_files = find_image_files(directory, formats, recursive)
1044
+ if not image_files:
1045
+ logging.warning("No image files found!")
1046
+ return [], [], 0
1047
+
1048
+ # Filter out already processed files if resuming
1049
+ if processed_files:
1050
+ remaining_files = [f for f in image_files if f not in processed_files]
1051
+ skipped_count = len(image_files) - len(remaining_files)
1052
+ image_files = remaining_files
1053
+ logging.info(f"Skipping {skipped_count} already processed files")
1054
+
1055
+ if not image_files:
1056
+ logging.info("All files have already been processed in the previous session!")
1057
+ return bad_files, repaired_files, total_size_saved
1058
+
1059
+ # Create directories if they don't exist
1060
+ if move_to and not os.path.exists(move_to):
1061
+ os.makedirs(move_to)
1062
+ logging.info(f"Created directory for corrupt files: {move_to}")
1063
+
1064
+ if repair and repair_dir and not os.path.exists(repair_dir):
1065
+ os.makedirs(repair_dir)
1066
+ logging.info(f"Created directory for backup files: {repair_dir}")
1067
+
1068
+ # Prepare input arguments for workers
1069
+ input_args = [(file_path, repair, repair_dir, thorough_check, sensitivity, ignore_eof, check_visual, visual_strictness, enable_security_checks) for file_path in image_files]
1070
+
1071
+ # Process files in parallel
1072
+ logging.info("Processing files in parallel...")
1073
+
1074
+ # Create a custom progress bar class that saves progress periodically
1075
+ class ProgressSavingBar(tqdm_auto.tqdm):
1076
+ def update(self, n=1):
1077
+ nonlocal last_progress_save, processed_files
1078
+ result = super().update(n)
1079
+
1080
+ # Save progress periodically
1081
+ current_time = time.time()
1082
+ if save_progress_interval > 0 and current_time - last_progress_save >= save_progress_interval * 60:
1083
+ # Save the progress using the list of files that have actually
1084
+ # completed processing. ``processed_files`` is updated as each
1085
+ # future finishes so we can safely persist it as-is.
1086
+ save_progress(
1087
+ session_id,
1088
+ directory,
1089
+ formats,
1090
+ recursive,
1091
+ processed_files,
1092
+ bad_files,
1093
+ repaired_files,
1094
+ progress_dir,
1095
+ )
1096
+
1097
+ last_progress_save = current_time
1098
+ logging.debug(f"Progress saved at {self.n} / {len(image_files)} files")
1099
+
1100
+ return result
1101
+
1102
+ try:
1103
+ with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
1104
+ # Colorful progress bar with progress saving
1105
+ results = []
1106
+ futures = {executor.submit(process_file, arg): arg[0] for arg in input_args}
1107
+
1108
+ with ProgressSavingBar(
1109
+ total=len(image_files),
1110
+ desc=f"{colorama.Fore.BLUE}Checking image files{colorama.Style.RESET_ALL}",
1111
+ unit="file",
1112
+ bar_format="{desc}: {percentage:3.0f}%|{bar:30}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}]",
1113
+ colour="blue"
1114
+ ) as pbar:
1115
+ for future in concurrent.futures.as_completed(futures):
1116
+ file_path = futures[future]
1117
+ try:
1118
+ result = future.result()
1119
+ results.append(result)
1120
+
1121
+ # Track this file as processed for resuming later if needed
1122
+ processed_files.append(file_path)
1123
+
1124
+ # Update progress for successful or failed processing
1125
+ pbar.update(1)
1126
+
1127
+ # Update our tracking of bad/repaired files in real-time for progress saving
1128
+ file_path, is_valid, size, repair_status, repair_msg, dimensions = result
1129
+ if repair_status == "repaired":
1130
+ repaired_files.append(file_path)
1131
+ elif not is_valid:
1132
+ bad_files.append(file_path)
1133
+
1134
+ except Exception as e:
1135
+ logging.error(f"Error processing {file_path}: {str(e)}")
1136
+ pbar.update(1)
1137
+ except KeyboardInterrupt:
1138
+ # If the user interrupts, save progress before exiting
1139
+ logging.warning("Process interrupted by user. Saving progress...")
1140
+ save_progress(session_id, directory, formats, recursive,
1141
+ processed_files, bad_files, repaired_files, progress_dir)
1142
+ logging.info(f"Progress saved. You can resume with --resume {session_id}")
1143
+ raise
1144
+
1145
+ # Process results
1146
+ total_size_saved = 0
1147
+ for file_path, is_valid, size, repair_status, repair_msg, dimensions in results:
1148
+ if repair_status == "repaired":
1149
+ # File was successfully repaired (already added to repaired_files during processing)
1150
+ width, height = dimensions
1151
+ msg = f"Repaired: {file_path} ({width}x{height}) - {repair_msg}"
1152
+ logging.info(msg)
1153
+ elif not is_valid:
1154
+ # File is corrupt and wasn't repaired (or repair failed)
1155
+ # (already added to bad_files during processing)
1156
+ total_size_saved += size
1157
+
1158
+ size_str = humanize.naturalsize(size)
1159
+ if repair_status == "repair_failed":
1160
+ fail_msg = f"Repair failed: {file_path} ({size_str}) - {repair_msg}"
1161
+ logging.warning(fail_msg)
1162
+
1163
+ if dry_run:
1164
+ msg = f"Would delete: {file_path} ({size_str})"
1165
+ logging.info(msg)
1166
+ elif move_to:
1167
+ # Preserve the subdirectory structure by getting the relative path from the search directory
1168
+ try:
1169
+ # Get the relative path from the base directory
1170
+ rel_path = os.path.relpath(file_path, str(directory))
1171
+ # If relpath starts with ".." it means file_path is not within directory
1172
+ # In this case, just use the basename as fallback
1173
+ if rel_path.startswith('..'):
1174
+ rel_path = os.path.basename(file_path)
1175
+
1176
+ # Use safe path joining to prevent path traversal attacks
1177
+ # This ensures files can't be written outside the move_to directory
1178
+ try:
1179
+ dest_path = safe_join_path(move_to, rel_path)
1180
+ except ValueError as ve:
1181
+ logging.error(f"Security error moving {file_path}: {ve}")
1182
+ continue
1183
+
1184
+ # Create parent directories if they don't exist
1185
+ os.makedirs(os.path.dirname(dest_path), exist_ok=True)
1186
+
1187
+ # Use shutil.move instead of os.rename to handle cross-device file movements
1188
+ shutil.move(file_path, dest_path)
1189
+
1190
+ # Add arrow with color
1191
+ arrow = f"{colorama.Fore.CYAN}→{colorama.Style.RESET_ALL}"
1192
+ msg = f"Moved: {file_path} {arrow} {dest_path} ({size_str})"
1193
+ logging.info(msg)
1194
+ except Exception as e:
1195
+ logging.error(f"Failed to move {file_path}: {e}")
1196
+ else:
1197
+ try:
1198
+ os.remove(file_path)
1199
+ msg = f"Deleted: {file_path} ({size_str})"
1200
+ logging.info(msg)
1201
+ except Exception as e:
1202
+ logging.error(f"Failed to delete {file_path}: {e}")
1203
+
1204
+ # Final progress save
1205
+ save_progress(session_id, directory, formats, recursive,
1206
+ processed_files, bad_files, repaired_files, progress_dir)
1207
+
1208
+ elapsed = time.time() - start_time
1209
+ logging.info(f"Processed {len(processed_files)} files in {elapsed:.2f} seconds")
1210
+ logging.info(f"Session ID: {session_id} (use --resume {session_id} to resume if needed)")
1211
+
1212
+ return bad_files, repaired_files, total_size_saved
1213
+
1214
+ def print_banner():
1215
+ """Print 2PAC-themed ASCII art banner"""
1216
+ banner = r"""
1217
+ ░▒▓███████▓▒░░▒▓███████▓▒░ ░▒▓██████▓▒░ ░▒▓██████▓▒░
1218
+ ░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░
1219
+ ░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
1220
+ ░▒▓██████▓▒░░▒▓███████▓▒░░▒▓████████▓▒░▒▓█▓▒░
1221
+ ░▒▓█▓▒░ ░▒▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░
1222
+ ░▒▓█▓▒░ ░▒▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░▒▓█▓▒░░▒▓█▓▒░
1223
+ ░▒▓████████▓▒░▒▓█▓▒░ ░▒▓█▓▒░░▒▓█▓▒░░▒▓██████▓▒░
1224
+ ╔═════════════════════════════════════════════════════════╗
1225
+ ║ The Picture Analyzer & Corruption killer ║
1226
+ ║ In memory of Jeff Young - Bringing people together ║
1227
+ ╚═════════════════════════════════════════════════════════╝
1228
+ """
1229
+
1230
+ # Colored version of the banner, highlighting PAC for Picture Analyzer Corruption
1231
+ if 'colorama' in sys.modules:
1232
+ banner_lines = banner.strip().split('\n')
1233
+ colored_banner = []
1234
+
1235
+ # Color the new gradient ASCII art logo (lines 0-6)
1236
+ for i, line in enumerate(banner_lines):
1237
+ if i < 7: # The ASCII art logo lines for the new gradient style
1238
+ # For "2" part (first column)
1239
+ part1 = line[:11]
1240
+ # For "P" part (second column)
1241
+ part2 = line[11:24]
1242
+ # For "A" part (third column)
1243
+ part3 = line[24:38]
1244
+ # For "C" part (fourth column)
1245
+ part4 = line[38:]
1246
+
1247
+ colored_line = f"{colorama.Fore.WHITE}{part1}" + \
1248
+ f"{colorama.Fore.RED}{part2}" + \
1249
+ f"{colorama.Fore.GREEN}{part3}" + \
1250
+ f"{colorama.Fore.BLUE}{part4}{colorama.Style.RESET_ALL}"
1251
+
1252
+ colored_banner.append(colored_line)
1253
+ elif i >= 7 and i <= 10: # The box and text lines
1254
+ if i == 8: # Title line with PAC highlighted
1255
+ parts = line.split("Picture Analyzer & Corruption")
1256
+ if len(parts) == 2:
1257
+ prefix = parts[0]
1258
+ suffix = parts[1]
1259
+ colored_title = f"{colorama.Fore.YELLOW}{prefix}" + \
1260
+ f"{colorama.Fore.RED}Picture " + \
1261
+ f"{colorama.Fore.GREEN}Analyzer " + \
1262
+ f"{colorama.Fore.WHITE}& " + \
1263
+ f"{colorama.Fore.BLUE}Corruption" + \
1264
+ f"{colorama.Fore.YELLOW}{suffix}{colorama.Style.RESET_ALL}"
1265
+ colored_banner.append(colored_title)
1266
+ else:
1267
+ colored_banner.append(f"{colorama.Fore.YELLOW}{line}{colorama.Style.RESET_ALL}")
1268
+ elif i == 9: # Jeff Young tribute line
1269
+ colored_banner.append(f"{colorama.Fore.CYAN}{line}{colorama.Style.RESET_ALL}")
1270
+ else: # Box border lines
1271
+ colored_banner.append(f"{colorama.Fore.YELLOW}{line}{colorama.Style.RESET_ALL}")
1272
+ else:
1273
+ colored_banner.append(f"{colorama.Fore.WHITE}{line}{colorama.Style.RESET_ALL}")
1274
+
1275
+ print('\n'.join(colored_banner))
1276
+ else:
1277
+ print(banner)
1278
+ print()
1279
+
1280
+ def main():
1281
+ print_banner()
1282
+
1283
+ # Check for 'q' command to quit
1284
+ if len(sys.argv) == 2 and sys.argv[1].lower() == 'q':
1285
+ print(f"{colorama.Fore.YELLOW}Exiting 2PAC. Stay safe!{colorama.Style.RESET_ALL}")
1286
+ sys.exit(0)
1287
+
1288
+ parser = argparse.ArgumentParser(
1289
+ description='2PAC: The Picture Analyzer & Corruption Killer',
1290
+ epilog='Created by Richard Young - "All Eyez On Your Images" - https://github.com/ricyoung/2pac'
1291
+ )
1292
+
1293
+ # Main action (mutually exclusive)
1294
+ action_group = parser.add_mutually_exclusive_group()
1295
+ action_group.add_argument('directory', nargs='?', help='Directory to search for image files')
1296
+ action_group.add_argument('--list-sessions', action='store_true', help='List all saved sessions')
1297
+ action_group.add_argument('--check-file', type=str, help='Check a specific file for corruption (useful for testing)')
1298
+
1299
+ # Basic options
1300
+ parser.add_argument('--delete', action='store_true', help='Delete corrupt image files (without this flag, runs in dry-run mode)')
1301
+ parser.add_argument('--move-to', type=str, help='Move corrupt files to this directory instead of deleting them')
1302
+ parser.add_argument('--workers', type=int, default=None, help='Number of worker processes (default: CPU count)')
1303
+ parser.add_argument('--non-recursive', action='store_true', help='Only search in the specified directory, not subdirectories')
1304
+ parser.add_argument('--output', type=str, help='Save list of corrupt files to this file')
1305
+ parser.add_argument('--verbose', '-v', action='store_true', help='Enable verbose logging')
1306
+ parser.add_argument('--no-color', action='store_true', help='Disable colored output')
1307
+ parser.add_argument('--version', action='version', version=f'Bad Image Finder v{VERSION} by Richard Young')
1308
+
1309
+ # Repair options
1310
+ repair_group = parser.add_argument_group('Repair options')
1311
+ repair_group.add_argument('--repair', action='store_true', help='Attempt to repair corrupt image files')
1312
+ repair_group.add_argument('--backup-dir', type=str, help='Directory to store backups of files before repair')
1313
+ repair_group.add_argument('--repair-report', type=str, help='Save list of repaired files to this file')
1314
+
1315
+ # Format options
1316
+ format_group = parser.add_argument_group('Image format options')
1317
+ format_group.add_argument('--formats', type=str, nargs='+', choices=SUPPORTED_FORMATS.keys(),
1318
+ help=f'Image formats to check (default: all formats)')
1319
+ format_group.add_argument('--jpeg', action='store_true', help='Check JPEG files only')
1320
+ format_group.add_argument('--png', action='store_true', help='Check PNG files only')
1321
+ format_group.add_argument('--tiff', action='store_true', help='Check TIFF files only')
1322
+ format_group.add_argument('--gif', action='store_true', help='Check GIF files only')
1323
+ format_group.add_argument('--bmp', action='store_true', help='Check BMP files only')
1324
+
1325
+ # Validation options
1326
+ validation_group = parser.add_argument_group('Validation options')
1327
+ validation_group.add_argument('--thorough', action='store_true',
1328
+ help='Perform thorough image validation (slower but catches more subtle corruption)')
1329
+ validation_group.add_argument('--sensitivity', type=str, choices=['low', 'medium', 'high'], default='medium',
1330
+ help='Set validation sensitivity level: low (basic checks), medium (standard checks), high (most strict)')
1331
+ validation_group.add_argument('--ignore-eof', action='store_true',
1332
+ help='Ignore missing end-of-file markers (useful for truncated but viewable files)')
1333
+ validation_group.add_argument('--check-visual', action='store_true',
1334
+ help='Analyze image content to detect visible corruption like gray/black areas')
1335
+ validation_group.add_argument('--visual-strictness', type=str, choices=['low', 'medium', 'high'], default='medium',
1336
+ help='Set strictness level for visual corruption detection: low (most permissive), medium (balanced), high (only clear corruption)')
1337
+
1338
+ # Security options
1339
+ security_group = parser.add_argument_group('Security options')
1340
+ security_group.add_argument('--security-checks', action='store_true',
1341
+ help='Enable enhanced security validation (file size limits, dimension checks, format verification)')
1342
+ security_group.add_argument('--max-file-size', type=int, default=MAX_FILE_SIZE,
1343
+ help=f'Maximum file size in bytes to process (default: {MAX_FILE_SIZE} = 100MB)')
1344
+ security_group.add_argument('--max-pixels', type=int, default=MAX_IMAGE_PIXELS,
1345
+ help=f'Maximum image dimensions in pixels (default: {MAX_IMAGE_PIXELS} = 50MP)')
1346
+
1347
+ # Progress saving options
1348
+ progress_group = parser.add_argument_group('Progress options')
1349
+ progress_group.add_argument('--save-interval', type=int, default=5,
1350
+ help='Save progress every N minutes (0 to disable progress saving)')
1351
+ progress_group.add_argument('--progress-dir', type=str, default=DEFAULT_PROGRESS_DIR,
1352
+ help='Directory to store progress files')
1353
+ progress_group.add_argument('--resume', type=str, metavar='SESSION_ID',
1354
+ help='Resume from a previously saved session')
1355
+
1356
+ args = parser.parse_args()
1357
+
1358
+ # Setup logging
1359
+ setup_logging(args.verbose, args.no_color)
1360
+
1361
+ # Handle specific file check mode
1362
+ if args.check_file:
1363
+ file_path = args.check_file
1364
+ if not os.path.exists(file_path):
1365
+ logging.error(f"Error: File not found: {file_path}")
1366
+ sys.exit(1)
1367
+
1368
+ print(f"\n{colorama.Style.BRIGHT}Checking file: {file_path}{colorama.Style.RESET_ALL}\n")
1369
+
1370
+ # Basic check
1371
+ print(f"{colorama.Fore.CYAN}Basic validation:{colorama.Style.RESET_ALL}")
1372
+ try:
1373
+ with Image.open(file_path) as img:
1374
+ print(f"✓ File can be opened by PIL")
1375
+ print(f" Format: {img.format}")
1376
+ print(f" Mode: {img.mode}")
1377
+ print(f" Size: {img.size[0]}x{img.size[1]}")
1378
+
1379
+ try:
1380
+ img.verify()
1381
+ print(f"✓ Header verification passed")
1382
+ except Exception as e:
1383
+ print(f"❌ Header verification failed: {str(e)}")
1384
+
1385
+ try:
1386
+ with Image.open(file_path) as img2:
1387
+ img2.load()
1388
+ print(f"✓ Data loading test passed")
1389
+ except Exception as e:
1390
+ print(f"❌ Data loading test failed: {str(e)}")
1391
+ except Exception as e:
1392
+ print(f"❌ Cannot open file with PIL: {str(e)}")
1393
+
1394
+ # Detailed format-specific checks
1395
+ if file_path.lower().endswith(tuple(SUPPORTED_FORMATS['JPEG'])):
1396
+ print(f"\n{colorama.Fore.CYAN}JPEG structure checks:{colorama.Style.RESET_ALL}")
1397
+ is_valid, msg = check_jpeg_structure(file_path)
1398
+ if is_valid:
1399
+ print(f"✓ JPEG structure valid: {msg}")
1400
+ else:
1401
+ print(f"❌ JPEG structure invalid: {msg}")
1402
+ elif file_path.lower().endswith(tuple(SUPPORTED_FORMATS['PNG'])):
1403
+ print(f"\n{colorama.Fore.CYAN}PNG structure checks:{colorama.Style.RESET_ALL}")
1404
+ is_valid, msg = check_png_structure(file_path)
1405
+ if is_valid:
1406
+ print(f"✓ PNG structure valid: {msg}")
1407
+ else:
1408
+ print(f"❌ PNG structure invalid: {msg}")
1409
+
1410
+ # Decode test
1411
+ print(f"\n{colorama.Fore.CYAN}Full decode test:{colorama.Style.RESET_ALL}")
1412
+ is_valid, msg = try_full_decode_check(file_path)
1413
+ if is_valid:
1414
+ print(f"✓ Full decode test passed: {msg}")
1415
+ else:
1416
+ print(f"❌ Full decode test failed: {msg}")
1417
+
1418
+ # External tools check
1419
+ print(f"\n{colorama.Fore.CYAN}External tools check:{colorama.Style.RESET_ALL}")
1420
+ is_valid, msg = try_external_tools(file_path)
1421
+ if is_valid:
1422
+ print(f"✓ External tools: {msg}")
1423
+ else:
1424
+ print(f"❌ External tools: {msg}")
1425
+
1426
+ # Visual corruption check
1427
+ print(f"\n{colorama.Fore.CYAN}Visual content analysis:{colorama.Style.RESET_ALL}")
1428
+ is_visually_corrupt, vis_msg = check_visual_corruption(file_path)
1429
+ if not is_visually_corrupt:
1430
+ print(f"✓ No visual corruption detected: {vis_msg}")
1431
+ else:
1432
+ print(f"❌ {vis_msg}")
1433
+
1434
+ # Final verdict
1435
+ print(f"\n{colorama.Fore.CYAN}Final verdict:{colorama.Style.RESET_ALL}")
1436
+ is_valid_basic = is_valid_image(file_path, thorough=False)
1437
+ is_valid_thorough = is_valid_image(file_path, thorough=True)
1438
+ is_valid_visual = not is_visually_corrupt
1439
+
1440
+ if is_valid_basic and is_valid_thorough and is_valid_visual:
1441
+ print(f"{colorama.Fore.GREEN}This file appears to be valid by all checks.{colorama.Style.RESET_ALL}")
1442
+ elif not is_valid_visual:
1443
+ print(f"{colorama.Fore.RED}This file shows visible corruption in the image content.{colorama.Style.RESET_ALL}")
1444
+ print(f"Recommendation: Use --check-visual to detect this type of corruption.")
1445
+ elif is_valid_basic and not is_valid_thorough:
1446
+ print(f"{colorama.Fore.YELLOW}This file passes basic validation but fails thorough checks.{colorama.Style.RESET_ALL}")
1447
+ print(f"Recommendation: Use --thorough mode to detect this type of corruption.")
1448
+ else:
1449
+ print(f"{colorama.Fore.RED}This file is corrupt and would be detected by the basic scan.{colorama.Style.RESET_ALL}")
1450
+
1451
+ sys.exit(0)
1452
+
1453
+ # Handle session listing mode
1454
+ if args.list_sessions:
1455
+ sessions = list_saved_sessions(args.progress_dir)
1456
+ if sessions:
1457
+ print(f"\n{colorama.Style.BRIGHT}Saved Sessions:{colorama.Style.RESET_ALL}")
1458
+ for i, session in enumerate(sessions):
1459
+ ts = datetime.fromisoformat(session['timestamp']).strftime('%Y-%m-%d %H:%M:%S')
1460
+ print(f"\n{colorama.Fore.CYAN}Session ID: {session['id']}{colorama.Style.RESET_ALL}")
1461
+ print(f" Created: {ts}")
1462
+ print(f" Directory: {session['directory']}")
1463
+ print(f" Formats: {', '.join(session['formats'])}")
1464
+ print(f" Progress: {session['processed_count']} files processed, "
1465
+ f"{session['bad_count']} corrupt, {session['repaired_count']} repaired")
1466
+
1467
+ # Show resume command
1468
+ resume_cmd = f"find_bad_images.py --resume {session['id']}"
1469
+ if os.path.exists(session['directory']):
1470
+ print(f" {colorama.Fore.GREEN}Resume command: {resume_cmd}{colorama.Style.RESET_ALL}")
1471
+ else:
1472
+ print(f" {colorama.Fore.YELLOW}Directory no longer exists, cannot resume{colorama.Style.RESET_ALL}")
1473
+ else:
1474
+ print("No saved sessions found.")
1475
+ sys.exit(0)
1476
+
1477
+ # Check if directory is specified for a new scan
1478
+ if not args.directory and not args.resume:
1479
+ logging.error("Error: You must specify a directory to scan or use --resume to continue a session")
1480
+ sys.exit(1)
1481
+
1482
+ # If we're resuming without a directory, load from previous session
1483
+ directory = None
1484
+ if args.resume and not args.directory:
1485
+ progress = load_progress(args.resume, args.progress_dir)
1486
+ if progress:
1487
+ directory = Path(progress['directory'])
1488
+ logging.info(f"Using directory from saved session: {directory}")
1489
+ else:
1490
+ logging.error(f"Could not load session {args.resume}")
1491
+ sys.exit(1)
1492
+ elif args.directory:
1493
+ directory = Path(args.directory)
1494
+
1495
+ # Verify the directory exists
1496
+ if not directory.exists() or not directory.is_dir():
1497
+ logging.error(f"Error: {directory} is not a valid directory")
1498
+ sys.exit(1)
1499
+
1500
+ # Check for incompatible options
1501
+ if args.delete and args.move_to:
1502
+ logging.error("Error: Cannot use both --delete and --move-to options")
1503
+ sys.exit(1)
1504
+
1505
+ # Determine which formats to check
1506
+ formats = []
1507
+ if args.formats:
1508
+ formats = args.formats
1509
+ elif args.jpeg:
1510
+ formats.append('JPEG')
1511
+ elif args.png:
1512
+ formats.append('PNG')
1513
+ elif args.tiff:
1514
+ formats.append('TIFF')
1515
+ elif args.gif:
1516
+ formats.append('GIF')
1517
+ elif args.bmp:
1518
+ formats.append('BMP')
1519
+ else:
1520
+ # Default: check all formats
1521
+ formats = DEFAULT_FORMATS
1522
+
1523
+ dry_run = not (args.delete or args.move_to)
1524
+
1525
+ # Colorful mode indicators
1526
+ if args.repair:
1527
+ mode_str = f"{colorama.Fore.MAGENTA}REPAIR MODE{colorama.Style.RESET_ALL}: Attempting to fix corrupt files"
1528
+ logging.info(mode_str)
1529
+
1530
+ repairable_formats = [fmt for fmt in formats if fmt in REPAIRABLE_FORMATS]
1531
+ if repairable_formats:
1532
+ logging.info(f"Repairable formats: {', '.join(repairable_formats)}")
1533
+ else:
1534
+ logging.warning("None of the selected formats support repair")
1535
+
1536
+ if dry_run:
1537
+ mode_str = f"{colorama.Fore.YELLOW}DRY RUN MODE{colorama.Style.RESET_ALL}: No files will be deleted or moved"
1538
+ logging.info(mode_str)
1539
+ elif args.move_to:
1540
+ mode_str = f"{colorama.Fore.BLUE}MOVE MODE{colorama.Style.RESET_ALL}: Corrupt files will be moved to {args.move_to}"
1541
+ logging.info(mode_str)
1542
+ else:
1543
+ mode_str = f"{colorama.Fore.RED}DELETE MODE{colorama.Style.RESET_ALL}: Corrupt files will be permanently deleted"
1544
+ logging.info(mode_str)
1545
+
1546
+ # Add progress saving info
1547
+ if args.save_interval > 0:
1548
+ save_interval_str = f"{colorama.Fore.CYAN}PROGRESS SAVING{colorama.Style.RESET_ALL}: Every {args.save_interval} minutes"
1549
+ logging.info(save_interval_str)
1550
+ else:
1551
+ logging.info("Progress saving is disabled")
1552
+
1553
+ if args.resume:
1554
+ resume_str = f"{colorama.Fore.CYAN}RESUMING{colorama.Style.RESET_ALL}: From session {args.resume}"
1555
+ logging.info(resume_str)
1556
+
1557
+ if args.thorough:
1558
+ thorough_str = f"{colorama.Fore.MAGENTA}THOROUGH MODE{colorama.Style.RESET_ALL}: Using deep validation checks (slower but more accurate)"
1559
+ logging.info(thorough_str)
1560
+
1561
+ # Show sensitivity level
1562
+ sensitivity_colors = {
1563
+ 'low': colorama.Fore.GREEN,
1564
+ 'medium': colorama.Fore.YELLOW,
1565
+ 'high': colorama.Fore.RED
1566
+ }
1567
+ sensitivity_color = sensitivity_colors.get(args.sensitivity, colorama.Fore.YELLOW)
1568
+ sensitivity_str = f"{sensitivity_color}SENSITIVITY: {args.sensitivity.upper()}{colorama.Style.RESET_ALL}"
1569
+ logging.info(sensitivity_str)
1570
+
1571
+ # Show EOF handling
1572
+ if args.ignore_eof:
1573
+ eof_str = f"{colorama.Fore.CYAN}IGNORING EOF MARKERS{colorama.Style.RESET_ALL}: Allowing truncated but viewable files"
1574
+ logging.info(eof_str)
1575
+
1576
+ # Show visual corruption checking status
1577
+ if args.check_visual:
1578
+ strictness_color = {
1579
+ 'low': colorama.Fore.GREEN,
1580
+ 'medium': colorama.Fore.YELLOW,
1581
+ 'high': colorama.Fore.RED
1582
+ }.get(args.visual_strictness, colorama.Fore.YELLOW)
1583
+
1584
+ visual_str = f"{colorama.Fore.MAGENTA}VISUAL CHECK{colorama.Style.RESET_ALL}: " + \
1585
+ f"Analyzing image content (strictness: {strictness_color}{args.visual_strictness.upper()}{colorama.Style.RESET_ALL})"
1586
+ logging.info(visual_str)
1587
+
1588
+ # Show security checks status
1589
+ if args.security_checks:
1590
+ security_str = f"{colorama.Fore.RED}SECURITY CHECKS ENABLED{colorama.Style.RESET_ALL}: " + \
1591
+ f"Validating file sizes (max {humanize.naturalsize(MAX_FILE_SIZE)}), " + \
1592
+ f"dimensions (max {MAX_IMAGE_PIXELS:,} pixels), and format integrity"
1593
+ logging.info(security_str)
1594
+
1595
+ # Show which formats we're checking
1596
+ format_list = ", ".join(formats)
1597
+ logging.info(f"Checking image formats: {format_list}")
1598
+ logging.info(f"Searching for corrupt image files in {directory}")
1599
+
1600
+ try:
1601
+ bad_files, repaired_files, total_size_saved = process_images(
1602
+ directory,
1603
+ formats,
1604
+ dry_run=dry_run,
1605
+ repair=args.repair,
1606
+ max_workers=args.workers,
1607
+ recursive=not args.non_recursive,
1608
+ move_to=args.move_to,
1609
+ repair_dir=args.backup_dir,
1610
+ save_progress_interval=args.save_interval,
1611
+ resume_session=args.resume,
1612
+ progress_dir=args.progress_dir,
1613
+ thorough_check=args.thorough,
1614
+ sensitivity=args.sensitivity,
1615
+ ignore_eof=args.ignore_eof,
1616
+ check_visual=args.check_visual,
1617
+ visual_strictness=args.visual_strictness,
1618
+ enable_security_checks=args.security_checks
1619
+ )
1620
+
1621
+ # Colorful summary
1622
+ count_color = colorama.Fore.RED if bad_files else colorama.Fore.GREEN
1623
+ file_count = f"{count_color}{len(bad_files)}{colorama.Style.RESET_ALL}"
1624
+ logging.info(f"Found {file_count} corrupt image files")
1625
+
1626
+ if args.repair:
1627
+ repair_color = colorama.Fore.GREEN if repaired_files else colorama.Fore.YELLOW
1628
+ repair_count = f"{repair_color}{len(repaired_files)}{colorama.Style.RESET_ALL}"
1629
+ logging.info(f"Successfully repaired {repair_count} files")
1630
+
1631
+ if args.repair_report and repaired_files:
1632
+ with open(args.repair_report, 'w') as f:
1633
+ for file_path in repaired_files:
1634
+ f.write(f"{file_path}\n")
1635
+ logging.info(f"Saved list of repaired files to {args.repair_report}")
1636
+
1637
+ savings_str = humanize.naturalsize(total_size_saved)
1638
+ savings_color = colorama.Fore.GREEN if total_size_saved > 0 else colorama.Fore.RESET
1639
+ savings_msg = f"Total space savings: {savings_color}{savings_str}{colorama.Style.RESET_ALL}"
1640
+ logging.info(savings_msg)
1641
+
1642
+ if not args.no_color:
1643
+ # Add signature at the end of the run
1644
+ signature = f"\n{colorama.Fore.CYAN}2PAC v{VERSION} by Richard Young{colorama.Style.RESET_ALL}"
1645
+ quote = f"{colorama.Fore.YELLOW}\"{random.choice(QUOTES)}\"{colorama.Style.RESET_ALL}"
1646
+ print(signature)
1647
+ print(quote)
1648
+
1649
+ # Save list of corrupt files if requested
1650
+ if args.output and bad_files:
1651
+ with open(args.output, 'w') as f:
1652
+ for file_path in bad_files:
1653
+ f.write(f"{file_path}\n")
1654
+ logging.info(f"Saved list of corrupt files to {args.output}")
1655
+
1656
+ if bad_files and dry_run:
1657
+ logging.info("Run with --delete to remove these files or --move-to to relocate them")
1658
+
1659
+ except KeyboardInterrupt:
1660
+ logging.info("Operation cancelled by user")
1661
+ sys.exit(130)
1662
+ except Exception as e:
1663
+ logging.error(f"Error: {str(e)}")
1664
+ if args.verbose:
1665
+ import traceback
1666
+ traceback.print_exc()
1667
+ sys.exit(1)
1668
+
1669
+ if __name__ == "__main__":
1670
+ main()
rat_finder.py ADDED
@@ -0,0 +1,1223 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ RAT Finder - Beta steganography detection tool for 2PAC
4
+
5
+ This tool is designed to detect potential steganography in images.
6
+ It's part of the 2PAC toolkit but focused on security aspects.
7
+
8
+ Author: Richard Young
9
+ License: MIT
10
+ """
11
+
12
+ import os
13
+ import sys
14
+ import argparse
15
+ import concurrent.futures
16
+ import logging
17
+ import tempfile
18
+ import numpy as np
19
+ from pathlib import Path
20
+ from PIL import Image
21
+ import matplotlib.pyplot as plt
22
+ from scipy import stats
23
+ import colorama
24
+ from tqdm import tqdm
25
+
26
+ # Initialize colorama
27
+ colorama.init()
28
+
29
+ # Version
30
+ VERSION = "0.2.0"
31
+
32
+ # Set up logging
33
+ def setup_logging(verbose, no_color=False):
34
+ level = logging.DEBUG if verbose else logging.INFO
35
+
36
+ # Define color codes
37
+ if not no_color:
38
+ # Color scheme
39
+ COLORS = {
40
+ 'DEBUG': colorama.Fore.CYAN,
41
+ 'INFO': colorama.Fore.GREEN,
42
+ 'WARNING': colorama.Fore.YELLOW,
43
+ 'ERROR': colorama.Fore.RED,
44
+ 'CRITICAL': colorama.Fore.MAGENTA + colorama.Style.BRIGHT,
45
+ 'RESET': colorama.Style.RESET_ALL
46
+ }
47
+
48
+ # Custom formatter with colors
49
+ class ColoredFormatter(logging.Formatter):
50
+ def format(self, record):
51
+ levelname = record.levelname
52
+ if levelname in COLORS:
53
+ record.levelname = f"{COLORS[levelname]}{levelname}{COLORS['RESET']}"
54
+ record.msg = f"{COLORS[levelname]}{record.msg}{COLORS['RESET']}"
55
+ return super().format(record)
56
+
57
+ formatter = ColoredFormatter('%(asctime)s - %(levelname)s - %(message)s')
58
+ else:
59
+ formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
60
+
61
+ handler = logging.StreamHandler()
62
+ handler.setFormatter(formatter)
63
+
64
+ logging.basicConfig(
65
+ level=level,
66
+ handlers=[handler]
67
+ )
68
+
69
+ def print_banner():
70
+ """Print RAT Finder themed ASCII art banner"""
71
+ banner = r"""
72
+ ██████╗ █████╗ ████████╗ ███████╗██╗███╗ ██╗██████╗ ███████╗██████╗
73
+ ██╔══██╗██╔══██╗╚══██╔══╝ ██╔════╝██║████╗ ██║██╔══██╗██╔════╝██╔══██╗
74
+ ██████╔╝███████║ ██║█████╗█████╗ ██║██╔██╗ ██║██║ ██║█████╗ ██████╔╝
75
+ ██╔══██╗██╔══██║ ██║╚════╝██╔══╝ ██║██║╚██╗██║██║ ██║██╔══╝ ██╔══██╗
76
+ ██║ ██║██║ ██║ ██║ ██║ ██║██║ ╚████║██████╔╝███████╗██║ ██║
77
+ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═══╝╚═════╝ ╚══════╝╚═╝ ╚═╝
78
+ ╔═══════════════════════════════════════════════════════════════════════╗
79
+ ║ Steganography Detection Tool (v0.2.0) - Part of the 2PAC toolkit ║
80
+ ║ "What the eyes see and the ears hear, the mind believes" ║
81
+ ╚═══════════════════════════════════════════════════════════════════════╝
82
+ """
83
+
84
+ if 'colorama' in sys.modules:
85
+ banner_lines = banner.strip().split('\n')
86
+ colored_banner = []
87
+
88
+ # Color the RAT part in red, the FINDER part in blue
89
+ for i, line in enumerate(banner_lines):
90
+ if i < 6: # The logo lines
91
+ # Add the RAT part in red
92
+ part1 = line[:24]
93
+ # Add the FINDER part in blue
94
+ part2 = line[24:]
95
+ colored_line = f"{colorama.Fore.RED}{part1}{colorama.Fore.BLUE}{part2}{colorama.Style.RESET_ALL}"
96
+ colored_banner.append(colored_line)
97
+ elif i >= 6 and i <= 9: # The box with text
98
+ colored_banner.append(f"{colorama.Fore.YELLOW}{line}{colorama.Style.RESET_ALL}")
99
+ else:
100
+ colored_banner.append(f"{colorama.Fore.WHITE}{line}{colorama.Style.RESET_ALL}")
101
+
102
+ print('\n'.join(colored_banner))
103
+ else:
104
+ print(banner)
105
+ print()
106
+
107
+ #------------------------------------------------------------------------------
108
+ # STEGANOGRAPHY DETECTION TECHNIQUES
109
+ #------------------------------------------------------------------------------
110
+
111
+ def perform_ela_analysis(image_path, quality=75):
112
+ """
113
+ Performs Error Level Analysis (ELA) to detect manipulated areas in an image.
114
+
115
+ ELA works by intentionally resaving an image at a known quality level and
116
+ analyzing the differences between the original and resaved versions.
117
+ Areas that have been manipulated often show up as having different error levels.
118
+
119
+ Args:
120
+ image_path: Path to the image
121
+ quality: JPEG quality level to use for recompression (default: 75)
122
+
123
+ Returns:
124
+ (is_suspicious, confidence, details)
125
+ """
126
+ try:
127
+ # Only perform ELA on JPEG images
128
+ if not image_path.lower().endswith(('.jpg', '.jpeg', '.jfif')):
129
+ return False, 0, {"error": "ELA is only effective for JPEG images"}
130
+
131
+ with Image.open(image_path) as original_img:
132
+ # Convert to RGB if needed
133
+ if original_img.mode != 'RGB':
134
+ original_img = original_img.convert('RGB')
135
+
136
+ # Create a temporary file for the resaved image
137
+ temp_file = tempfile.NamedTemporaryFile(suffix='.jpg', delete=True)
138
+ resaved_path = temp_file.name
139
+
140
+ # Save the image with the specified quality
141
+ original_img.save(resaved_path, quality=quality)
142
+
143
+ # Read the resaved image
144
+ with Image.open(resaved_path) as resaved_img:
145
+ # Convert both to numpy arrays
146
+ original_array = np.array(original_img).astype('int32')
147
+ resaved_array = np.array(resaved_img).astype('int32')
148
+
149
+ # Calculate absolute difference
150
+ diff = np.abs(original_array - resaved_array)
151
+
152
+ # Calculate statistics from the difference
153
+ mean_diff = np.mean(diff)
154
+ std_diff = np.std(diff)
155
+ max_diff = np.max(diff)
156
+
157
+ # Scale the differences to make them more visible (for visualization)
158
+ diff_scaled = diff * 10
159
+
160
+ # Look for suspicious patterns
161
+ # 1. High variance in error levels can indicate manipulation
162
+ # 2. Localized areas with significantly different error levels are suspicious
163
+ # 3. Unnaturally low error in complex areas can indicate steganography
164
+
165
+ # Calculate local variation using sliding window approach
166
+ # We're looking for areas where the difference between neighboring pixels
167
+ # has unusually high or low variance
168
+
169
+ # Use a simple method - check variance in blocks
170
+ block_size = 8 # 8x8 blocks, common in JPEG
171
+ shape = diff.shape
172
+ block_variance = []
173
+
174
+ # Sample blocks throughout the image
175
+ for i in range(0, shape[0] - block_size, block_size):
176
+ for j in range(0, shape[1] - block_size, block_size):
177
+ # Extract block for each channel
178
+ for c in range(3): # RGB channels
179
+ block = diff[i:i+block_size, j:j+block_size, c]
180
+ block_var = np.var(block)
181
+ if block_var > 0: # Avoid divisions by zero
182
+ block_variance.append(block_var)
183
+
184
+ if not block_variance:
185
+ return False, 0, {"error": "Could not calculate block variance"}
186
+
187
+ # Calculate statistics on block variances
188
+ mean_block_var = np.mean(block_variance)
189
+ max_block_var = np.max(block_variance)
190
+ std_block_var = np.std(block_variance)
191
+
192
+ # What we're looking for:
193
+ # 1. Unusually high block variance in some areas (significantly above the mean)
194
+ # 2. Unusually consistent error levels (too perfect - could indicate manipulation)
195
+
196
+ # Determine suspiciousness based on these factors
197
+ # Calculate a normalized ratio of max variance to mean variance
198
+ if mean_block_var > 0:
199
+ var_ratio = max_block_var / mean_block_var
200
+ else:
201
+ var_ratio = 0
202
+
203
+ # Calculate coefficient of variation for block variances
204
+ if mean_block_var > 0:
205
+ coeff_var = std_block_var / mean_block_var
206
+ else:
207
+ coeff_var = 0
208
+
209
+ # Heuristics based on ELA characteristics
210
+ # Unusually high variation ratio can indicate manipulation
211
+ is_suspicious_var_ratio = var_ratio > 50
212
+
213
+ # High coefficient of variation indicates inconsistent error levels
214
+ is_suspicious_coeff_var = coeff_var > 2.0
215
+
216
+ # Unusually high mean difference can indicate manipulation
217
+ is_suspicious_mean_diff = mean_diff > 15
218
+
219
+ # Combine factors
220
+ is_suspicious = (is_suspicious_var_ratio or
221
+ is_suspicious_coeff_var or
222
+ is_suspicious_mean_diff)
223
+
224
+ # Calculate confidence based on these factors
225
+ confidence = 0
226
+ if is_suspicious_var_ratio:
227
+ # Scale based on how extreme the ratio is
228
+ confidence += min(40, var_ratio / 2)
229
+ if is_suspicious_coeff_var:
230
+ # Scale based on coefficient of variation
231
+ confidence += min(30, coeff_var * 10)
232
+ if is_suspicious_mean_diff:
233
+ # Scale based on mean difference
234
+ confidence += min(30, mean_diff)
235
+
236
+ # Cap confidence at 90%
237
+ confidence = min(confidence, 90)
238
+
239
+ # Save results for return
240
+ details = {
241
+ "mean_diff": float(mean_diff),
242
+ "max_diff": float(max_diff),
243
+ "var_ratio": float(var_ratio),
244
+ "coeff_var": float(coeff_var),
245
+ "diff_image": diff_scaled.astype(np.uint8), # For visualization
246
+ "quality_used": quality
247
+ }
248
+
249
+ return is_suspicious, confidence, details
250
+
251
+ except Exception as e:
252
+ logging.debug(f"Error performing ELA on {image_path}: {str(e)}")
253
+ return False, 0, {"error": str(e)}
254
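The heart of ELA is just a recompress-and-diff; a minimal, hedged sketch of that idea (filenames and the quality value are illustrative, and this is not the full scoring logic above):

```python
import numpy as np
from PIL import Image, ImageChops

with Image.open("original.jpg") as img:  # hypothetical input
    img = img.convert("RGB")
    img.save("resaved.jpg", quality=75)  # recompress at a known quality
    with Image.open("resaved.jpg") as resaved:
        ela = ImageChops.difference(img, resaved)  # per-pixel error level
        print("mean error level:", np.array(ela).mean())
        ela.point(lambda p: min(255, p * 10)).save("ela_map.png")  # brightened map for inspection
```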
+
255
+ def check_lsb_anomalies(image_path, threshold=0.03):
256
+ """
257
+ Detect potential LSB steganography by analyzing bit plane patterns.
258
+
259
+ Args:
260
+ image_path: Path to the image
261
+ threshold: Threshold for statistical anomaly detection
262
+
263
+ Returns:
264
+ (is_suspicious, confidence, details)
265
+ """
266
+ try:
267
+ with Image.open(image_path) as img:
268
+ # Convert to RGB
269
+ if img.mode != 'RGB':
270
+ img = img.convert('RGB')
271
+
272
+ # Get image data as numpy array
273
+ img_array = np.array(img)
274
+
275
+ # Extract least significant bits from each channel
276
+ red_lsb = img_array[:,:,0] % 2
277
+ green_lsb = img_array[:,:,1] % 2
278
+ blue_lsb = img_array[:,:,2] % 2
279
+
280
+ # Calculate statistics
281
+ # Chi-square test to detect non-random patterns in LSB
282
+ red_chi = stats.chisquare(np.bincount(red_lsb.flatten()))[1]
283
+ green_chi = stats.chisquare(np.bincount(green_lsb.flatten()))[1]
284
+ blue_chi = stats.chisquare(np.bincount(blue_lsb.flatten()))[1]
285
+
286
+ # Calculate entropy of the LSB plane in bits (base 2): ~1.0 means the bits look random
287
+ red_entropy = stats.entropy(np.bincount(red_lsb.flatten()), base=2)
288
+ green_entropy = stats.entropy(np.bincount(green_lsb.flatten()), base=2)
289
+ blue_entropy = stats.entropy(np.bincount(blue_lsb.flatten()), base=2)
290
+
291
+ # Suspicious if chi-square test shows non-random distribution
292
+ # or if the LSB entropy deviates noticeably from the ~1 bit expected of random data
293
+ chi_suspicious = min(red_chi, green_chi, blue_chi) < threshold
294
+ entropy_suspicious = abs(np.mean([red_entropy, green_entropy, blue_entropy]) - 1.0) > 0.1
295
+
296
+ # Calculate a confidence score (0-100%)
297
+ confidence = 0
298
+ if chi_suspicious:
299
+ confidence += 50
300
+ if entropy_suspicious:
301
+ confidence += 30
302
+
303
+ # Additional checks for common LSB steganography patterns
304
+ # Check for abnormal color distributions
305
+ color_distribution = np.std([np.std(red_lsb), np.std(green_lsb), np.std(blue_lsb)])
306
+ if color_distribution < 0.1: # Suspicious if too uniform
307
+ confidence += 20
308
+
309
+ is_suspicious = confidence > 50
310
+
311
+ details = {
312
+ "chi_square_values": [red_chi, green_chi, blue_chi],
313
+ "entropy_values": [red_entropy, green_entropy, blue_entropy],
314
+ "color_distribution": color_distribution
315
+ }
316
+
317
+ return is_suspicious, confidence, details
318
+ except Exception as e:
319
+ logging.debug(f"Error analyzing LSB in {image_path}: {str(e)}")
320
+ return False, 0, {"error": str(e)}
321
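For intuition, the bit-plane statistics above boil down to a handful of NumPy/SciPy calls; a hedged, standalone sketch on a single channel (filename illustrative):

```python
import numpy as np
from PIL import Image
from scipy import stats

with Image.open("suspect.png") as img:  # hypothetical input
    lsb = np.array(img.convert("RGB"))[:, :, 0] & 1  # red-channel LSB plane

counts = np.bincount(lsb.ravel(), minlength=2)
p_value = stats.chisquare(counts).pvalue      # low p-value: 0/1 counts far from uniform
entropy_bits = stats.entropy(counts, base=2)  # close to 1.0 when the LSBs look random
print(p_value, entropy_bits)
```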
+
322
+ def check_file_size_anomalies(image_path):
323
+ """
324
+ Check if file size is suspicious compared to image dimensions.
325
+
326
+ Args:
327
+ image_path: Path to the image
328
+
329
+ Returns:
330
+ (is_suspicious, confidence, details)
331
+ """
332
+ try:
333
+ # Get file size
334
+ file_size = os.path.getsize(image_path)
335
+
336
+ with Image.open(image_path) as img:
337
+ width, height = img.size
338
+ pixel_count = width * height
339
+
340
+ # Calculate expected file size range based on image type
341
+ expected_size = 0
342
+ if image_path.lower().endswith('.png'):
343
+ # PNG files have variable compression but generally follow a pattern
344
+ # This is a very rough estimate
345
+ expected_min = pixel_count * 0.1 # Minimum expected size
346
+ expected_max = pixel_count * 3 # Maximum expected size
347
+ elif image_path.lower().endswith(('.jpg', '.jpeg')):
348
+ # JPEG files are typically smaller due to compression
349
+ expected_min = pixel_count * 0.05 # Minimum for very compressed JPEG
350
+ expected_max = pixel_count * 1.5 # Maximum for high quality JPEG
351
+ else:
352
+ # For other formats, use a more generic range
353
+ expected_min = pixel_count * 0.1
354
+ expected_max = pixel_count * 4
355
+
356
+ # Check if file size is within expected range
357
+ is_too_small = file_size < expected_min
358
+ is_too_large = file_size > expected_max
359
+ is_suspicious = is_too_small or is_too_large
360
+
361
+ # Calculate confidence
362
+ confidence = 0
363
+ if is_too_large:
364
+ # More likely to contain hidden data if too large
365
+ ratio = file_size / expected_max
366
+ confidence = min(int((ratio - 1) * 100), 90) # Cap at 90%
367
+ elif is_too_small:
368
+ # Less likely but still suspicious if too small
369
+ ratio = expected_min / file_size
370
+ confidence = min(int((ratio - 1) * 50), 70) # Cap at 70%
371
+
372
+ details = {
373
+ "file_size": file_size,
374
+ "expected_min": expected_min,
375
+ "expected_max": expected_max,
376
+ "pixel_count": pixel_count,
377
+ "width": width,
378
+ "height": height
379
+ }
380
+
381
+ return is_suspicious, confidence, details
382
+ except Exception as e:
383
+ logging.debug(f"Error analyzing file size in {image_path}: {str(e)}")
384
+ return False, 0, {"error": str(e)}
385
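The same heuristic can be collapsed to a single bytes-per-pixel figure; a hedged sketch (the filename is illustrative, and the rough threshold mirrors the JPEG range used above):

```python
import os
from PIL import Image

path = "suspect.jpg"  # hypothetical input
with Image.open(path) as img:
    width, height = img.size

bytes_per_pixel = os.path.getsize(path) / (width * height)
print(f"{bytes_per_pixel:.2f} bytes/pixel")  # JPEGs well above ~1.5 deserve a closer look
```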
+
386
+ def check_histogram_anomalies(image_path):
387
+ """
388
+ Analyze image histogram for unusual patterns that might indicate steganography.
389
+
390
+ Args:
391
+ image_path: Path to the image
392
+
393
+ Returns:
394
+ (is_suspicious, confidence, details)
395
+ """
396
+ try:
397
+ with Image.open(image_path) as img:
398
+ # Convert to RGB
399
+ if img.mode != 'RGB':
400
+ img = img.convert('RGB')
401
+
402
+ # Get image data as numpy array
403
+ img_array = np.array(img)
404
+
405
+ # Calculate histograms for each color channel
406
+ hist_r = np.histogram(img_array[:,:,0], bins=256, range=(0, 256))[0]
407
+ hist_g = np.histogram(img_array[:,:,1], bins=256, range=(0, 256))[0]
408
+ hist_b = np.histogram(img_array[:,:,2], bins=256, range=(0, 256))[0]
409
+
410
+ # Normalize histograms
411
+ pixel_count = img_array.shape[0] * img_array.shape[1]
412
+ hist_r = hist_r / pixel_count
413
+ hist_g = hist_g / pixel_count
414
+ hist_b = hist_b / pixel_count
415
+
416
+ # Analyze histogram characteristics
417
+ # 1. Check for comb patterns (alternating peaks/valleys) which can indicate LSB steganography
418
+ comb_pattern_r = np.sum(np.abs(np.diff(np.diff(hist_r))))
419
+ comb_pattern_g = np.sum(np.abs(np.diff(np.diff(hist_g))))
420
+ comb_pattern_b = np.sum(np.abs(np.diff(np.diff(hist_b))))
421
+
422
+ # 2. Check for unusual peaks at specific values
423
+ # LSB steganography often causes unusual spikes at even or odd values
424
+ even_odd_ratio_r = np.sum(hist_r[::2]) / np.sum(hist_r[1::2]) if np.sum(hist_r[1::2]) > 0 else 1
425
+ even_odd_ratio_g = np.sum(hist_g[::2]) / np.sum(hist_g[1::2]) if np.sum(hist_g[1::2]) > 0 else 1
426
+ even_odd_ratio_b = np.sum(hist_b[::2]) / np.sum(hist_b[1::2]) if np.sum(hist_b[1::2]) > 0 else 1
427
+
428
+ # Calculate an evenness score - how far from 1.0 (perfect balance) are we?
429
+ even_odd_deviation = max(
430
+ abs(even_odd_ratio_r - 1.0),
431
+ abs(even_odd_ratio_g - 1.0),
432
+ abs(even_odd_ratio_b - 1.0)
433
+ )
434
+
435
+ # 3. Calculate histogram smoothness (natural images tend to have smoother histograms)
436
+ smoothness_r = np.mean(np.abs(np.diff(hist_r)))
437
+ smoothness_g = np.mean(np.abs(np.diff(hist_g)))
438
+ smoothness_b = np.mean(np.abs(np.diff(hist_b)))
439
+
440
+ # Suspicious if large even/odd ratio deviation or high comb pattern values
441
+ is_suspicious_comb = max(comb_pattern_r, comb_pattern_g, comb_pattern_b) > 0.015
442
+ is_suspicious_even_odd = even_odd_deviation > 0.1
443
+ is_suspicious_smoothness = max(smoothness_r, smoothness_g, smoothness_b) > 0.01
444
+
445
+ is_suspicious = is_suspicious_comb or is_suspicious_even_odd or is_suspicious_smoothness
446
+
447
+ # Calculate confidence
448
+ confidence = 0
449
+ if is_suspicious_comb:
450
+ confidence += 30
451
+ if is_suspicious_even_odd:
452
+ confidence += 40
453
+ if is_suspicious_smoothness:
454
+ confidence += 20
455
+
456
+ # Cap confidence at 90%
457
+ confidence = min(confidence, 90)
458
+
459
+ details = {
460
+ "comb_pattern_values": [comb_pattern_r, comb_pattern_g, comb_pattern_b],
461
+ "even_odd_ratios": [even_odd_ratio_r, even_odd_ratio_g, even_odd_ratio_b],
462
+ "smoothness_values": [smoothness_r, smoothness_g, smoothness_b],
463
+ "even_odd_deviation": even_odd_deviation
464
+ }
465
+
466
+ return is_suspicious, confidence, details
467
+ except Exception as e:
468
+ logging.debug(f"Error analyzing histogram in {image_path}: {str(e)}")
469
+ return False, 0, {"error": str(e)}
470
+
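A compact sketch of the even/odd balance idea behind the histogram check above: naive LSB embedding tends to shift mass between adjacent even and odd intensity values, so the even-to-odd ratio drifts away from 1.0. The arrays below are synthetic stand-ins, not real image data.

```python
import numpy as np

def even_odd_deviation(channel):
    hist = np.histogram(channel, bins=256, range=(0, 256))[0].astype(float)
    hist /= hist.sum()
    odd_mass = hist[1::2].sum()
    ratio = hist[::2].sum() / odd_mass if odd_mass > 0 else 1.0
    return abs(ratio - 1.0)

rng = np.random.default_rng(0)
natural = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
# Crude stand-in for heavy LSB embedding: force ~80% of pixels to even values.
stego = (natural & 0xFE) | (rng.random((256, 256)) < 0.2)

print(even_odd_deviation(natural))  # close to 0
print(even_odd_deviation(stego))    # well above the 0.1 threshold used above
```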
471
+ def check_metadata_anomalies(image_path):
472
+ """
473
+ Look for unusual metadata or metadata inconsistencies that could indicate steganography.
474
+
475
+ Args:
476
+ image_path: Path to the image
477
+
478
+ Returns:
479
+ (is_suspicious, confidence, details)
480
+ """
481
+ try:
482
+ with Image.open(image_path) as img:
483
+ # Extract metadata (EXIF, etc)
484
+ metadata = {}
485
+ if hasattr(img, '_getexif') and img._getexif() is not None:
486
+ metadata = {k: v for k, v in img._getexif().items()}
487
+
488
+ # Check for known steganography software markers
489
+ steganography_markers = [
490
+ 'outguess', 'stegano', 'steghide', 'jsteg', 'f5', 'secret',
491
+ 'hidden', 'conceal', 'invisible', 'steganography'
492
+ ]
493
+
494
+ found_markers = []
495
+ for key, value in metadata.items():
496
+ if isinstance(value, str):
497
+ value_lower = value.lower()
498
+ for marker in steganography_markers:
499
+ if marker in value_lower:
500
+ found_markers.append((key, marker, value))
501
+
502
+ # Check for unusual metadata structure
503
+ is_suspicious = len(found_markers) > 0
504
+ confidence = min(len(found_markers) * 30, 90) if is_suspicious else 0
505
+
506
+ # Check for metadata size anomalies
507
+ if len(metadata) > 30: # Unusually large metadata
508
+ is_suspicious = True
509
+ confidence = max(confidence, 50)
510
+
511
+ details = {
512
+ "metadata_count": len(metadata),
513
+ "suspicious_markers": found_markers
514
+ }
515
+
516
+ return is_suspicious, confidence, details
517
+ except Exception as e:
518
+ logging.debug(f"Error analyzing metadata in {image_path}: {str(e)}")
519
+ return False, 0, {"error": str(e)}
520
+
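The marker scan above boils down to a case-insensitive substring search over string-valued EXIF entries. A minimal sketch follows; the sample dictionary is made up for illustration (tag 305 is the standard EXIF Software field).

```python
STEG_MARKERS = ['outguess', 'stegano', 'steghide', 'jsteg', 'f5', 'secret',
                'hidden', 'conceal', 'invisible', 'steganography']

def find_marker_hits(metadata):
    hits = []
    for key, value in metadata.items():
        if isinstance(value, str):
            lowered = value.lower()
            hits.extend((key, marker) for marker in STEG_MARKERS if marker in lowered)
    return hits

# Hypothetical EXIF dictionary of the kind PIL's _getexif() returns:
sample = {305: 'StegHide 0.5.1', 306: '2024:01:01 12:00:00'}
print(find_marker_hits(sample))  # [(305, 'steghide')]
```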
521
+ def check_trailing_data(image_path):
522
+ """Detect suspicious data appended after the official end markers."""
523
+ try:
524
+ with open(image_path, 'rb') as f:
525
+ data = f.read()
526
+
527
+ appended_bytes = 0
528
+ lower = image_path.lower()
529
+
530
+ if lower.endswith(('.jpg', '.jpeg', '.jfif')):
531
+ marker = data.rfind(b'\xFF\xD9')
532
+ if marker != -1 and marker < len(data) - 2:
533
+ appended_bytes = len(data) - marker - 2
534
+ elif lower.endswith('.png'):
535
+ marker = data.rfind(b'\x00\x00\x00\x00IEND\xAE\x42\x60\x82')
536
+ if marker != -1 and marker < len(data) - 12:
537
+ appended_bytes = len(data) - marker - 12
538
+ else:
539
+ return False, 0, {"error": "unsupported format"}
540
+
541
+ is_suspicious = appended_bytes > 0
542
+ confidence = 0
543
+ if is_suspicious:
544
+ ratio = appended_bytes / len(data)
545
+ confidence = min(95, 50 + int(ratio * 500))
546
+
547
+ details = {
548
+ "appended_bytes": appended_bytes
549
+ }
550
+
551
+ return is_suspicious, confidence, details
552
+ except Exception as e:
553
+ logging.debug(f"Error analyzing trailing data in {image_path}: {str(e)}")
554
+ return False, 0, {"error": str(e)}
555
+
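As a quick illustration of the trailing-data check, bytes appended after a JPEG's EOI marker (FF D9) leave the image perfectly viewable but are detectable with the same rfind logic used above. The fake payload below is illustrative only.

```python
def appended_bytes_after_jpeg_eoi(data: bytes) -> int:
    eoi = data.rfind(b'\xFF\xD9')
    if eoi == -1:
        return 0
    return len(data) - eoi - 2

# Simulated JPEG: header bytes, padding, EOI marker, then a smuggled payload.
fake_jpeg = b'\xFF\xD8' + b'\x00' * 100 + b'\xFF\xD9' + b'hidden payload'
print(appended_bytes_after_jpeg_eoi(fake_jpeg))  # 14
```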
556
+ def check_visual_noise_anomalies(image_path):
557
+ """
558
+ Analyze visual noise patterns to detect potential steganography.
559
+
560
+ Args:
561
+ image_path: Path to the image
562
+
563
+ Returns:
564
+ (is_suspicious, confidence, details)
565
+ """
566
+ try:
567
+ with Image.open(image_path) as img:
568
+ # Convert to RGB
569
+ if img.mode != 'RGB':
570
+ img = img.convert('RGB')
571
+
572
+ # Resize if image is too large for faster processing
573
+ width, height = img.size
574
+ if width > 1000 or height > 1000:
575
+ ratio = min(1000 / width, 1000 / height)
576
+ new_width = int(width * ratio)
577
+ new_height = int(height * ratio)
578
+ img = img.resize((new_width, new_height))
579
+
580
+ # Get image data as numpy array
581
+ img_array = np.array(img)
582
+
583
+ # Apply noise detection
584
+ # Calculate noise in each channel by looking at differences between adjacent pixels
585
+ red_noise = np.mean(np.abs(np.diff(img_array[:,:,0], axis=0))) + np.mean(np.abs(np.diff(img_array[:,:,0], axis=1)))
586
+ green_noise = np.mean(np.abs(np.diff(img_array[:,:,1], axis=0))) + np.mean(np.abs(np.diff(img_array[:,:,1], axis=1)))
587
+ blue_noise = np.mean(np.abs(np.diff(img_array[:,:,2], axis=0))) + np.mean(np.abs(np.diff(img_array[:,:,2], axis=1)))
588
+
589
+ # Calculate noise ratio between channels
590
+ # In natural images, noise should be roughly similar across channels
591
+ # Large differences might indicate steganographic content
592
+ avg_noise = (red_noise + green_noise + blue_noise) / 3
593
+ noise_diffs = [abs(red_noise - avg_noise), abs(green_noise - avg_noise), abs(blue_noise - avg_noise)]
594
+ max_diff_ratio = max(noise_diffs) / avg_noise if avg_noise > 0 else 0
595
+
596
+ # Suspicious if significant differences between channels
597
+ is_suspicious = max_diff_ratio > 0.2
598
+ confidence = min(int(max_diff_ratio * 100), 90) if is_suspicious else 0
599
+
600
+ details = {
601
+ "red_noise": red_noise,
602
+ "green_noise": green_noise,
603
+ "blue_noise": blue_noise,
604
+ "max_diff_ratio": max_diff_ratio
605
+ }
606
+
607
+ return is_suspicious, confidence, details
608
+ except Exception as e:
609
+ logging.debug(f"Error analyzing visual noise in {image_path}: {str(e)}")
610
+ return False, 0, {"error": str(e)}
611
+
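A minimal sketch of the per-channel noise estimate used above (mean absolute difference between adjacent pixels along both axes), run on synthetic arrays so the contrast between a smooth gradient and pure noise is obvious:

```python
import numpy as np

def channel_noise(channel):
    channel = channel.astype(float)
    return (np.mean(np.abs(np.diff(channel, axis=0))) +
            np.mean(np.abs(np.diff(channel, axis=1))))

rng = np.random.default_rng(1)
smooth = np.tile(np.arange(64, dtype=float), (64, 1))   # gradient: low noise
noisy = rng.integers(0, 256, (64, 64)).astype(float)    # random: high noise
print(channel_noise(smooth), channel_noise(noisy))
```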
612
+ def analyze_image(image_path, sensitivity='medium'):
613
+ """
614
+ Perform comprehensive steganography detection on an image.
615
+
616
+ Args:
617
+ image_path: Path to the image
618
+ sensitivity: 'low', 'medium', or 'high'
619
+
620
+ Returns:
621
+ (is_suspicious, overall_confidence, detection_details)
622
+ """
623
+ # Set threshold based on sensitivity
624
+ thresholds = {
625
+ 'low': 0.01, # More likely to find steganography but more false positives
626
+ 'medium': 0.03, # Balanced detection
627
+ 'high': 0.05 # Fewer false positives but might miss some steganography
628
+ }
629
+
630
+ confidence_required = {
631
+ 'low': 60, # Lower bar for detection
632
+ 'medium': 70, # Moderate confidence required
633
+ 'high': 80 # High confidence required to report
634
+ }
635
+
636
+ threshold = thresholds.get(sensitivity, 0.03)
637
+ min_confidence = confidence_required.get(sensitivity, 70)
638
+
639
+ try:
640
+ results = {}
641
+
642
+ # Run all detection methods
643
+ lsb_result = check_lsb_anomalies(image_path, threshold)
644
+ results['lsb_analysis'] = {
645
+ 'suspicious': lsb_result[0],
646
+ 'confidence': lsb_result[1],
647
+ 'details': lsb_result[2]
648
+ }
649
+
650
+ size_result = check_file_size_anomalies(image_path)
651
+ results['file_size_analysis'] = {
652
+ 'suspicious': size_result[0],
653
+ 'confidence': size_result[1],
654
+ 'details': size_result[2]
655
+ }
656
+
657
+ metadata_result = check_metadata_anomalies(image_path)
658
+ results['metadata_analysis'] = {
659
+ 'suspicious': metadata_result[0],
660
+ 'confidence': metadata_result[1],
661
+ 'details': metadata_result[2]
662
+ }
663
+
664
+ trailing_result = check_trailing_data(image_path)
665
+ results['trailing_data_analysis'] = {
666
+ 'suspicious': trailing_result[0],
667
+ 'confidence': trailing_result[1],
668
+ 'details': trailing_result[2]
669
+ }
670
+
671
+ noise_result = check_visual_noise_anomalies(image_path)
672
+ results['visual_noise_analysis'] = {
673
+ 'suspicious': noise_result[0],
674
+ 'confidence': noise_result[1],
675
+ 'details': noise_result[2]
676
+ }
677
+
678
+ # Add the new histogram analysis
679
+ histogram_result = check_histogram_anomalies(image_path)
680
+ results['histogram_analysis'] = {
681
+ 'suspicious': histogram_result[0],
682
+ 'confidence': histogram_result[1],
683
+ 'details': histogram_result[2]
684
+ }
685
+
686
+ # Add Error Level Analysis (ELA) for JPEG images
687
+ if image_path.lower().endswith(('.jpg', '.jpeg', '.jfif')):
688
+ ela_result = perform_ela_analysis(image_path)
689
+ results['ela_analysis'] = {
690
+ 'suspicious': ela_result[0],
691
+ 'confidence': ela_result[1],
692
+ 'details': ela_result[2]
693
+ }
694
+
695
+ # Calculate overall confidence
696
+ # Weight the different tests
697
+ weights = {
698
+ 'lsb_analysis': 0.25, # LSB is a common technique
699
+ 'histogram_analysis': 0.20, # Histogram patterns are strong indicators
700
+ 'file_size_analysis': 0.10, # Size can be indicative
701
+ 'metadata_analysis': 0.10, # Metadata less common but useful indicator
702
+ 'trailing_data_analysis': 0.10, # Detects data after EOF markers
703
+ 'visual_noise_analysis': 0.15, # Visual noise can be a good indicator
704
+ 'ela_analysis': 0.20 # Error Level Analysis is effective for JPEG manipulation
705
+ }
706
+
707
+ # Only include weights for methods that were actually run
708
+ used_weights = {k: v for k, v in weights.items() if k in results}
709
+
710
+ # Normalize the weights to ensure they sum to 1.0
711
+ weight_sum = sum(used_weights.values())
712
+ if weight_sum > 0:
713
+ used_weights = {k: v/weight_sum for k, v in used_weights.items()}
714
+
715
+ # Calculate weighted confidence
716
+ overall_confidence = sum(
717
+ results[key]['confidence'] * used_weights[key] for key in used_weights
718
+ )
719
+
720
+ # Determine if image is suspicious overall
721
+ is_suspicious = overall_confidence >= min_confidence
722
+
723
+ return is_suspicious, overall_confidence, results
724
+ except Exception as e:
725
+ logging.debug(f"Error analyzing {image_path}: {str(e)}")
726
+ return False, 0, {"error": str(e)}
727
+
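The overall score above is a weighted average over whichever checks actually ran, with the weights renormalised to sum to 1. A small sketch with made-up per-method confidences for a PNG (so no ELA result):

```python
WEIGHTS = {'lsb_analysis': 0.25, 'histogram_analysis': 0.20, 'file_size_analysis': 0.10,
           'metadata_analysis': 0.10, 'trailing_data_analysis': 0.10,
           'visual_noise_analysis': 0.15, 'ela_analysis': 0.20}

def combine(confidences):
    used = {k: w for k, w in WEIGHTS.items() if k in confidences}
    total = sum(used.values())
    return sum(confidences[k] * (w / total) for k, w in used.items())

# Hypothetical PNG result: strong LSB and histogram hits dominate the score.
print(combine({'lsb_analysis': 80, 'histogram_analysis': 70, 'file_size_analysis': 0,
               'metadata_analysis': 0, 'trailing_data_analysis': 0,
               'visual_noise_analysis': 30}))  # ~42.8
```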
728
+ def process_file(args):
729
+ """Process a single image file."""
730
+ image_path, sensitivity, output_dir = args
731
+
732
+ try:
733
+ is_suspicious, confidence, details = analyze_image(image_path, sensitivity)
734
+
735
+ result = {
736
+ 'path': image_path,
737
+ 'suspicious': is_suspicious,
738
+ 'confidence': confidence,
739
+ 'details': details
740
+ }
741
+
742
+ # Create visual report if output directory is specified
743
+ if output_dir and is_suspicious:
744
+ create_visual_report(image_path, confidence, details, output_dir)
745
+
746
+ return result
747
+ except Exception as e:
748
+ logging.debug(f"Error processing {image_path}: {str(e)}")
749
+ return {
750
+ 'path': image_path,
751
+ 'suspicious': False,
752
+ 'confidence': 0,
753
+ 'details': {'error': str(e)}
754
+ }
755
+
756
+ def create_visual_report(image_path, confidence, details, output_dir):
757
+ """
758
+ Create a visual report showing the analysis of a suspicious image.
759
+
760
+ Args:
761
+ image_path: Path to the analyzed image
762
+ confidence: Detection confidence
763
+ details: Analysis details
764
+ output_dir: Directory to save report
765
+ """
766
+ try:
767
+ # Create output directory if it doesn't exist
768
+ os.makedirs(output_dir, exist_ok=True)
769
+
770
+ # Create a figure with 3x3 subplots to accommodate ELA visualization
771
+ fig, axs = plt.subplots(3, 3, figsize=(15, 15))
772
+ fig.suptitle(f"Steganography Analysis: {os.path.basename(image_path)}\nConfidence: {confidence:.1f}%", fontsize=16)
773
+
774
+ # Original image
775
+ with Image.open(image_path) as img:
776
+ axs[0, 0].imshow(img)
777
+ axs[0, 0].set_title("Original Image")
778
+ axs[0, 0].axis('off')
779
+
780
+ # LSB visualization
781
+ img_array = np.array(img.convert('RGB'))
782
+ lsb_img = np.zeros_like(img_array)
783
+
784
+ # Amplify LSB data by 255 for visibility
785
+ lsb_img[:,:,0] = (img_array[:,:,0] % 2) * 255
786
+ lsb_img[:,:,1] = (img_array[:,:,1] % 2) * 255
787
+ lsb_img[:,:,2] = (img_array[:,:,2] % 2) * 255
788
+
789
+ axs[0, 1].imshow(lsb_img)
790
+ axs[0, 1].set_title("LSB Visualization")
791
+ axs[0, 1].axis('off')
792
+
793
+ # ELA visualization (NEW)
794
+ if 'ela_analysis' in details and 'details' in details['ela_analysis']:
795
+ ela_data = details['ela_analysis']['details']
796
+ if 'diff_image' in ela_data and 'error' not in ela_data:
797
+ # Display the ELA image
798
+ axs[0, 2].imshow(ela_data['diff_image'])
799
+ axs[0, 2].set_title("Error Level Analysis (ELA)")
800
+ axs[0, 2].axis('off')
801
+
802
+ # Add annotation with key metrics
803
+ metrics = []
804
+ if 'var_ratio' in ela_data:
805
+ metrics.append(f"Variance ratio: {ela_data['var_ratio']:.2f}")
806
+ if 'coeff_var' in ela_data:
807
+ metrics.append(f"Coefficient of var: {ela_data['coeff_var']:.2f}")
808
+ if 'mean_diff' in ela_data:
809
+ metrics.append(f"Mean diff: {ela_data['mean_diff']:.2f}")
810
+
811
+ if metrics:
812
+ axs[0, 2].text(0.05, 0.05, "\n".join(metrics), transform=axs[0, 2].transAxes,
813
+ fontsize=9, verticalalignment='bottom',
814
+ bbox=dict(boxstyle='round,pad=0.5',
815
+ facecolor='white', alpha=0.7))
816
+ else:
817
+ axs[0, 2].text(0.5, 0.5, "ELA data not available",
818
+ horizontalalignment='center', verticalalignment='center')
819
+ axs[0, 2].axis('off')
820
+ else:
821
+ axs[0, 2].text(0.5, 0.5, "ELA analysis not available",
822
+ horizontalalignment='center', verticalalignment='center')
823
+ axs[0, 2].axis('off')
824
+
825
+ # Histogram visualization
826
+ if 'histogram_analysis' in details:
827
+ # Create histograms for each channel
828
+ hist_r = np.histogram(img_array[:,:,0], bins=256, range=(0, 256))[0]
829
+ hist_g = np.histogram(img_array[:,:,1], bins=256, range=(0, 256))[0]
830
+ hist_b = np.histogram(img_array[:,:,2], bins=256, range=(0, 256))[0]
831
+
832
+ # Plot the histograms
833
+ bin_edges = np.arange(0, 257)
834
+ axs[1, 0].plot(bin_edges[:-1], hist_r, color='red', alpha=0.7)
835
+ axs[1, 0].plot(bin_edges[:-1], hist_g, color='green', alpha=0.7)
836
+ axs[1, 0].plot(bin_edges[:-1], hist_b, color='blue', alpha=0.7)
837
+ axs[1, 0].set_title("Color Channel Histograms")
838
+ axs[1, 0].set_xlabel("Pixel Value")
839
+ axs[1, 0].set_ylabel("Frequency")
840
+ axs[1, 0].legend(['Red', 'Green', 'Blue'])
841
+
842
+ # Show odd/even distribution analysis
843
+ histogram_data = details['histogram_analysis']['details']
844
+
845
+ # Get even/odd ratio values
846
+ if 'even_odd_ratios' in histogram_data:
847
+ even_odd_ratios = histogram_data['even_odd_ratios']
848
+
849
+ # Plot as bar chart
850
+ axs[1, 1].bar(['Red', 'Green', 'Blue'], even_odd_ratios,
851
+ color=['red', 'green', 'blue'], alpha=0.7)
852
+ axs[1, 1].axhline(y=1.0, linestyle='--', color='gray')
853
+ axs[1, 1].set_title("Even/Odd Value Ratios")
854
+ axs[1, 1].set_ylabel("Ratio (1.0 = balanced)")
855
+
856
+ # Annotate with explanatory text
857
+ deviation = histogram_data.get('even_odd_deviation', 0)
858
+ assessment = "SUSPICIOUS" if deviation > 0.1 else "NORMAL"
859
+ axs[1, 1].annotate(f"Deviation: {deviation:.3f}\nAssessment: {assessment}",
860
+ xy=(0.05, 0.05), xycoords='axes fraction')
861
+ else:
862
+ axs[1, 1].text(0.5, 0.5, "Histogram ratio data not available",
863
+ horizontalalignment='center', verticalalignment='center')
864
+ axs[1, 1].axis('off')
865
+ else:
866
+ axs[1, 0].text(0.5, 0.5, "Histogram analysis not available",
867
+ horizontalalignment='center', verticalalignment='center')
868
+ axs[1, 0].axis('off')
869
+ axs[1, 1].axis('off')
870
+
871
+ # Noise visualization
872
+ if 'visual_noise_analysis' in details:
873
+ noise_data = details['visual_noise_analysis']['details']
874
+ noise_values = [noise_data.get('red_noise', 0),
875
+ noise_data.get('green_noise', 0),
876
+ noise_data.get('blue_noise', 0)]
877
+
878
+ axs[1, 2].bar(['Red', 'Green', 'Blue'], noise_values, color=['red', 'green', 'blue'])
879
+ axs[1, 2].set_title("Noise Levels by Channel")
880
+ axs[1, 2].set_ylabel("Noise Level")
881
+ else:
882
+ axs[1, 2].text(0.5, 0.5, "Noise analysis not available",
883
+ horizontalalignment='center', verticalalignment='center')
884
+ axs[1, 2].axis('off')
885
+
886
+ # File size analysis visualization
887
+ if 'file_size_analysis' in details and 'details' in details['file_size_analysis']:
888
+ size_data = details['file_size_analysis']['details']
889
+
890
+ if ('file_size' in size_data and 'expected_min' in size_data
891
+ and 'expected_max' in size_data and 'pixel_count' in size_data):
892
+
893
+ # Create a simple bar chart comparing actual vs expected size
894
+ sizes = [size_data['file_size'],
895
+ size_data['expected_min'],
896
+ size_data['expected_max']]
897
+
898
+ labels = ['Actual Size', 'Min Expected', 'Max Expected']
899
+ colors = ['blue', 'green', 'green']
900
+
901
+ axs[2, 0].bar(labels, sizes, color=colors, alpha=0.7)
902
+ axs[2, 0].set_title("File Size Analysis")
903
+ axs[2, 0].set_ylabel("Size (bytes)")
904
+
905
+ # Format y-axis to show human-readable sizes
906
+ axs[2, 0].get_yaxis().set_major_formatter(
907
+ plt.FuncFormatter(lambda x, loc: f"{x/1024:.1f}KB" if x >= 1024 else f"{x}B"))
908
+
909
+ # Is it suspiciously large?
910
+ is_too_large = size_data['file_size'] > size_data['expected_max']
911
+ is_too_small = size_data['file_size'] < size_data['expected_min']
912
+
913
+ if is_too_large:
914
+ assessment = f"SUSPICIOUS: {(size_data['file_size'] - size_data['expected_max'])/1024:.1f}KB larger than expected"
915
+ elif is_too_small:
916
+ assessment = f"SUSPICIOUS: {(size_data['expected_min'] - size_data['file_size'])/1024:.1f}KB smaller than expected"
917
+ else:
918
+ assessment = "NORMAL: Size within expected range"
919
+
920
+ axs[2, 0].annotate(assessment, xy=(0.05, 0.05), xycoords='axes fraction',
921
+ fontsize=9, verticalalignment='bottom')
922
+
923
+ if 'trailing_data_analysis' in details:
924
+ tdata = details['trailing_data_analysis']['details']
925
+ if tdata.get('appended_bytes', 0) > 0:
926
+ axs[2, 0].annotate(
927
+ f"Appended data: {tdata['appended_bytes']} bytes",
928
+ xy=(0.05, 0.85), xycoords='axes fraction',
929
+ fontsize=9, verticalalignment='bottom',
930
+ color='red'
931
+ )
932
+ else:
933
+ axs[2, 0].text(0.5, 0.5, "Size analysis data not available",
934
+ horizontalalignment='center', verticalalignment='center')
935
+ axs[2, 0].axis('off')
936
+ else:
937
+ axs[2, 0].text(0.5, 0.5, "Size analysis not available",
938
+ horizontalalignment='center', verticalalignment='center')
939
+ axs[2, 0].axis('off')
940
+
941
+ # Metadata analysis visualization
942
+ if 'metadata_analysis' in details and 'details' in details['metadata_analysis']:
943
+ metadata = details['metadata_analysis']['details']
944
+
945
+ metadata_text = f"Total metadata entries: {metadata.get('metadata_count', 0)}\n\n"
946
+
947
+ if 'suspicious_markers' in metadata and metadata['suspicious_markers']:
948
+ metadata_text += "Suspicious markers found:\n"
949
+ for key, marker, value in metadata['suspicious_markers'][:3]: # Show top 3
950
+ metadata_text += f"- '{marker}' in {key}\n"
951
+
952
+ if len(metadata['suspicious_markers']) > 3:
953
+ metadata_text += f"...and {len(metadata['suspicious_markers'])-3} more\n"
954
+ else:
955
+ metadata_text += "No suspicious metadata markers found"
956
+
957
+ axs[2, 1].text(0.1, 0.5, metadata_text, fontsize=10,
958
+ verticalalignment='center', horizontalalignment='left')
959
+ axs[2, 1].set_title("Metadata Analysis")
960
+ axs[2, 1].axis('off')
961
+ else:
962
+ axs[2, 1].text(0.5, 0.5, "Metadata analysis not available",
963
+ horizontalalignment='center', verticalalignment='center')
964
+ axs[2, 1].axis('off')
965
+
966
+ # Overall analysis metrics
967
+ axs[2, 2].axis('off')
968
+ metrics_text = "Detection Confidence by Method:\n\n"
969
+
970
+ for analysis_type, results in details.items():
971
+ if isinstance(results, dict) and 'confidence' in results:
972
+ confidence_value = results['confidence']
973
+ if confidence_value > 70:
974
+ highlight = " 🚨 HIGH"
975
+ elif confidence_value > 40:
976
+ highlight = " ⚠️ MEDIUM"
977
+ else:
978
+ highlight = ""
979
+ metrics_text += f"{analysis_type.replace('_', ' ').title()}: {confidence_value:.1f}%{highlight}\n"
980
+
981
+ axs[2, 2].text(0.1, 0.5, metrics_text, fontsize=10, verticalalignment='center')
982
+ axs[2, 2].set_title("Overall Analysis Results")
983
+
984
+ # Adjust layout
985
+ plt.tight_layout(rect=[0, 0, 1, 0.95])
986
+
987
+ # Save figure
988
+ report_filename = os.path.join(output_dir, f"steganalysis_{os.path.basename(image_path)}.png")
989
+ plt.savefig(report_filename)
990
+ plt.close()
991
+
992
+ logging.debug(f"Created visual report: {report_filename}")
993
+ return report_filename
994
+ except Exception as e:
995
+ logging.debug(f"Error creating visual report for {image_path}: {str(e)}")
996
+ return None
997
+
998
+ def find_image_files(directory, recursive=True):
999
+ """Find all image files in a directory."""
1000
+ image_extensions = ('.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tiff', '.tif', '.webp')
1001
+ image_files = []
1002
+
1003
+ if recursive:
1004
+ for root, _, files in os.walk(directory):
1005
+ for file in files:
1006
+ if file.lower().endswith(image_extensions):
1007
+ image_files.append(os.path.join(root, file))
1008
+ else:
1009
+ for file in os.listdir(directory):
1010
+ if os.path.isfile(os.path.join(directory, file)) and file.lower().endswith(image_extensions):
1011
+ image_files.append(os.path.join(directory, file))
1012
+
1013
+ return image_files
1014
+
1015
+ def analyze_images(directory, sensitivity='medium', recursive=True, output_dir=None, max_workers=None):
1016
+ """
1017
+ Analyze all images in a directory for steganography.
1018
+
1019
+ Args:
1020
+ directory: Directory to scan
1021
+ sensitivity: 'low', 'medium', or 'high'
1022
+ recursive: Whether to scan subdirectories
1023
+ output_dir: Directory to save visual reports
1024
+ max_workers: Number of worker processes
1025
+
1026
+ Returns:
1027
+ List of suspicious image details
1028
+ """
1029
+ # Find all image files
1030
+ image_files = find_image_files(directory, recursive)
1031
+ if not image_files:
1032
+ logging.warning("No image files found!")
1033
+ return []
1034
+
1035
+ logging.info(f"Found {len(image_files)} image files to analyze")
1036
+
1037
+ # Create output directory if specified
1038
+ if output_dir:
1039
+ os.makedirs(output_dir, exist_ok=True)
1040
+ logging.info(f"Visual reports will be saved to: {output_dir}")
1041
+
1042
+ # Prepare input arguments for workers
1043
+ input_args = [(file_path, sensitivity, output_dir) for file_path in image_files]
1044
+
1045
+ suspicious_images = []
1046
+
1047
+ # Process files in parallel
1048
+ with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
1049
+ # Colorful progress bar
1050
+ results = []
1051
+ futures = {executor.submit(process_file, arg): arg[0] for arg in input_args}
1052
+
1053
+ with tqdm(
1054
+ total=len(image_files),
1055
+ desc=f"{colorama.Fore.RED}Analyzing images for steganography{colorama.Style.RESET_ALL}",
1056
+ unit="file",
1057
+ bar_format="{desc}: {percentage:3.0f}%|{bar:30}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}]",
1058
+ colour="red"
1059
+ ) as pbar:
1060
+ for future in concurrent.futures.as_completed(futures):
1061
+ file_path = futures[future]
1062
+ try:
1063
+ result = future.result()
1064
+ results.append(result)
1065
+
1066
+ # Update progress
1067
+ pbar.update(1)
1068
+
1069
+ # Add to suspicious images if applicable
1070
+ if result['suspicious']:
1071
+ suspicious_images.append(result)
1072
+ logging.info(f"Suspicious image found: {file_path} (confidence: {result['confidence']:.1f}%)")
1073
+ except Exception as e:
1074
+ logging.error(f"Error analyzing {file_path}: {str(e)}")
1075
+ pbar.update(1)
1076
+
1077
+ # Sort suspicious images by confidence
1078
+ suspicious_images.sort(key=lambda x: x['confidence'], reverse=True)
1079
+
1080
+ logging.info(f"Analysis complete. Found {len(suspicious_images)} suspicious images")
1081
+ return suspicious_images
1082
+
1083
+ def main():
1084
+ print_banner()
1085
+
1086
+ # Check for 'q' command to quit
1087
+ if len(sys.argv) == 2 and sys.argv[1].lower() == 'q':
1088
+ print(f"{colorama.Fore.YELLOW}Exiting RAT Finder. Stay vigilant!{colorama.Style.RESET_ALL}")
1089
+ sys.exit(0)
1090
+
1091
+ parser = argparse.ArgumentParser(
1092
+ description=f'RAT Finder: Steganography Detection Tool (v{VERSION})',
1093
+ epilog='Part of the 2PAC toolkit - Created by Richard Young'
1094
+ )
1095
+
1096
+ # Main action
1097
+ parser.add_argument('directory', nargs='?', help='Directory to search for images')
1098
+ parser.add_argument('--check-file', type=str, help='Check a specific file for steganography')
1099
+
1100
+ # Options
1101
+ parser.add_argument('--sensitivity', type=str, choices=['low', 'medium', 'high'], default='medium',
1102
+ help='Set detection sensitivity level (default: medium)')
1103
+ parser.add_argument('--non-recursive', action='store_true', help='Only search in the specified directory, not subdirectories')
1104
+ parser.add_argument('--output', type=str, help='Save list of suspicious files to this file')
1105
+ parser.add_argument('--visual-reports', type=str, help='Directory to save visual analysis reports')
1106
+ parser.add_argument('--workers', type=int, default=None, help='Number of worker processes (default: CPU count)')
1107
+ parser.add_argument('--verbose', '-v', action='store_true', help='Enable verbose logging')
1108
+ parser.add_argument('--no-color', action='store_true', help='Disable colored output')
1109
+ parser.add_argument('--version', action='version', version=f'RAT Finder v{VERSION} by Richard Young')
1110
+
1111
+ args = parser.parse_args()
1112
+
1113
+ # Setup logging
1114
+ setup_logging(args.verbose, args.no_color)
1115
+
1116
+ # Handle specific file check mode
1117
+ if args.check_file:
1118
+ file_path = args.check_file
1119
+ if not os.path.exists(file_path):
1120
+ logging.error(f"Error: File not found: {file_path}")
1121
+ sys.exit(1)
1122
+
1123
+ print(f"\n{colorama.Style.BRIGHT}Analyzing file for steganography: {file_path}{colorama.Style.RESET_ALL}\n")
1124
+
1125
+ is_suspicious, confidence, details = analyze_image(file_path, args.sensitivity)
1126
+
1127
+ # Print results
1128
+ if is_suspicious:
1129
+ print(f"{colorama.Fore.RED}[!] SUSPICIOUS: This image may contain hidden data{colorama.Style.RESET_ALL}")
1130
+ print(f"Confidence: {confidence:.1f}%\n")
1131
+ else:
1132
+ print(f"{colorama.Fore.GREEN}[✓] No steganography detected in this image{colorama.Style.RESET_ALL}")
1133
+ print(f"Confidence: {(100 - confidence):.1f}% clean\n")
1134
+
1135
+ # Details of analysis
1136
+ print(f"{colorama.Fore.CYAN}Detection Details:{colorama.Style.RESET_ALL}")
1137
+
1138
+ for analysis_type, results in details.items():
1139
+ if isinstance(results, dict) and 'confidence' in results:
1140
+ detection_status = f"{colorama.Fore.RED}[DETECTED]" if results['suspicious'] else f"{colorama.Fore.GREEN}[OK]"
1141
+ print(f"{detection_status} {analysis_type.replace('_', ' ').title()}: {results['confidence']:.1f}%{colorama.Style.RESET_ALL}")
1142
+
1143
+ # Print specific findings
1144
+ if 'details' in results and isinstance(results['details'], dict):
1145
+ for key, value in results['details'].items():
1146
+ if key != 'error':
1147
+ print(f" - {key}: {value}")
1148
+
1149
+ # Create visual report if requested
1150
+ if args.visual_reports:
1151
+ report_path = create_visual_report(file_path, confidence, details, args.visual_reports)
1152
+ if report_path:
1153
+ print(f"\n{colorama.Fore.CYAN}Visual report saved to: {report_path}{colorama.Style.RESET_ALL}")
1154
+
1155
+ sys.exit(0)
1156
+
1157
+ # Check if directory is specified
1158
+ if not args.directory:
1159
+ logging.error("Error: You must specify a directory to scan or use --check-file for a specific file")
1160
+ sys.exit(1)
1161
+
1162
+ directory = Path(args.directory)
1163
+
1164
+ # Verify the directory exists
1165
+ if not directory.exists() or not directory.is_dir():
1166
+ logging.error(f"Error: {directory} is not a valid directory")
1167
+ sys.exit(1)
1168
+
1169
+ # Begin analysis
1170
+ logging.info(f"Starting steganography analysis with {args.sensitivity} sensitivity")
1171
+ logging.info(f"Scanning for images in {directory}")
1172
+
1173
+ try:
1174
+ suspicious_images = analyze_images(
1175
+ directory,
1176
+ sensitivity=args.sensitivity,
1177
+ recursive=not args.non_recursive,
1178
+ output_dir=args.visual_reports,
1179
+ max_workers=args.workers
1180
+ )
1181
+
1182
+ # Print summary
1183
+ if suspicious_images:
1184
+ count_str = f"{colorama.Fore.RED}{len(suspicious_images)}{colorama.Style.RESET_ALL}"
1185
+ logging.info(f"Found {count_str} suspicious images that may contain hidden data")
1186
+
1187
+ # Print top findings
1188
+ print("\nTop suspicious images:")
1189
+ for i, result in enumerate(suspicious_images[:10]): # Show top 10
1190
+ confidence_color = colorama.Fore.RED if result['confidence'] > 80 else colorama.Fore.YELLOW
1191
+ print(f"{i+1}. {result['path']} - Confidence: {confidence_color}{result['confidence']:.1f}%{colorama.Style.RESET_ALL}")
1192
+
1193
+ if len(suspicious_images) > 10:
1194
+ print(f"... and {len(suspicious_images) - 10} more")
1195
+ else:
1196
+ logging.info(f"{colorama.Fore.GREEN}No suspicious images found{colorama.Style.RESET_ALL}")
1197
+
1198
+ # Save output if requested
1199
+ if args.output and suspicious_images:
1200
+ with open(args.output, 'w') as f:
1201
+ for result in suspicious_images:
1202
+ f.write(f"{result['path']},{result['confidence']:.1f}\n")
1203
+ logging.info(f"Saved list of suspicious files to {args.output}")
1204
+
1205
+ except KeyboardInterrupt:
1206
+ logging.info("Operation cancelled by user")
1207
+ sys.exit(130)
1208
+ except Exception as e:
1209
+ logging.error(f"Error: {str(e)}")
1210
+ if args.verbose:
1211
+ import traceback
1212
+ traceback.print_exc()
1213
+ sys.exit(1)
1214
+
1215
+ # Add signature at the end
1216
+ if not args.no_color:
1217
+ signature = f"\n{colorama.Fore.RED}RAT Finder v{VERSION} by Richard Young{colorama.Style.RESET_ALL}"
1218
+ tagline = f"{colorama.Fore.YELLOW}\"Uncovering what's hidden in plain sight.\"{colorama.Style.RESET_ALL}"
1219
+ print(signature)
1220
+ print(tagline)
1221
+
1222
+ if __name__ == "__main__":
1223
+ main()
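Besides the CLI above, the detection functions can be called directly. A minimal sketch, assuming the script is importable as rat_finder and that a photos/ directory exists (both assumptions, not part of the commit):

```python
# Hypothetical programmatic use of the detector functions defined above.
from rat_finder import analyze_image, analyze_images  # assumes module name rat_finder

suspicious, confidence, details = analyze_image('photos/holiday.png', sensitivity='high')
if suspicious:
    print(f'Likely hidden data ({confidence:.1f}% confidence)')

# Batch scan with visual reports; analyze_images uses a process pool,
# so on spawn-based platforms run this under an if __name__ == '__main__' guard.
hits = analyze_images('photos', sensitivity='medium', recursive=True, output_dir='reports')
for hit in hits[:5]:
    print(hit['path'], hit['confidence'])
```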
requirements.txt ADDED
@@ -0,0 +1,8 @@

1
+ Pillow
2
+ tqdm
3
+ humanize
4
+ colorama
5
+ numpy
6
+ scipy
7
+ matplotlib
8
+ gradio>=4.0.0
steg_embedder.py ADDED
@@ -0,0 +1,337 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ LSB Steganography Embedder for 2PAC
4
+ Hides and extracts data in images using Least Significant Bit technique
5
+ """
6
+
7
+ import io
8
+ import hashlib
9
+ import struct
10
+ from typing import Tuple, Optional
11
+ from PIL import Image
12
+ import numpy as np
13
+
14
+
15
+ class StegEmbedder:
16
+ """
17
+ LSB (Least Significant Bit) Steganography implementation
18
+ Hides data in the least significant bits of image pixels
19
+ """
20
+
21
+ HEADER_SIZE = 13 # 1 byte encrypted flag + 4 bytes data length + 8 bytes checksum
22
+ MAGIC_NUMBER = b'2PAC' # Signature to identify embedded data
23
+
24
+ def __init__(self):
25
+ self.last_capacity = 0
26
+ self.last_used = 0
27
+
28
+ def calculate_capacity(self, image: Image.Image, bits_per_channel: int = 1) -> int:
29
+ """
30
+ Calculate how many bytes can be hidden in the image
31
+
32
+ Args:
33
+ image: PIL Image object
34
+ bits_per_channel: Number of LSBs to use per color channel (1-4)
35
+
36
+ Returns:
37
+ Maximum bytes that can be hidden
38
+ """
39
+ if image.mode not in ['RGB', 'RGBA']:
40
+ raise ValueError(f"Unsupported image mode: {image.mode}. Use RGB or RGBA.")
41
+
42
+ width, height = image.size
43
+ channels = len(image.mode) # 3 for RGB, 4 for RGBA
44
+
45
+ # Total bits available
46
+ total_bits = width * height * channels * bits_per_channel
47
+
48
+ # Account for header (magic number + encrypted flag + length + checksum)
49
+ header_bits = (len(self.MAGIC_NUMBER) + self.HEADER_SIZE) * 8
50
+
51
+ available_bits = total_bits - header_bits
52
+ capacity = available_bits // 8 # Convert to bytes
53
+
54
+ self.last_capacity = capacity
55
+ return capacity
56
+
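To make the arithmetic concrete: with 1 bit per channel, an RGB image carries width x height x 3 usable bits, minus the fixed header overhead. A worked example for a hypothetical 800x600 photo:

```python
# 800 x 600 RGB image, 1 LSB per channel:
total_bits = 800 * 600 * 3 * 1                 # 1,440,000 bits
header_bits = (4 + 1 + 4 + 8) * 8              # magic, encrypted flag, length, checksum
capacity = (total_bits - header_bits) // 8
print(capacity)                                # 179983 bytes, roughly 176 KB of hidden text
```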
57
+ def _string_to_bits(self, data: str) -> str:
58
+ """Convert string to binary representation"""
59
+ return ''.join(format(byte, '08b') for byte in data.encode('utf-8'))
60
+
61
+ def _bits_to_string(self, bits: str) -> str:
62
+ """Convert binary representation back to string"""
63
+ chars = []
64
+ for i in range(0, len(bits), 8):
65
+ byte = bits[i:i+8]
66
+ if len(byte) == 8:
67
+ chars.append(int(byte, 2))  # collect raw byte values rather than chr() so UTF-8 survives
68
+ return bytes(chars).decode('utf-8', errors='replace')
69
+
70
+ def _encrypt_data(self, data: str, password: str) -> bytes:
71
+ """Simple XOR encryption with password-derived key"""
72
+ key = hashlib.sha256(password.encode()).digest()
73
+ data_bytes = data.encode('utf-8')
74
+
75
+ encrypted = bytearray()
76
+ for i, byte in enumerate(data_bytes):
77
+ encrypted.append(byte ^ key[i % len(key)])
78
+
79
+ return bytes(encrypted)
80
+
81
+ def _decrypt_data(self, encrypted_data: bytes, password: str) -> str:
82
+ """Decrypt XOR-encrypted data"""
83
+ key = hashlib.sha256(password.encode()).digest()
84
+
85
+ decrypted = bytearray()
86
+ for i, byte in enumerate(encrypted_data):
87
+ decrypted.append(byte ^ key[i % len(key)])
88
+
89
+ return bytes(decrypted).decode('utf-8', errors='replace')
90
+
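The password option above is a repeating-key XOR with a SHA-256-derived keystream, so encrypting and decrypting are the same operation. A standalone round-trip sketch (the message and password are placeholders):

```python
import hashlib

def xor_with_password(data: bytes, password: str) -> bytes:
    key = hashlib.sha256(password.encode()).digest()
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

ciphertext = xor_with_password('meet at dawn'.encode('utf-8'), 'hunter2')
plaintext = xor_with_password(ciphertext, 'hunter2')
print(plaintext.decode('utf-8'))  # meet at dawn
```

Note that a repeating 32-byte XOR keystream is lightweight obfuscation rather than authenticated encryption; the README's "password encryption" should be read in that light.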
91
+ def embed_data(
92
+ self,
93
+ image_path: str,
94
+ data: str,
95
+ output_path: str,
96
+ password: Optional[str] = None,
97
+ bits_per_channel: int = 1
98
+ ) -> Tuple[bool, str, dict]:
99
+ """
100
+ Hide data in an image using LSB steganography
101
+
102
+ Args:
103
+ image_path: Path to input image
104
+ data: Text data to hide
105
+ output_path: Path for output image (will be PNG)
106
+ password: Optional password for encryption
107
+ bits_per_channel: LSBs to use per channel (1=subtle, 2-4=more capacity)
108
+
109
+ Returns:
110
+ Tuple of (success, message, stats_dict)
111
+ """
112
+ try:
113
+ # Load image
114
+ img = Image.open(image_path)
115
+ if img.mode not in ['RGB', 'RGBA']:
116
+ img = img.convert('RGB')
117
+
118
+ # Calculate capacity
119
+ capacity = self.calculate_capacity(img, bits_per_channel)
120
+
121
+ # Encrypt data if password provided
122
+ if password:
123
+ data_bytes = self._encrypt_data(data, password)
124
+ is_encrypted = True
125
+ else:
126
+ data_bytes = data.encode('utf-8')
127
+ is_encrypted = False
128
+
129
+ data_length = len(data_bytes)
130
+
131
+ if data_length > capacity:
132
+ return False, f"Data too large! Maximum: {capacity} bytes, Provided: {data_length} bytes", {}
133
+
134
+ # Create header: MAGIC + encrypted_flag + length + checksum
135
+ checksum = hashlib.md5(data_bytes).digest()[:8]
136
+ encrypted_flag = b'\x01' if is_encrypted else b'\x00'
137
+ header = self.MAGIC_NUMBER + encrypted_flag + struct.pack('<I', data_length) + checksum
138
+
139
+ # Combine header and data
140
+ full_data = header + data_bytes
141
+
142
+ # Convert to bit string
143
+ bit_string = ''.join(format(byte, '08b') for byte in full_data)
144
+
145
+ # Embed in image
146
+ img_array = np.array(img, dtype=np.uint8)
147
+ flat_array = img_array.flatten()
148
+
149
+ bit_index = 0
150
+ for i in range(len(flat_array)):
151
+ if bit_index >= len(bit_string):
152
+ break
153
+
154
+ # Clear LSBs and set new bits
155
+ pixel = int(flat_array[i])  # work on a plain int so masking with ~(1 << bit) stays in range
156
+ for bit in range(bits_per_channel):
157
+ if bit_index >= len(bit_string):
158
+ break
159
+ # Clear bit
160
+ pixel = (pixel & ~(1 << bit))
161
+ # Set new bit
162
+ if bit_string[bit_index] == '1':
163
+ pixel = pixel | (1 << bit)
164
+ bit_index += 1
165
+
166
+ flat_array[i] = pixel
167
+
168
+ # Reshape and save
169
+ steg_img_array = flat_array.reshape(img_array.shape)
170
+ steg_img = Image.fromarray(steg_img_array, img.mode)
171
+
172
+ # Save as PNG to preserve data
173
+ steg_img.save(output_path, 'PNG', optimize=False)
174
+
175
+ self.last_used = data_length
176
+
177
+ stats = {
178
+ 'data_size': data_length,
179
+ 'capacity': capacity,
180
+ 'utilization': f"{(data_length / capacity * 100):.1f}%",
181
+ 'encrypted': is_encrypted,
182
+ 'bits_per_channel': bits_per_channel,
183
+ 'image_size': f"{img.width}x{img.height}"
184
+ }
185
+
186
+ return True, f"Successfully embedded {data_length} bytes", stats
187
+
188
+ except Exception as e:
189
+ return False, f"Error embedding data: {str(e)}", {}
190
+
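The embedded stream therefore starts with a fixed 17-byte header: the 4-byte magic, a 1-byte encrypted flag, a 4-byte little-endian length, and the first 8 bytes of the payload's MD5. A small sketch of packing and re-parsing it, mirroring embed_data and extract_data (the payload text is illustrative):

```python
import hashlib
import struct

payload = 'attack at dawn'.encode('utf-8')
header = b'2PAC' + b'\x00' + struct.pack('<I', len(payload)) + hashlib.md5(payload).digest()[:8]
stream = header + payload

# Re-parse the stream the same way extract_data does:
assert stream[:4] == b'2PAC'
encrypted = stream[4] == 1
length = struct.unpack('<I', stream[5:9])[0]
checksum = stream[9:17]
data = stream[17:17 + length]
assert checksum == hashlib.md5(data).digest()[:8]
print(encrypted, length, data.decode())  # False 14 attack at dawn
```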
191
+ def extract_data(
192
+ self,
193
+ image_path: str,
194
+ password: Optional[str] = None,
195
+ bits_per_channel: int = 1
196
+ ) -> Tuple[bool, str, str]:
197
+ """
198
+ Extract hidden data from a steganographic image
199
+
200
+ Args:
201
+ image_path: Path to image with hidden data
202
+ password: Password if data is encrypted
203
+ bits_per_channel: LSBs used per channel (must match embedding)
204
+
205
+ Returns:
206
+ Tuple of (success, message, extracted_data)
207
+ """
208
+ try:
209
+ # Load image
210
+ img = Image.open(image_path)
211
+ img_array = np.array(img, dtype=np.uint8)
212
+ flat_array = img_array.flatten()
213
+
214
+ # Extract header first
215
+ header_bits = (len(self.MAGIC_NUMBER) + 1 + 4 + 8) * 8
216
+ extracted_bits = []
217
+
218
+ bit_index = 0
219
+ for i in range(len(flat_array)):
220
+ if bit_index >= header_bits:
221
+ break
222
+ pixel = flat_array[i]
223
+ for bit in range(bits_per_channel):
224
+ if bit_index >= header_bits:
225
+ break
226
+ extracted_bits.append(str((pixel >> bit) & 1))
227
+ bit_index += 1
228
+
229
+ # Convert bits to bytes
230
+ header_bytes = bytearray()
231
+ for i in range(0, len(extracted_bits), 8):
232
+ byte_bits = ''.join(extracted_bits[i:i+8])
233
+ if len(byte_bits) == 8:
234
+ header_bytes.append(int(byte_bits, 2))
235
+
236
+ # Verify magic number
237
+ magic = bytes(header_bytes[:len(self.MAGIC_NUMBER)])
238
+ if magic != self.MAGIC_NUMBER:
239
+ return False, "No hidden data found (invalid magic number)", ""
240
+
241
+ # Parse header
242
+ offset = len(self.MAGIC_NUMBER)
243
+ is_encrypted = header_bytes[offset] == 1
244
+ offset += 1
245
+
246
+ data_length = struct.unpack('<I', bytes(header_bytes[offset:offset+4]))[0]
247
+ offset += 4
248
+
249
+ stored_checksum = bytes(header_bytes[offset:offset+8])
250
+ offset += 8
251
+
252
+ # Extract data
253
+ total_bits_needed = (len(self.MAGIC_NUMBER) + 1 + 4 + 8 + data_length) * 8
254
+ extracted_bits = []
255
+
256
+ bit_index = 0
257
+ for i in range(len(flat_array)):
258
+ if bit_index >= total_bits_needed:
259
+ break
260
+ pixel = flat_array[i]
261
+ for bit in range(bits_per_channel):
262
+ if bit_index >= total_bits_needed:
263
+ break
264
+ extracted_bits.append(str((pixel >> bit) & 1))
265
+ bit_index += 1
266
+
267
+ # Convert to bytes
268
+ data_bytes = bytearray()
269
+ for i in range(0, len(extracted_bits), 8):
270
+ byte_bits = ''.join(extracted_bits[i:i+8])
271
+ if len(byte_bits) == 8:
272
+ data_bytes.append(int(byte_bits, 2))
273
+
274
+ # Skip header and get data
275
+ data_bytes = bytes(data_bytes[offset:offset+data_length])
276
+
277
+ # Verify checksum
278
+ calculated_checksum = hashlib.md5(data_bytes).digest()[:8]
279
+ if calculated_checksum != stored_checksum:
280
+ return False, "Data corruption detected (checksum mismatch)", ""
281
+
282
+ # Decrypt if needed
283
+ if is_encrypted:
284
+ if not password:
285
+ return False, "Data is encrypted but no password provided", ""
286
+ try:
287
+ data_str = self._decrypt_data(data_bytes, password)
288
+ except Exception as e:
289
+ return False, f"Decryption failed (wrong password?): {str(e)}", ""
290
+ else:
291
+ data_str = data_bytes.decode('utf-8', errors='replace')
292
+
293
+ return True, f"Successfully extracted {data_length} bytes", data_str
294
+
295
+ except Exception as e:
296
+ return False, f"Error extracting data: {str(e)}", ""
297
+
298
+
299
+ def main():
300
+ """Command-line interface for testing"""
301
+ import argparse
302
+
303
+ parser = argparse.ArgumentParser(description='LSB Steganography Tool')
304
+ parser.add_argument('mode', choices=['embed', 'extract'], help='Operation mode')
305
+ parser.add_argument('image', help='Input image path')
306
+ parser.add_argument('--data', help='Data to embed (for embed mode)')
307
+ parser.add_argument('--output', help='Output image path (for embed mode)')
308
+ parser.add_argument('--password', help='Encryption password (optional)')
309
+ parser.add_argument('--bits', type=int, default=1, help='Bits per channel (1-4)')
310
+
311
+ args = parser.parse_args()
312
+
313
+ embedder = StegEmbedder()
314
+
315
+ if args.mode == 'embed':
316
+ if not args.data or not args.output:
317
+ print("Error: --data and --output required for embed mode")
318
+ return
319
+
320
+ success, message, stats = embedder.embed_data(
321
+ args.image, args.data, args.output, args.password, args.bits
322
+ )
323
+ print(message)
324
+ if success:
325
+ print(f"Stats: {stats}")
326
+
327
+ elif args.mode == 'extract':
328
+ success, message, data = embedder.extract_data(
329
+ args.image, args.password, args.bits
330
+ )
331
+ print(message)
332
+ if success:
333
+ print(f"Extracted data:\n{data}")
334
+
335
+
336
+ if __name__ == '__main__':
337
+ main()
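A quick end-to-end sketch of using the class above outside the CLI; the file names are placeholders and assume a cover image and writable working directory (not part of the commit):

```python
# Hypothetical round trip: hide a note in cover.png, then recover it.
from steg_embedder import StegEmbedder  # assumes the module name steg_embedder

embedder = StegEmbedder()
ok, msg, stats = embedder.embed_data(
    'cover.png', 'rendezvous at 21:00', 'cover_with_note.png',
    password='hunter2', bits_per_channel=1,
)
print(msg, stats)

ok, msg, text = embedder.extract_data('cover_with_note.png', password='hunter2', bits_per_channel=1)
print(msg)
print(text)  # rendezvous at 21:00
```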