	# BirdNET Audio Prediction Script

	This script loads a WAV file and uses the BirdNET ONNX model to predict bird species from audio recordings. It supports both single-window analysis (first 3 seconds) and moving window analysis (entire file) with species name mapping.

	## Features

	- Species Name Mapping: Uses `BirdNET_GLOBAL_6K_V2.4_Labels.txt` to display actual bird species names instead of class indices
	- Moving Window Analysis: Analyzes entire audio files using overlapping 3-second windows
	- Single Window Mode: Quick analysis of just the first 3 seconds
	- Configurable Parameters: Adjustable confidence thresholds, overlap ratios, and result counts
	- Detection Summary: Comprehensive overview of all detections with timestamps and confidence scores

	## Requirements

	- Python 3.7+
	- The model expects audio input of exactly 3 seconds duration at 48kHz sample rate (144,000 samples)
	- BirdNET labels file: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`

	## Installation

	Install the required dependencies:

	```bash
	pip install -r requirements.txt
	```

	Required packages:

	- `numpy>=1.21.0`
	- `librosa>=0.9.0`
	- `onnxruntime>=1.12.0`

	## Usage

	### Moving Window Analysis (Full File)

	Analyze the entire audio file with overlapping windows:

	```bash
	python predict_audio.py audio.wav
	```

	### Single Window Analysis (First 3 seconds only)

	Quick analysis of just the beginning:

	```bash
	python predict_audio.py audio.wav --single-window
	```

	### Advanced Usage Examples

	```bash
	# Show more results per analysis (15 instead of the default 5) at the default 0.1 threshold
	python predict_audio.py audio.wav --confidence 0.1 --top-k 15

	# Fine-grained analysis with 75% window overlap
	python predict_audio.py audio.wav --overlap 0.75 --confidence 0.3

	# Custom model and labels files
	python predict_audio.py audio.wav --model custom_model.onnx --labels custom_labels.txt
	```

	### Command Line Arguments

	- `audio_file`: Path to the WAV audio file (required)
	- `--model`: Path to the ONNX model file (default: `model.onnx`)
	- `--labels`: Path to the species labels file (default: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`)
	- `--top-k`: Number of top predictions to show (default: 5)
	- `--overlap`: Window overlap ratio 0.0-1.0 (default: 0.5 = 50% overlap)
	- `--confidence`: Minimum confidence threshold for detections (default: 0.1)
	- `--batch-size`: Batch size for inference processing (default: 128)
	- `--single-window`: Analyze only first 3 seconds instead of full file
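As a rough illustration of how the arguments above fit together, the following is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is an assumption about how such a parser could look, not the actual code of `predict_audio.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(description="BirdNET audio prediction")
    parser.add_argument("audio_file", help="Path to the WAV audio file")
    parser.add_argument("--model", default="model.onnx")
    parser.add_argument("--labels", default="BirdNET_GLOBAL_6K_V2.4_Labels.txt")
    parser.add_argument("--top-k", type=int, default=5)
    parser.add_argument("--overlap", type=float, default=0.5)
    parser.add_argument("--confidence", type=float, default=0.1)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--single-window", action="store_true")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["audio.wav", "--overlap", "0.75", "--confidence", "0.3"])
print(args.audio_file, args.overlap, args.confidence, args.top_k, args.single_window)
```

Flags that are not passed keep their documented defaults, so `args.top_k` is 5 and `args.batch_size` is 128 here.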

	## Output Examples

	### Single Window Output

	```
	Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
	Loaded 6522 species labels
	Loading ONNX model: model.onnx
	Loading first 3 seconds of audio file: bird_recording.wav
	Audio loaded successfully. Shape: (144000,)
	Running inference on single window...

	Top 5 predictions for first 3 seconds:
	1. American Robin: 0.892456
	2. Song Sparrow: 0.234567
	3. House Finch: 0.123789
	4. Northern Cardinal: 0.089234
	5. Blue Jay: 0.056789
	```
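A ranked list like the one above can be produced by sorting the score vector and mapping indices to label names. The sketch below assumes the scores arrive as a one-dimensional array with one entry per label; `top_k_predictions` is a hypothetical helper name and the example scores are made up.

```python
import numpy as np


def top_k_predictions(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    scores = np.asarray(scores, dtype=np.float32)
    k = min(k, scores.shape[0])
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(labels[i], float(scores[i])) for i in order]


# Illustrative labels and scores only; not real model output.
labels = ["American Robin", "Song Sparrow", "House Finch", "Blue Jay"]
scores = [0.89, 0.23, 0.12, 0.05]
for rank, (name, score) in enumerate(top_k_predictions(scores, labels, k=3), start=1):
    print(f"{rank}. {name}: {score:.6f}")
```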

	### Moving Window Output

	```
	Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
	Loaded 6522 species labels
	Loading ONNX model: model.onnx
	Loading full audio file: long_recording.wav
	Audio loaded successfully. Duration: 45.32 seconds
	Creating windows with 50% overlap...
	Created 28 windows of 3 seconds each
	Running inference on all windows...
	Processing window 1/28 (t=0.0s)
	Processing window 11/28 (t=15.0s)
	Processing window 21/28 (t=30.0s)
	Completed inference on 28 windows
	Analyzing detections with confidence threshold 0.1...

	=== DETECTION SUMMARY ===
	Audio duration: 45.32 seconds
	Windows analyzed: 28
	Species detected (>0.10 confidence): 4

	Top detections:

	American Robin
	Max confidence: 0.892456
	Detections: 12
	Time range: 0.0s - 18.0s
	1.5s: 0.892456
	3.0s: 0.845231
	4.5s: 0.723456

	Song Sparrow
	Max confidence: 0.567890
	Detections: 6
	Time range: 22.5s - 36.0s
	24.0s: 0.567890
	25.5s: 0.445678
	27.0s: 0.334567

	House Finch
	Max confidence: 0.345678
	Detections: 3
	Time range: 39.0s - 42.0s
	39.0s: 0.345678
	```
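The per-species summary shown above (maximum confidence, detection count, time range) could be derived by grouping per-window detections by species. This is a minimal sketch under assumptions: detections are represented as `(time_s, species, confidence)` tuples, the threshold is applied as a strict `>` to mirror the ">0.10" wording in the output, and `summarize_detections` is a hypothetical name.

```python
from collections import defaultdict


def summarize_detections(detections, threshold=0.1):
    """Group (time_s, species, confidence) tuples by species.

    Returns per-species summaries sorted by maximum confidence, highest first.
    """
    by_species = defaultdict(list)
    for time_s, species, confidence in detections:
        if confidence > threshold:  # assumption: strict comparison
            by_species[species].append((time_s, confidence))

    summary = []
    for species, hits in by_species.items():
        times = [t for t, _ in hits]
        summary.append({
            "species": species,
            "max_confidence": max(c for _, c in hits),
            "detections": len(hits),
            "time_range": (min(times), max(times)),
        })
    summary.sort(key=lambda row: row["max_confidence"], reverse=True)
    return summary


# Made-up detections for illustration.
rows = summarize_detections([
    (0.0, "American Robin", 0.72),
    (1.5, "American Robin", 0.89),
    (24.0, "Song Sparrow", 0.57),
    (25.5, "Song Sparrow", 0.04),
])
for row in rows:
    print(row)
```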

	## Technical Details

	### Model Input/Output

	- Input: Audio array of shape `[batch_size, 144000]` (3 seconds at 48kHz)
	- Output: Classification scores for 6522 bird species
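
To make the input and output shapes concrete, here is a hedged sketch of calling the model through ONNX Runtime. It assumes the model has a single audio input and that its first output holds the classification scores; the helper names `load_session` and `run_model` are hypothetical, and the input tensor name is looked up at runtime rather than guessed.

```python
import numpy as np


def load_session(model_path="model.onnx"):
    """Create an ONNX Runtime session on the CPU."""
    import onnxruntime as ort  # imported lazily so run_model stays usable without it

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_model(session, batch):
    """Run one batch of shape [batch_size, 144000] and return the first output."""
    batch = np.ascontiguousarray(batch, dtype=np.float32)
    if batch.ndim != 2 or batch.shape[1] != 144_000:
        raise ValueError(f"expected shape [batch_size, 144000], got {batch.shape}")
    input_name = session.get_inputs()[0].name  # avoids hard-coding the tensor name
    # Assumption: the first model output contains the per-class scores.
    return session.run(None, {input_name: batch})[0]


# Usage, given a model file on disk:
#   session = load_session("model.onnx")
#   scores = run_model(session, np.zeros((1, 144_000), dtype=np.float32))
```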

	### Audio Preprocessing

	The script automatically handles:

	- Loading audio files with librosa (supports WAV, MP3, FLAC, etc.)
	- Resampling to 48kHz if necessary
	- Padding with zeros or truncating to exactly 3 seconds (144,000 samples)
	- Converting to float32 format
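
The preprocessing steps above can be sketched as follows. This is an illustration rather than the script's own code: `fit_to_window` and `load_first_window` are hypothetical names, and the file-loading helper relies on `librosa.load` to resample to 48 kHz mono.

```python
import numpy as np

SAMPLE_RATE = 48_000
WINDOW_SAMPLES = 3 * SAMPLE_RATE  # 144,000 samples


def fit_to_window(samples, length=WINDOW_SAMPLES):
    """Zero-pad or truncate a mono signal to exactly `length` float32 samples."""
    samples = np.asarray(samples, dtype=np.float32)
    if samples.shape[0] >= length:
        return samples[:length]
    return np.pad(samples, (0, length - samples.shape[0]))  # pads with zeros


def load_first_window(path):
    """Load the first 3 seconds of a file, resampled to 48 kHz mono."""
    import librosa  # lazy import: only needed when reading from disk

    samples, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True, duration=3.0)
    return fit_to_window(samples)


print(fit_to_window(np.ones(100_000)).shape)  # (144000,)
```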

	### Moving Window Analysis

	- Creates overlapping 3-second windows from the full audio
	- The default 50% overlap gives a 1.5-second hop, so windows start at 0s, 1.5s, 3s, 4.5s, etc.
	- Higher overlap (e.g., 75%) provides more fine-grained analysis but takes longer
	- Each window is analyzed independently, then results are aggregated
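
The windowing described above can be sketched with NumPy slicing. Two assumptions are made here that the script may handle differently: a trailing partial window is dropped rather than padded, and an overlap of exactly 1.0 is rejected because it would give a hop of zero. `make_windows` is a hypothetical name.

```python
import numpy as np

SAMPLE_RATE = 48_000
WINDOW_SAMPLES = 3 * SAMPLE_RATE


def make_windows(audio, overlap=0.5):
    """Split mono audio into 3-second windows; returns (windows, start_times_s)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    audio = np.asarray(audio, dtype=np.float32)
    hop = max(1, int(round(WINDOW_SAMPLES * (1.0 - overlap))))
    if audio.shape[0] < WINDOW_SAMPLES:
        # Short clips are zero-padded so that at least one window exists.
        audio = np.pad(audio, (0, WINDOW_SAMPLES - audio.shape[0]))
    starts = np.arange(0, audio.shape[0] - WINDOW_SAMPLES + 1, hop)
    windows = np.stack([audio[s:s + WINDOW_SAMPLES] for s in starts])
    return windows, starts / SAMPLE_RATE


# Six seconds of silence at 50% overlap yields windows starting at 0, 1.5 and 3 s.
windows, times = make_windows(np.zeros(6 * SAMPLE_RATE), overlap=0.5)
print(windows.shape, times.tolist())  # (3, 144000) [0.0, 1.5, 3.0]
```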

	### Batch Processing

	- Windows are processed in configurable batches (default: 128 windows per batch)
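	- Batching lets the runtime evaluate many windows per inference call, which usually speeds up long recordings compared with running one window at a time
	- The script splits the windows into batches automatically and reports progress as it goes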
	- Optimal batch size depends on available system memory and model complexity
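
A batching loop of the kind described above might look like the sketch below. The `infer` callable stands in for the model call and is an assumption, as is the helper name `predict_in_batches`; the stand-in used here simply sums each row so the example runs without a model.

```python
import numpy as np


def predict_in_batches(windows, infer, batch_size=128):
    """Apply `infer` to `windows` in chunks and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outputs = []
    for start in range(0, len(windows), batch_size):
        batch = windows[start:start + batch_size]  # the last batch may be smaller
        outputs.append(np.asarray(infer(batch)))
    return np.concatenate(outputs, axis=0)


def fake_infer(batch):
    """Stand-in for the model: one fake score per window."""
    return batch.sum(axis=1, keepdims=True)


demo_windows = np.ones((300, 4), dtype=np.float32)
scores = predict_in_batches(demo_windows, fake_infer, batch_size=128)
print(scores.shape)  # (300, 1)
```

With 300 windows and a batch size of 128, the loop makes three calls of 128, 128, and 44 windows.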

	### Species Labels

	- Uses the official BirdNET labels file with 6522 species
	- Format: `Scientific name_Common Name`, one entry per line (a single underscore separates the two names)
	- Script extracts and displays the common name (the part after the underscore)
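
Extracting the common name amounts to splitting each line at the first underscore. The sketch below assumes that layout and falls back to the whole line when no underscore is present; `parse_label` and `load_labels` are hypothetical names, and the example line is illustrative.

```python
def parse_label(line):
    """Split one labels-file line into (scientific_name, common_name)."""
    scientific, sep, common = line.strip().partition("_")
    # Fall back to the whole line when no underscore is present.
    return (scientific, common) if sep else (scientific, scientific)


def load_labels(path="BirdNET_GLOBAL_6K_V2.4_Labels.txt"):
    """Read the labels file and return common names in class-index order."""
    with open(path, encoding="utf-8") as handle:
        return [parse_label(line)[1] for line in handle if line.strip()]


print(parse_label("Turdus migratorius_American Robin"))
# ('Turdus migratorius', 'American Robin')
```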

	## Performance Tips

	- Use `--single-window` for quick identification of prominent species
	- Increase `--overlap` (0.75-0.9) for detailed analysis of complex recordings
	- Lower `--confidence` below the 0.1 default (e.g., 0.05) to catch weaker signals
	- Raise `--confidence` (0.3-0.5) to keep only high-confidence detections
	- Use `--top-k 1` to see only the most confident detection per analysis
	- The default `--batch-size 128` is a reasonable starting point; the best value depends on available memory
	- Increase the batch size (256, 512) if you have spare RAM or GPU memory
	- Decrease the batch size (32, 64) if you encounter memory issues
	- Batching matters most on longer audio files, where there are many windows to process
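
To get a feel for how `--overlap` affects the amount of work, the following sketch estimates the number of 3-second windows for a given duration. It assumes a trailing partial window is dropped, so the script's own count may differ slightly depending on how it treats the end of the file; `window_count` is a hypothetical name.

```python
SAMPLE_RATE = 48_000
WINDOW_SAMPLES = 3 * SAMPLE_RATE


def window_count(duration_s, overlap):
    """Estimate how many full 3-second windows fit at the given overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    total = int(round(duration_s * SAMPLE_RATE))
    hop = max(1, int(round(WINDOW_SAMPLES * (1.0 - overlap))))
    if total <= WINDOW_SAMPLES:
        return 1  # a short clip still yields one (padded) window
    return (total - WINDOW_SAMPLES) // hop + 1


for overlap in (0.0, 0.5, 0.75):
    print(overlap, window_count(60, overlap))
# 0.0 20
# 0.5 39
# 0.75 77
```

Under these assumptions, going from 50% to 75% overlap roughly doubles the number of windows, and therefore the inference work, for the same recording.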