{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "oGwrUZSxXSuG" }, "source": [ "# Multimodal Product Classification\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "from src.utils import ImageDownloader\n", "from src.vision_embeddings_tf import get_embeddings_df\n", "from src.nlp_models import HuggingFaceEmbeddings\n", "from src.utils import preprocess_data, train_test_split_and_feature_extraction\n", "import os\n", "\n", "from src.classifiers_classic_ml import train_and_evaluate_model as train_classic\n", "\n", "from src.classifiers_mlp import MultimodalDataset\n", "from sklearn.preprocessing import LabelEncoder\n", "from src.classifiers_mlp import train_mlp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Getting the data\n" ] }, { "cell_type": "markdown", "metadata": { "id": "vqPF-5jcXSuI" }, "source": [ "- Download the [images](https://drive.google.com/file/d/14s2aDNTEWse86cWyLhvVIKmob6EbQrm_/view?usp=sharing)\n", "- Place the images in the `data/images` directory.\n", "\n", "Optionally, you can use the following Python snippet to programmatically download the images and generate the `processed_products_with_images.csv` file. 
This is an alternative to the manual steps above.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```py\n", "# Load the data:\n", "CSV_PATH = \"data/raw/processed_products.csv\"\n", "df = pd.read_csv(CSV_PATH)\n", "\n", "# Download the images and add the image paths to the dataframe:\n", "DIR = \"data/images/\"\n", "SHAPE = (224, 224)\n", "OVERWRITE = False\n", "OUTPUT_CSV = \"data/processed_products_with_images.csv\"\n", "\n", "# Instantiate the ImageDownloader class\n", "image_downloader = ImageDownloader(image_dir=DIR, image_size=SHAPE, overwrite=OVERWRITE)\n", "\n", "# Download images and get the updated DataFrame\n", "updated_df = image_downloader.download_images(df)\n", "\n", "# Save the updated DataFrame to OUTPUT_CSV, leaving the raw CSV untouched\n", "updated_df.to_csv(OUTPUT_CSV, index=False)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "SlO06ncGXSuJ" }, "source": [ "## 2. EDA\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Read the data and display the first few rows.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | sku | name | description | image | type | price | shipping | manufacturer | class_id | sub_class1_id | num_classes | image_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43900 | Duracell - AAA Batteries (4-Pack) | Compatible with select electronic devices; AAA... | http://img.bbystatic.com/BestBuy_US/images/pro... | HardGood | 5.49 | 5.49 | Duracell | pcmcat312300050015 | pcmcat248700050021 | 2 | data/images/43900.jpg |
| 1 | 48530 | Duracell - AA 1.5V CopperTop Batteries (4-Pack) | Long-lasting energy; DURALOCK Power Preserve t... | http://img.bbystatic.com/BestBuy_US/images/pro... | HardGood | 5.49 | 5.49 | Duracell | pcmcat312300050015 | pcmcat248700050021 | 2 | data/images/48530.jpg |
| 2 | 127687 | Duracell - AA Batteries (8-Pack) | Compatible with select electronic devices; AA ... | http://img.bbystatic.com/BestBuy_US/images/pro... | HardGood | 7.49 | 5.49 | Duracell | pcmcat312300050015 | pcmcat248700050021 | 2 | data/images/127687.jpg |
| 3 | 150115 | Energizer - MAX Batteries AA (4-Pack) | 4-pack AA alkaline batteries; battery tester i... | http://img.bbystatic.com/BestBuy_US/images/pro... | HardGood | 4.99 | 5.49 | Energizer | pcmcat312300050015 | pcmcat248700050021 | 2 | data/images/150115.jpg |
| 4 | 185230 | Duracell - C Batteries (4-Pack) | Compatible with select electronic devices; C s... | http://img.bbystatic.com/BestBuy_US/images/pro... | HardGood | 8.99 | 5.49 | Duracell | pcmcat312300050015 | pcmcat248700050021 | 2 | data/images/185230.jpg |