omniscience

dwb2023 committed on Aug 8, 2024

Commit 3253f8e (verified) · 1 Parent(s): 79605da

Update README.md

Files changed (1)

README.md +266 -1

@@ -10,4 +10,269 @@ pinned: false
 license: openrail
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+## Creating instructions
+- Load the image from the given file path '/home/user/tmp9873xen5.jpg'.
+- Use the 'owl_v2' tool to detect brain tumors in the image. The prompt should be 'brain tumor'.
+- Use the 'grounding_sam' tool to segment brain tumors in the image. The prompt should be 'brain tumor'.
+- Overlay the bounding boxes from the detection results on the original image using the 'overlay_bounding_boxes' utility.
+- Overlay the segmentation masks from the segmentation results on the original image using the 'overlay_segmentation_masks' utility.
+- Save the final image with both bounding boxes and segmentation masks to a specified output path.
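The steps above can be sketched as a single function. This is a minimal sketch, not the code the agent actually generated: the tool callables are passed in as a dictionary (a hypothetical convention chosen so that no import path is assumed), and `run_pipeline`, `tools`, and `output_path` are illustrative names.

```python
from typing import Any, Callable, Dict


def run_pipeline(
    image_path: str,
    output_path: str,
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Detect, segment, overlay, and save, following the listed instructions.

    `tools` maps the tool names used in the instructions to callables with
    the signatures documented in this README. Passing them in is an
    assumption of this sketch, not how the original agent wires them up.
    """
    image = tools["load_image"](image_path)

    # Detection and segmentation both use the same text prompt.
    detections = tools["owl_v2"]("brain tumor", image)
    segments = tools["grounding_sam"]("brain tumor", image)

    # Draw boxes first, then masks on top of the boxed image, so the saved
    # file contains both overlays.
    annotated = tools["overlay_bounding_boxes"](image, detections)
    annotated = tools["overlay_segmentation_masks"](annotated, segments)

    tools["save_image"](annotated, output_path)
    return {"detections": detections, "segments": segments}
```

Returning the raw detections and segments alongside the saved file keeps the intermediate results available for inspection.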
+## Retrieving tools
+- 'load_image' is a utility function that loads an image from the given file path string.
+- 'save_image' is a utility function that saves an image to a file path.
+- 'owl_v2' is a tool that can detect and count multiple objects given a text prompt such as category names or referring expressions. The categories in the text prompt are separated by commas. It returns a list of bounding boxes with normalized coordinates, label names and associated probability scores.
+- 'florencev2_object_detection' is a tool that can detect common objects in an image without any text prompt or thresholding. It returns a list of detected objects as labels and their location as bounding boxes.
+- 'grounding_sam' is a tool that can segment multiple objects given a text prompt such as category names or referring expressions. The categories in the text prompt are separated by commas or periods. It returns a list of bounding boxes, label names, masks and associated probability scores.
+- 'detr_segmentation' is a tool that can segment common objects in an image without any text prompt. It returns a list of detected objects as labels, their regions as masks and their scores.
+- 'overlay_bounding_boxes' is a utility function that displays bounding boxes on an image.
+- 'overlay_heat_map' is a utility function that displays a heat map on an image.
+- 'overlay_segmentation_masks' is a utility function that displays segmentation masks.
+### Retrieving tools - detailed notes on tool selection
+load_image(image_path: str) -> numpy.ndarray:
+'load_image' is a utility function that loads an image from the given file path string.
+    Parameters:
+        image_path (str): The path to the image.
+    Returns:
+        np.ndarray: The image as a NumPy array.
+    Example
+    -------
+        >>> load_image("path/to/image.jpg")
+save_image(image: numpy.ndarray, file_path: str) -> None:
+'save_image' is a utility function that saves an image to a file path.
+    Parameters:
+        image (np.ndarray): The image to save.
+        file_path (str): The path to save the image file.
+    Example
+    -------
+        >>> save_image(image, "path/to/image.jpg")
+owl_v2(prompt: str, image: numpy.ndarray, box_threshold: float = 0.1, iou_threshold: float = 0.1) -> List[Dict[str, Any]]:
+'owl_v2' is a tool that can detect and count multiple objects given a text
+    prompt such as category names or referring expressions. The categories in the text
+    prompt are separated by commas. It returns a list of bounding boxes with
+    normalized coordinates, label names and associated probability scores.
+    Parameters:
+        prompt (str): The prompt to ground to the image.
+        image (np.ndarray): The image to ground the prompt to.
+        box_threshold (float, optional): The threshold for the box detection. Defaults
+            to 0.10.
+        iou_threshold (float, optional): The threshold for the Intersection over Union
+            (IoU). Defaults to 0.10.
+    Returns:
+        List[Dict[str, Any]]: A list of dictionaries containing the score, label, and
+            bounding box of the detected objects with normalized coordinates between 0
+            and 1 (xmin, ymin, xmax, ymax). xmin and ymin are the coordinates of the
+            top-left and xmax and ymax are the coordinates of the bottom-right of the
+            bounding box.
+    Example
+    -------
+        >>> owl_v2("car, dinosaur", image)
+        [
+            {'score': 0.99, 'label': 'dinosaur', 'bbox': [0.1, 0.11, 0.35, 0.4]},
+            {'score': 0.98, 'label': 'car', 'bbox': [0.2, 0.21, 0.45, 0.5]},
+        ]
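Both thresholds operate on boxes in the (xmin, ymin, xmax, ymax) format shown above. How `iou_threshold` is applied inside `owl_v2` is not stated here; as background only, the following is a minimal sketch of the standard Intersection over Union computation for two such boxes, with `box_iou` as a hypothetical helper name.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Because IoU is a ratio of areas, the same function works for normalized and pixel coordinates alike, as long as both boxes use the same units.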
+florencev2_object_detection(image: numpy.ndarray) -> List[Dict[str, Any]]:
+'florencev2_object_detection' is a tool that can detect common objects in an
+    image without any text prompt or thresholding. It returns a list of detected objects
+    as labels and their location as bounding boxes.
+    Parameters:
+        image (np.ndarray): The image used to detect objects.
+    Returns:
+        List[Dict[str, Any]]: A list of dictionaries containing the score, label, and
+            bounding box of the detected objects with normalized coordinates between 0
+            and 1 (xmin, ymin, xmax, ymax). xmin and ymin are the coordinates of the
+            top-left and xmax and ymax are the coordinates of the bottom-right of the
+            bounding box. The scores are always 1.0 and cannot be thresholded.
+    Example
+    -------
+        >>> florencev2_object_detection(image)
+        [
+            {'score': 1.0, 'label': 'window', 'bbox': [0.1, 0.11, 0.35, 0.4]},
+            {'score': 1.0, 'label': 'car', 'bbox': [0.2, 0.21, 0.45, 0.5]},
+            {'score': 1.0, 'label': 'person', 'bbox': [0.34, 0.21, 0.85, 0.5]},
+        ]
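Because the boxes are normalized to the range 0 to 1, they have to be scaled by the image size before they can be used as pixel coordinates. A minimal sketch, where `to_pixel_bbox` is a hypothetical helper and rounding to the nearest integer is an assumption of this example:

```python
from typing import List, Sequence


def to_pixel_bbox(bbox: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixel coordinates."""
    xmin, ymin, xmax, ymax = bbox
    # x values scale with the image width, y values with the image height.
    return [
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    ]
```

For an image stored as a NumPy array in the usual height-by-width layout, `height` and `width` would be `image.shape[0]` and `image.shape[1]`.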
+grounding_sam(prompt: str, image: numpy.ndarray, box_threshold: float = 0.2, iou_threshold: float = 0.2) -> List[Dict[str, Any]]:
+'grounding_sam' is a tool that can segment multiple objects given a
+    text prompt such as category names or referring expressions. The categories in the text
+    prompt are separated by commas or periods. It returns a list of bounding boxes,
+    label names, masks and associated probability scores.
+    Parameters:
+        prompt (str): The prompt to ground to the image.
+        image (np.ndarray): The image to ground the prompt to.
+        box_threshold (float, optional): The threshold for the box detection. Defaults
+            to 0.20.
+        iou_threshold (float, optional): The threshold for the Intersection over Union
+            (IoU). Defaults to 0.20.
+    Returns:
+        List[Dict[str, Any]]: A list of dictionaries containing the score, label,
+            bounding box, and mask of the detected objects with normalized coordinates
+            (xmin, ymin, xmax, ymax). xmin and ymin are the coordinates of the top-left
+            and xmax and ymax are the coordinates of the bottom-right of the bounding box.
+            The mask is a binary 2D NumPy array where 1 indicates the object and 0
+            indicates the background.
+    Example
+    -------
+        >>> grounding_sam("car. dinosaur", image)
+        [
+            {
+                'score': 0.99,
+                'label': 'dinosaur',
+                'bbox': [0.1, 0.11, 0.35, 0.4],
+                'mask': array([[0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0],
+                    ...,
+                    [0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
+            },
+        ]
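Since each mask is a binary 2D array, simple statistics such as the fraction of the image an object covers follow directly from counting the entries equal to 1. A minimal sketch that accepts any 2D sequence of 0/1 values (nested lists here, so it runs without NumPy); `mask_coverage` is a hypothetical helper name.

```python
from typing import Sequence


def mask_coverage(mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels in a binary mask that belong to the object."""
    total = 0
    object_pixels = 0
    for row in mask:
        for value in row:
            total += 1
            # int() avoids fixed-width overflow if the rows hold uint8 values.
            object_pixels += int(value)
    return object_pixels / total if total else 0.0
```

For a NumPy mask, `mask.mean()` gives the same fraction in one call.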
+detr_segmentation(image: numpy.ndarray) -> List[Dict[str, Any]]:
+'detr_segmentation' is a tool that can segment common objects in an
+    image without any text prompt. It returns a list of detected objects
+    as labels, their regions as masks and their scores.
+    Parameters:
+        image (np.ndarray): The image used to segment things and objects
+    Returns:
+        List[Dict[str, Any]]: A list of dictionaries containing the score, label
+            and mask of the detected objects. The mask is a binary 2D NumPy array where 1
+            indicates the object and 0 indicates the background.
+    Example
+    -------
+        >>> detr_segmentation(image)
+        [
+            {
+                'score': 0.45,
+                'label': 'window',
+                'mask': array([[0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0],
+                    ...,
+                    [0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
+            },
+            {
+                'score': 0.70,
+                'label': 'bird',
+                'mask': array([[0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0],
+                    ...,
+                    [0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
+            },
+        ]
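`detr_segmentation` takes no threshold parameter, and the example above includes a segment scored 0.45, so low-confidence results may need to be filtered by the caller. A minimal sketch; `filter_by_score` and the 0.5 cutoff are illustrative assumptions rather than part of the tool.

```python
from typing import Any, Dict, List


def filter_by_score(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep results scoring at least `min_score`, highest score first."""
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

The same helper applies to any of the result lists in this README, since they all carry a `score` key.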
+overlay_bounding_boxes(image: numpy.ndarray, bboxes: List[Dict[str, Any]]) -> numpy.ndarray:
+'overlay_bounding_boxes' is a utility function that displays bounding boxes on
+    an image.
+    Parameters:
+        image (np.ndarray): The image to display the bounding boxes on.
+        bboxes (List[Dict[str, Any]]): A list of dictionaries containing the bounding
+            boxes.
+    Returns:
+        np.ndarray: The image with the bounding boxes, labels and scores displayed.
+    Example
+    -------
+        >>> image_with_bboxes = overlay_bounding_boxes(
+            image, [{'score': 0.99, 'label': 'dinosaur', 'bbox': [0.1, 0.11, 0.35, 0.4]}],
+        )
+overlay_heat_map(image: numpy.ndarray, heat_map: Dict[str, Any], alpha: float = 0.8) -> numpy.ndarray:
+'overlay_heat_map' is a utility function that displays a heat map on an image.
+    Parameters:
+        image (np.ndarray): The image to display the heat map on.
+        heat_map (Dict[str, Any]): A dictionary containing the heat map under the key
+            'heat_map'.
+        alpha (float, optional): The transparency of the overlay. Defaults to 0.8.
+    Returns:
+        np.ndarray: The image with the heat map displayed.
+    Example
+    -------
+        >>> image_with_heat_map = overlay_heat_map(
+            image,
+            {
+                'heat_map': array([[0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0],
+                    ...,
+                    [0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 125, 125, 125]], dtype=uint8),
+            },
+        )
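How the overlay is composited is not specified here. A common way to apply an `alpha` of this kind is linear blending, sketched below for a single pixel value. This illustrates the general technique under the assumption that `alpha` weights the heat map; it is not a description of the internals of `overlay_heat_map`, and `blend_pixel` is a hypothetical name.

```python
def blend_pixel(image_value: float, heat_value: float, alpha: float = 0.8) -> float:
    """Linearly blend one heat map value over one image value.

    Assumption: alpha = 0 leaves the image untouched and alpha = 1 shows
    only the heat map.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * image_value + alpha * heat_value
```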
+overlay_segmentation_masks(image: numpy.ndarray, masks: List[Dict[str, Any]]) -> numpy.ndarray:
+'overlay_segmentation_masks' is a utility function that displays segmentation
+    masks.
+    Parameters:
+        image (np.ndarray): The image to display the masks on.
+        masks (List[Dict[str, Any]]): A list of dictionaries containing the masks.
+    Returns:
+        np.ndarray: The image with the masks displayed.
+    Example
+    -------
+        >>> image_with_masks = overlay_segmentation_masks(
+            image,
+            [{
+                'score': 0.99,
+                'label': 'dinosaur',
+                'mask': array([[0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0],
+                    ...,
+                    [0, 0, 0, ..., 0, 0, 0],
+                    [0, 0, 0, ..., 0, 0, 0]], dtype=uint8),
+            }],
+        )
+## Vision Agent Tools - model summary
+| Model Name          | Hugging Face Model                  | Primary Function               | Use Cases                                                    |
+|---------------------|-------------------------------------|-------------------------------|--------------------------------------------------------------|
+| OWL-ViT v2          | google/owlv2-base-patch16-ensemble  | Object detection and localization | - Open-world object detection<br>- Locating specific objects based on text prompts |
+| Florence-2          | microsoft/Florence-2-base           | Multi-purpose vision tasks      | - Image captioning<br>- Visual question answering<br>- Object detection |
+| Depth Anything V2   | LiheYoung/depth-anything-v2-small   | Depth estimation                | - Estimating depth in images<br>- Generating depth maps      |
+| CLIP                | openai/clip-vit-base-patch32        | Image-text similarity           | - Zero-shot image classification<br>- Image-text matching    |
+| BLIP                | Salesforce/blip-image-captioning-base | Image captioning                | - Generating text descriptions of images                    |
+| LOCA                | Custom implementation               | Object counting                 | - Zero-shot object counting<br>- Object counting with visual prompts |
+| GIT v2              | microsoft/git-base-textcaps         | Visual question answering and image captioning | - Answering questions about image content<br>- Generating text descriptions of images |
+| Grounding DINO      | groundingdino/groundingdino-swint-ogc | Object detection and localization | - Detecting objects based on text prompts                   |
+| SAM                 | facebook/sam-vit-huge               | Instance segmentation           | - Text-prompted instance segmentation                       |
+| DETR                | facebook/detr-resnet-50             | Object detection                | - General object detection                                  |
+| ViT                 | google/vit-base-patch16-224         | Image classification            | - General image classification<br>- NSFW content detection  |
+| DPT                 | Intel/dpt-hybrid-midas              | Monocular depth estimation      | - Estimating depth from single images                       |