
Tools API Reference

AIMQ provides a set of built-in tools for document processing and storage operations.

OCR Tools

Image OCR {#image-ocr}

aimq.tools.ocr.image_ocr

Tool for performing OCR on images.

Classes

ImageOCR(**kwargs)

Bases: BaseTool

Tool for performing OCR on images.

Initialize the OCR processor.

Source code in `src/aimq/tools/ocr/image_ocr.py`:

```python
def __init__(self, **kwargs):
    """Initialize the OCR processor."""
    super().__init__(**kwargs)
    self.processor = OCRProcessor()
```

ImageOCRInput

Bases: BaseModel

Input for ImageOCR.

OCR Processor {#ocr-processor}

aimq.tools.ocr.processor

OCR module for text extraction and processing from images.

This module provides functionality for extracting and processing text from images using the EasyOCR library. It includes utilities for handling text bounding boxes, merging overlapping detections, and creating debug visualizations.

Classes

OCRProcessor(languages=None)

Processor for performing OCR on images using EasyOCR.

This class provides a high-level interface for performing OCR on images. It handles initialization of the EasyOCR reader, image preprocessing, text detection, and optional debug visualization.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| `languages` | `List[str]` | List of language codes for OCR |
| `_reader` | `Optional[Reader]` | Lazy-loaded EasyOCR reader instance |

Initialize OCR processor with specified languages.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `languages` | `Optional[List[str]]` | List of language codes (default: `['en']`) | `None` |
Source code in `src/aimq/tools/ocr/processor.py`:

```python
def __init__(self, languages: Optional[List[str]] = None) -> None:
    """Initialize OCR processor with specified languages.

    Args:
        languages: List of language codes (default: ['en'])
    """
    self.languages = languages or ['en']
    self._reader = None
```
Attributes
reader property

Get or initialize the EasyOCR reader.

Returns:

| Type | Description |
|------|-------------|
| `Reader` | Initialized `easyocr.Reader` instance |
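The lazy-loading pattern behind this property can be sketched in isolation. This is a minimal sketch, not the actual implementation: the `easyocr.Reader` construction is replaced by a stand-in string, since the real reader downloads models on first use.

```python
class LazyReader:
    """Minimal sketch of the lazy reader pattern used by OCRProcessor."""

    def __init__(self, languages=None):
        self.languages = languages or ['en']
        self._reader = None  # nothing is built at construction time

    @property
    def reader(self):
        # First access builds the reader; later accesses reuse the cached one.
        # In OCRProcessor this would be easyocr.Reader(self.languages).
        if self._reader is None:
            self._reader = f"reader for {self.languages}"  # stand-in object
        return self._reader

proc = LazyReader()
assert proc._reader is None              # not created yet
first = proc.reader                      # created on first access
assert proc.reader is proc._reader       # cached on the instance
```

Deferring construction this way keeps `OCRProcessor()` cheap to instantiate; the cost of loading models is only paid if OCR is actually performed.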

Functions
process_image(image, save_debug_image=False)

Process an image and return OCR results.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `image` | `Union[str, Path, Image, bytes]` | The image to process: a path to an image file (`str` or `Path`), a PIL `Image` object, or raw image bytes | *required* |
| `save_debug_image` | `bool` | If `True`, includes a debug image in the output | `False` |

Returns:

| Type | Description |
|------|-------------|
| `Dict[str, Any]` | OCR results including `processing_time` (time taken to process, in seconds), `text` (extracted text content), `detections` (list of text detections with coordinates), and optionally `debug_image` (PNG-encoded debug image bytes) |

Raises:

| Type | Description |
|------|-------------|
| `ValueError` | If the image format is invalid or unreadable |

Source code in `src/aimq/tools/ocr/processor.py`:

```python
def process_image(
    self,
    image: Union[str, Path, Image.Image, bytes],
    save_debug_image: bool = False,
) -> Dict[str, Any]:
    """Process an image and return OCR results.

    Args:
        image: The image to process. Can be one of:
            - Path to image file (str or Path)
            - PIL Image object
            - Bytes of image data
        save_debug_image: If True, includes debug image in output

    Returns:
        Dict[str, Any]: OCR results including:
            - processing_time: Time taken to process in seconds
            - text: Extracted text content
            - debug_image: PNG-encoded debug image bytes (when requested)
            - detections: List of text detections with coordinates

    Raises:
        ValueError: If image format is invalid or unreadable
    """
    start_time = time.time()

    # Convert input to a format EasyOCR can process
    if isinstance(image, (str, Path)):
        image_path = str(image)
        pil_image = Image.open(image_path)
    elif isinstance(image, bytes):
        image_stream = io.BytesIO(image)
        pil_image = Image.open(image_stream)
        image_path = None
    elif isinstance(image, Image.Image):
        pil_image = image
        image_path = None
    else:
        raise ValueError("Image must be a file path, PIL Image, or bytes")

    # Convert PIL Image to numpy array for EasyOCR
    if pil_image.mode != 'RGB':
        pil_image = pil_image.convert('RGB')
    np_image = np.array(pil_image)

    # Read the image with optimized parameters
    results = self.reader.readtext(
        np_image,
        paragraph=False,
        min_size=20,
        text_threshold=0.7,
        link_threshold=0.4,
        low_text=0.4,
        width_ths=0.7,
        height_ths=0.9,
        ycenter_ths=0.9,
    )

    # Format initial results
    detections = []
    for result in results:
        if len(result) == 2:
            bbox, text = result
            confidence = 1.0
        else:
            bbox, text, confidence = result

        x1, y1 = int(bbox[0][0]), int(bbox[0][1])
        x2, y2 = int(bbox[1][0]), int(bbox[1][1])
        x3, y3 = int(bbox[2][0]), int(bbox[2][1])
        x4, y4 = int(bbox[3][0]), int(bbox[3][1])

        detections.append({
            "text": str(text),
            "confidence": float(round(float(confidence), 3)),
            "bounding_box": {
                "x": x1,
                "y": y1,
                "width": x2 - x1,
                "height": y3 - y1
            }
        })

    # Group the detections
    grouped_detections = group_text_boxes(
        detections,
        width_growth=20,
        height_growth=1
    )

    end_time = time.time()
    output = {
        "processing_time": float(round(end_time - start_time, 2)),
        "detections": grouped_detections,
        "text": " ".join(d["text"] for d in grouped_detections)
    }

    if save_debug_image:
        debug_image = self._create_debug_image(pil_image, grouped_detections)
        # Convert debug image to bytes
        debug_bytes = io.BytesIO()
        debug_image.save(debug_bytes, format='PNG')
        output["debug_image"] = debug_bytes.getvalue()

    return output
```
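EasyOCR returns each detection's bounding box as four corner points; the code above flattens that quadrilateral into an axis-aligned `x`/`y`/`width`/`height` rectangle using the top-left, top-right, and bottom-right corners. That conversion can be sketched on its own (the demo box below is hypothetical):

```python
def quad_to_rect(bbox):
    """Convert a 4-corner EasyOCR-style bounding box to x/y/width/height,
    mirroring the arithmetic in process_image."""
    x1, y1 = int(bbox[0][0]), int(bbox[0][1])  # top-left corner
    x2, y2 = int(bbox[1][0]), int(bbox[1][1])  # top-right corner
    x3, y3 = int(bbox[2][0]), int(bbox[2][1])  # bottom-right corner
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y3 - y1}

box = quad_to_rect([[10, 20], [110, 20], [110, 50], [10, 50]])
assert box == {"x": 10, "y": 20, "width": 100, "height": 30}
```

Note that this assumes a roughly axis-aligned quadrilateral; for rotated text the resulting rectangle is only an approximation of the detected region.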

Functions

boxes_overlap(box1, box2)

Check if two boxes overlap at all.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `box1` | `Dict[str, int]` | Dictionary with `x`, `y`, `width`, `height` | *required* |
| `box2` | `Dict[str, int]` | Dictionary with `x`, `y`, `width`, `height` | *required* |

Returns:

| Type | Description |
|------|-------------|
| `bool` | `True` if the boxes overlap |

Source code in `src/aimq/tools/ocr/processor.py`:

```python
def boxes_overlap(box1: Dict[str, int], box2: Dict[str, int]) -> bool:
    """
    Check if two boxes overlap at all.

    Args:
        box1: Dictionary with x, y, width, height
        box2: Dictionary with x, y, width, height

    Returns:
        bool: True if boxes overlap
    """
    h_overlap = (
        box1['x'] < box2['x'] + box2['width'] and
        box2['x'] < box1['x'] + box1['width']
    )

    v_overlap = (
        box1['y'] < box2['y'] + box2['height'] and
        box2['y'] < box1['y'] + box1['height']
    )

    return h_overlap and v_overlap
```
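Because the comparisons are strict inequalities, two boxes that merely share an edge do not count as overlapping. A quick demonstration (the function is reproduced from above; the sample boxes are hypothetical):

```python
def boxes_overlap(box1, box2):
    """Strict-inequality overlap test, as defined in this module."""
    h_overlap = (box1['x'] < box2['x'] + box2['width'] and
                 box2['x'] < box1['x'] + box1['width'])
    v_overlap = (box1['y'] < box2['y'] + box2['height'] and
                 box2['y'] < box1['y'] + box1['height'])
    return h_overlap and v_overlap

a = {'x': 0, 'y': 0, 'width': 10, 'height': 10}
b = {'x': 5, 'y': 5, 'width': 10, 'height': 10}   # intersects a
c = {'x': 10, 'y': 0, 'width': 10, 'height': 10}  # touches a's right edge
assert boxes_overlap(a, b) is True
assert boxes_overlap(a, c) is False  # shared edge only, not an overlap
```

This edge-exclusive behavior is why `group_text_boxes` grows boxes before testing: without growth, adjacent words that touch but do not intersect would never be grouped.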

group_text_boxes(detections, width_growth=0, height_growth=0)

Group text boxes that are spatially related.

This function groups text boxes that are spatially related, starting with overlapping boxes. It can optionally expand boxes horizontally and vertically before grouping to capture nearby text that may be related.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `detections` | `List[Dict[str, Any]]` | List of detection dictionaries containing text and bounding boxes | *required* |
| `width_growth` | `int` | Number of pixels to expand boxes horizontally | `0` |
| `height_growth` | `int` | Number of pixels to expand boxes vertically | `0` |

Returns:

| Type | Description |
|------|-------------|
| `List[Dict[str, Any]]` | List of grouped text detections with merged bounding boxes |

Source code in `src/aimq/tools/ocr/processor.py`:

```python
def group_text_boxes(
    detections: List[Dict[str, Any]],
    width_growth: int = 0,
    height_growth: int = 0
) -> List[Dict[str, Any]]:
    """Group text boxes that are spatially related.

    This function groups text boxes that are spatially related, starting with
    overlapping boxes. It can optionally expand boxes horizontally and vertically
    before grouping to capture nearby text that may be related.

    Args:
        detections: List of detection dictionaries containing text and bounding boxes
        width_growth: Number of pixels to expand boxes horizontally
        height_growth: Number of pixels to expand boxes vertically

    Returns:
        List[Dict[str, Any]]: List of grouped text detections with merged bounding boxes
    """
    if not detections:
        return []

    def grow_box(box: Dict[str, int]) -> Dict[str, int]:
        """Helper to expand a box by the growth parameters"""
        return {
            'x': box['x'],
            'y': box['y'],
            'width': box['width'] + width_growth,
            'height': box['height'] + height_growth
        }

    groups = [[det] for det in detections]

    while True:
        merged = False
        new_groups = []
        used = set()

        for i, group1 in enumerate(groups):
            if i in used:
                continue

            merged_group = group1.copy()
            used.add(i)

            box1 = grow_box(merge_boxes([det['bounding_box'] for det in merged_group]))

            for j, group2 in enumerate(groups):
                if j in used:
                    continue

                box2 = merge_boxes([det['bounding_box'] for det in group2])

                if boxes_overlap(box1, box2):
                    merged_group.extend(group2)
                    used.add(j)
                    box1 = grow_box(merge_boxes([det['bounding_box'] for det in merged_group]))
                    merged = True

            new_groups.append(merged_group)

        if not merged:
            break

        groups = new_groups

    return [{
        "text": ' '.join(det['text'] for det in sorted(
            group,
            key=lambda d: (d['bounding_box']['y'], d['bounding_box']['x'])
        )),
        "confidence": float(round(
            sum(det['confidence'] for det in group) / len(group),
            3
        )),
        "bounding_box": merge_boxes([det['bounding_box'] for det in group])
    } for group in groups]
```
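The grow-and-merge loop can be exercised end-to-end together with the two helpers it depends on. This is a self-contained sketch: the helpers are reproduced from this module, and the sample detections are hypothetical.

```python
def merge_boxes(boxes):
    """Smallest box covering all input boxes (as defined in this module)."""
    if not boxes:
        return None
    min_x = min(b['x'] for b in boxes)
    min_y = min(b['y'] for b in boxes)
    max_x = max(b['x'] + b['width'] for b in boxes)
    max_y = max(b['y'] + b['height'] for b in boxes)
    return {'x': int(min_x), 'y': int(min_y),
            'width': int(max_x - min_x), 'height': int(max_y - min_y)}

def boxes_overlap(box1, box2):
    """Strict-inequality overlap test (as defined in this module)."""
    h = box1['x'] < box2['x'] + box2['width'] and box2['x'] < box1['x'] + box1['width']
    v = box1['y'] < box2['y'] + box2['height'] and box2['y'] < box1['y'] + box1['height']
    return h and v

def group_text_boxes(detections, width_growth=0, height_growth=0):
    """Grow boxes, merge overlapping groups, and repeat until stable."""
    if not detections:
        return []

    def grow_box(box):
        return {'x': box['x'], 'y': box['y'],
                'width': box['width'] + width_growth,
                'height': box['height'] + height_growth}

    groups = [[det] for det in detections]
    while True:
        merged, new_groups, used = False, [], set()
        for i, group1 in enumerate(groups):
            if i in used:
                continue
            merged_group = group1.copy()
            used.add(i)
            box1 = grow_box(merge_boxes([d['bounding_box'] for d in merged_group]))
            for j, group2 in enumerate(groups):
                if j in used:
                    continue
                box2 = merge_boxes([d['bounding_box'] for d in group2])
                if boxes_overlap(box1, box2):
                    merged_group.extend(group2)
                    used.add(j)
                    box1 = grow_box(merge_boxes([d['bounding_box'] for d in merged_group]))
                    merged = True
            new_groups.append(merged_group)
        if not merged:
            break
        groups = new_groups

    # Collapse each group: text in reading order, averaged confidence, merged box.
    return [{
        "text": ' '.join(d['text'] for d in sorted(
            g, key=lambda d: (d['bounding_box']['y'], d['bounding_box']['x']))),
        "confidence": float(round(sum(d['confidence'] for d in g) / len(g), 3)),
        "bounding_box": merge_boxes([d['bounding_box'] for d in g]),
    } for g in groups]

detections = [
    {"text": "Hello", "confidence": 0.9,
     "bounding_box": {"x": 0, "y": 0, "width": 50, "height": 20}},
    {"text": "world", "confidence": 0.8,
     "bounding_box": {"x": 60, "y": 0, "width": 50, "height": 20}},
    {"text": "Footer", "confidence": 1.0,
     "bounding_box": {"x": 0, "y": 200, "width": 60, "height": 20}},
]

# With 20px of horizontal growth, "Hello" reaches "world"; "Footer" stays alone.
groups = group_text_boxes(detections, width_growth=20)
assert len(groups) == 2
assert groups[0]["text"] == "Hello world"
assert groups[0]["confidence"] == 0.85
assert groups[1]["text"] == "Footer"
```

Note how the growth only affects the overlap test: the merged `bounding_box` in the output is computed from the original, ungrown boxes.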

merge_boxes(boxes)

Merge a list of bounding boxes into a single box that encompasses all of them.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `boxes` | `List[Dict[str, int]]` | List of dictionaries with `x`, `y`, `width`, `height` | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Optional[Dict[str, int]]` | Merged bounding box, or `None` if the input list is empty |

Source code in `src/aimq/tools/ocr/processor.py`:

```python
def merge_boxes(boxes: List[Dict[str, int]]) -> Optional[Dict[str, int]]:
    """
    Merge a list of bounding boxes into a single box that encompasses all of them.

    Args:
        boxes: List of dictionaries with x, y, width, height

    Returns:
        dict: Merged bounding box or None if input is empty
    """
    if not boxes:
        return None

    min_x = min(box['x'] for box in boxes)
    min_y = min(box['y'] for box in boxes)
    max_x = max(box['x'] + box['width'] for box in boxes)
    max_y = max(box['y'] + box['height'] for box in boxes)

    return {
        'x': int(min_x),
        'y': int(min_y),
        'width': int(max_x - min_x),
        'height': int(max_y - min_y)
    }
```
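For instance, merging two disjoint boxes yields their common bounding rectangle. The function is reproduced from above; the sample boxes are hypothetical:

```python
def merge_boxes(boxes):
    """Bounding box of all input boxes, as defined in this module."""
    if not boxes:
        return None
    min_x = min(b['x'] for b in boxes)
    min_y = min(b['y'] for b in boxes)
    max_x = max(b['x'] + b['width'] for b in boxes)
    max_y = max(b['y'] + b['height'] for b in boxes)
    return {'x': int(min_x), 'y': int(min_y),
            'width': int(max_x - min_x), 'height': int(max_y - min_y)}

merged = merge_boxes([
    {'x': 0, 'y': 0, 'width': 10, 'height': 10},
    {'x': 20, 'y': 5, 'width': 10, 'height': 10},
])
assert merged == {'x': 0, 'y': 0, 'width': 30, 'height': 15}
assert merge_boxes([]) is None
```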

Storage Tools

Supabase Storage

aimq.tools.supabase.read_file

Tool for reading files from Supabase Storage.

Classes

ReadFile

Bases: BaseTool

Tool for reading files from Supabase Storage.

ReadFileInput

Bases: BaseModel

Input for ReadFile.

aimq.tools.supabase.write_file

Tool for writing files to Supabase Storage.

Classes

WriteFile

Bases: BaseTool

Tool for writing files to Supabase Storage.

WriteFileInput

Bases: BaseModel

Input for WriteFile.

Supabase Database

aimq.tools.supabase.read_record

Tool for reading records from Supabase.

Classes

ReadRecord

Bases: BaseTool

Tool for reading records from Supabase.

ReadRecordInput

Bases: BaseModel

Input for ReadRecord.

aimq.tools.supabase.write_record

Tool for writing records to Supabase.

Classes

WriteRecord

Bases: BaseTool

Tool for writing records to Supabase.

WriteRecordInput

Bases: BaseModel

Input for WriteRecord.