
Tools API Reference

AIMQ provides a set of built-in tools for document processing and storage operations.

OCR Tools

Image OCR {#image-ocr}

aimq.tools.ocr.image_ocr

Tool for performing OCR on images.

Classes

ImageOCR(**kwargs)

Bases: BaseTool

Tool for performing OCR on images.

Initialize the OCR processor.

Source code in src/aimq/tools/ocr/image_ocr.py
def __init__(self, **kwargs):
    """Initialize the OCR processor."""
    super().__init__(**kwargs)
Attributes

- args_schema = ImageOCRInput
- description = 'Extract text from images using OCR'
- model_config = ConfigDict(arbitrary_types_allowed=True)
- name = 'image_ocr'
- processor = Field(default_factory=OCRProcessor)
ImageOCRInput

Bases: BaseModel

Input for ImageOCR.

Attributes

- image = Field(..., description='The image file to perform OCR on')
- save_debug_image = Field(default=False, description='If True, includes debug image in output showing detected text regions')

OCR Processor {#pdf-processor}

aimq.tools.ocr.processor

OCR module for text extraction and processing from images.

This module provides functionality for extracting and processing text from images using the EasyOCR library. It includes utilities for handling text bounding boxes, merging overlapping detections, and creating debug visualizations.

Classes

OCRProcessor(languages=None)

Processor for performing OCR on images using EasyOCR.

This class provides a high-level interface for performing OCR on images. It handles initialization of the EasyOCR reader, image preprocessing, text detection, and optional debug visualization.

Attributes:

- languages: List of language codes for OCR
- _reader: Lazy-loaded EasyOCR reader instance

Initialize OCR processor with specified languages.

Parameters:

- languages (Optional[List[str]], default None): List of language codes (default: ['en'])
Source code in src/aimq/tools/ocr/processor.py
def __init__(self, languages: Optional[List[str]] = None) -> None:
    """Initialize OCR processor with specified languages.

    Args:
        languages: List of language codes (default: ['en'])
    """
    self.languages = languages or ['en']
    self._reader = None
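Setting `_reader = None` here defers the expensive EasyOCR model load until the `reader` property is first accessed. A minimal sketch of that lazy-initialization pattern, with a placeholder tuple standing in for `easyocr.Reader` so it runs without EasyOCR installed:

```python
class LazyReaderSketch:
    """Illustrative only: mirrors OCRProcessor's lazy `reader` property."""

    def __init__(self, languages=None):
        self.languages = languages or ['en']
        self._reader = None  # nothing loaded at construction time

    @property
    def reader(self):
        # First access builds the reader; in AIMQ this would be
        # easyocr.Reader(self.languages), which loads the OCR models.
        if self._reader is None:
            self._reader = ('fake-reader', tuple(self.languages))
        return self._reader


proc = LazyReaderSketch()
assert proc._reader is None   # not built yet
first = proc.reader           # built on first access
assert proc.reader is first   # cached thereafter
```

The same object is returned on every subsequent access, so repeated `process_image` calls reuse one loaded model.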
Attributes

- languages = languages or ['en']

reader (property)

Get or initialize the EasyOCR reader.

Returns:

- easyocr.Reader: Initialized EasyOCR reader instance

Functions
process_image(image, save_debug_image=False)

Process an image and return OCR results.

Parameters:

- image (Union[str, Path, Image.Image, bytes], required): The image to process. Can be a path to an image file (str or Path), a PIL Image object, or raw image bytes.
- save_debug_image (bool, default False): If True, includes a debug image in the output.

Returns:

- Dict[str, Any]: OCR results including:
    - processing_time: Time taken to process, in seconds
    - text: Extracted text content
    - debug_image: Optional debug image as PNG bytes (only when save_debug_image is True)
    - detections: List of text detections with coordinates

Raises:

- ValueError: If the image format is invalid or unreadable

Source code in src/aimq/tools/ocr/processor.py
def process_image(
    self, 
    image: Union[str, Path, Image.Image, bytes], 
    save_debug_image: bool = False,
) -> Dict[str, Any]:
    """Process an image and return OCR results.

    Args:
        image: The image to process. Can be one of:
            - Path to image file (str or Path)
            - PIL Image object
            - Bytes of image data
        save_debug_image: If True, includes debug image in output

    Returns:
        Dict[str, Any]: OCR results including:
            - processing_time: Time taken to process in seconds
            - text: Extracted text content
            - debug_image: Optional debug image as PNG bytes
            - detections: List of text detections with coordinates

    Raises:
        ValueError: If image format is invalid or unreadable
    """
    start_time = time.time()

    # Convert input to a format EasyOCR can process
    if isinstance(image, (str, Path)):
        image_path = str(image)
        pil_image = Image.open(image_path)
    elif isinstance(image, bytes):
        image_stream = io.BytesIO(image)
        pil_image = Image.open(image_stream)
        image_path = None
    elif isinstance(image, Image.Image):
        pil_image = image
        image_path = None
    else:
        raise ValueError("Image must be a file path, PIL Image, or bytes")

    # Convert PIL Image to numpy array for EasyOCR
    if pil_image.mode != 'RGB':
        pil_image = pil_image.convert('RGB')
    np_image = np.array(pil_image)

    # Read the image with optimized parameters
    results = self.reader.readtext(
        np_image,
        paragraph=False,
        min_size=20,
        text_threshold=0.7,
        link_threshold=0.4,
        low_text=0.4,
        width_ths=0.7,
        height_ths=0.9,
        ycenter_ths=0.9,
    )

    # Format initial results
    detections = []
    for result in results:
        if len(result) == 2:
            bbox, text = result
            confidence = 1.0
        else:
            bbox, text, confidence = result

        x1, y1 = int(bbox[0][0]), int(bbox[0][1])
        x2, y2 = int(bbox[1][0]), int(bbox[1][1])
        x3, y3 = int(bbox[2][0]), int(bbox[2][1])
        x4, y4 = int(bbox[3][0]), int(bbox[3][1])

        detections.append({
            "text": str(text),
            "confidence": float(round(float(confidence), 3)),
            "bounding_box": {
                "x": x1,
                "y": y1,
                "width": x2 - x1,
                "height": y3 - y1
            }
        })

    # Group the detections
    grouped_detections = group_text_boxes(
        detections,
        width_growth=20,
        height_growth=1
    )

    end_time = time.time()
    output = {
        "processing_time": float(round(end_time - start_time, 2)),
        "detections": grouped_detections,
        "text": " ".join(d["text"] for d in grouped_detections)
    }

    if save_debug_image:
        debug_image = self._create_debug_image(pil_image, grouped_detections)
        # Convert debug image to bytes
        debug_bytes = io.BytesIO()
        debug_image.save(debug_bytes, format='PNG')
        output["debug_image"] = debug_bytes.getvalue()

    return output
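Inside `process_image`, EasyOCR's four corner points are collapsed to the axis-aligned `x`/`y`/`width`/`height` form used by the rest of the module. A standalone sketch of just that conversion step (the function name is illustrative, not part of AIMQ's API):

```python
def quad_to_rect(bbox):
    """Collapse a four-corner EasyOCR bounding box (top-left, top-right,
    bottom-right, bottom-left) to an axis-aligned x/y/width/height dict,
    mirroring the conversion inside process_image."""
    x1, y1 = int(bbox[0][0]), int(bbox[0][1])  # top-left corner
    x2, _ = int(bbox[1][0]), int(bbox[1][1])   # top-right corner
    _, y3 = int(bbox[2][0]), int(bbox[2][1])   # bottom-right corner
    return {"x": x1, "y": y1, "width": x2 - x1, "height": y3 - y1}


# Corners listed clockwise from the top-left, as EasyOCR returns them:
rect = quad_to_rect([[10, 5], [110, 5], [110, 35], [10, 35]])
assert rect == {"x": 10, "y": 5, "width": 100, "height": 30}
```

Note that this uses only the top edge for width and the left column for height, so it assumes roughly axis-aligned text regions, which matches how the detections are consumed downstream.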

Functions

boxes_overlap(box1, box2)

Check if two boxes overlap at all.

Parameters:

- box1 (Dict[str, int], required): Dictionary with x, y, width, height
- box2 (Dict[str, int], required): Dictionary with x, y, width, height

Returns:

- bool: True if the boxes overlap

Source code in src/aimq/tools/ocr/processor.py
def boxes_overlap(box1: Dict[str, int], box2: Dict[str, int]) -> bool:
    """
    Check if two boxes overlap at all.

    Args:
        box1: Dictionary with x, y, width, height
        box2: Dictionary with x, y, width, height

    Returns:
        bool: True if boxes overlap
    """
    h_overlap = (
        box1['x'] < box2['x'] + box2['width'] and
        box2['x'] < box1['x'] + box1['width']
    )

    v_overlap = (
        box1['y'] < box2['y'] + box2['height'] and
        box2['y'] < box1['y'] + box1['height']
    )

    return h_overlap and v_overlap
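Because the comparisons are strict inequalities, boxes that merely share an edge do not count as overlapping. A quick check, with the function copied verbatim from the source above:

```python
def boxes_overlap(box1, box2):
    # Copied from boxes_overlap above: strict inequalities on both axes.
    h_overlap = (box1['x'] < box2['x'] + box2['width'] and
                 box2['x'] < box1['x'] + box1['width'])
    v_overlap = (box1['y'] < box2['y'] + box2['height'] and
                 box2['y'] < box1['y'] + box1['height'])
    return h_overlap and v_overlap


a = {'x': 0, 'y': 0, 'width': 10, 'height': 10}
b = {'x': 5, 'y': 5, 'width': 10, 'height': 10}   # intersects a
c = {'x': 10, 'y': 0, 'width': 10, 'height': 10}  # only touches a's right edge

assert boxes_overlap(a, b)
assert not boxes_overlap(a, c)  # edge contact alone is not overlap
```

This edge-exclusive behavior is why `group_text_boxes` grows boxes before testing: adjacent words that do not quite touch can still be pulled into one group.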

group_text_boxes(detections, width_growth=0, height_growth=0)

Group text boxes that are spatially related.

This function groups text boxes that are spatially related, starting with overlapping boxes. It can optionally expand boxes horizontally and vertically before grouping to capture nearby text that may be related.

Parameters:

- detections (List[Dict[str, Any]], required): List of detection dictionaries containing text and bounding boxes
- width_growth (int, default 0): Number of pixels to expand boxes horizontally
- height_growth (int, default 0): Number of pixels to expand boxes vertically

Returns:

- List[Dict[str, Any]]: List of grouped text detections with merged bounding boxes

Source code in src/aimq/tools/ocr/processor.py
def group_text_boxes(
    detections: List[Dict[str, Any]], 
    width_growth: int = 0, 
    height_growth: int = 0
) -> List[Dict[str, Any]]:
    """Group text boxes that are spatially related.

    This function groups text boxes that are spatially related, starting with
    overlapping boxes. It can optionally expand boxes horizontally and vertically
    before grouping to capture nearby text that may be related.

    Args:
        detections: List of detection dictionaries containing text and bounding boxes
        width_growth: Number of pixels to expand boxes horizontally
        height_growth: Number of pixels to expand boxes vertically

    Returns:
        List[Dict[str, Any]]: List of grouped text detections with merged bounding boxes
    """
    if not detections:
        return []

    def grow_box(box: Dict[str, int]) -> Dict[str, int]:
        """Helper to expand a box by the growth parameters"""
        return {
            'x': box['x'],
            'y': box['y'],
            'width': box['width'] + width_growth,
            'height': box['height'] + height_growth
        }

    groups = [[det] for det in detections]

    while True:
        merged = False
        new_groups = []
        used = set()

        for i, group1 in enumerate(groups):
            if i in used:
                continue

            merged_group = group1.copy()
            used.add(i)

            box1 = grow_box(merge_boxes([det['bounding_box'] for det in merged_group]))

            for j, group2 in enumerate(groups):
                if j in used:
                    continue

                box2 = merge_boxes([det['bounding_box'] for det in group2])

                if boxes_overlap(box1, box2):
                    merged_group.extend(group2)
                    used.add(j)
                    box1 = grow_box(merge_boxes([det['bounding_box'] for det in merged_group]))
                    merged = True

            new_groups.append(merged_group)

        if not merged:
            break

        groups = new_groups

    return [{
        "text": ' '.join(det['text'] for det in sorted(
            group,
            key=lambda d: (d['bounding_box']['y'], d['bounding_box']['x'])
        )),
        "confidence": float(round(
            sum(det['confidence'] for det in group) / len(group),
            3
        )),
        "bounding_box": merge_boxes([det['bounding_box'] for det in group])
    } for group in groups]
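The grouping loop can be seen end to end with a small worked example. The helpers below are condensed but behavior-preserving copies of `boxes_overlap`, `merge_boxes`, and `group_text_boxes` from the source above; the sample detections are made up:

```python
def boxes_overlap(b1, b2):
    # Condensed from boxes_overlap above.
    return (b1['x'] < b2['x'] + b2['width'] and b2['x'] < b1['x'] + b1['width'] and
            b1['y'] < b2['y'] + b2['height'] and b2['y'] < b1['y'] + b1['height'])


def merge_boxes(boxes):
    # Condensed from merge_boxes above: union of all boxes' extents.
    if not boxes:
        return None
    min_x = min(b['x'] for b in boxes)
    min_y = min(b['y'] for b in boxes)
    return {'x': min_x, 'y': min_y,
            'width': max(b['x'] + b['width'] for b in boxes) - min_x,
            'height': max(b['y'] + b['height'] for b in boxes) - min_y}


def group_text_boxes(detections, width_growth=0, height_growth=0):
    # Condensed from group_text_boxes above: repeatedly merge groups whose
    # grown bounding boxes overlap, until a pass makes no merges.
    if not detections:
        return []

    def grow(box):
        return {'x': box['x'], 'y': box['y'],
                'width': box['width'] + width_growth,
                'height': box['height'] + height_growth}

    groups = [[d] for d in detections]
    while True:
        merged, new_groups, used = False, [], set()
        for i, g1 in enumerate(groups):
            if i in used:
                continue
            group = g1.copy()
            used.add(i)
            box1 = grow(merge_boxes([d['bounding_box'] for d in group]))
            for j, g2 in enumerate(groups):
                if j in used:
                    continue
                if boxes_overlap(box1, merge_boxes([d['bounding_box'] for d in g2])):
                    group.extend(g2)
                    used.add(j)
                    box1 = grow(merge_boxes([d['bounding_box'] for d in group]))
                    merged = True
            new_groups.append(group)
        if not merged:
            break
        groups = new_groups

    return [{
        'text': ' '.join(d['text'] for d in sorted(
            g, key=lambda d: (d['bounding_box']['y'], d['bounding_box']['x']))),
        'confidence': round(sum(d['confidence'] for d in g) / len(g), 3),
        'bounding_box': merge_boxes([d['bounding_box'] for d in g]),
    } for g in groups]


# Two adjacent words on one line plus one far-away word:
detections = [
    {'text': 'Hello', 'confidence': 0.9,
     'bounding_box': {'x': 0, 'y': 0, 'width': 50, 'height': 20}},
    {'text': 'world', 'confidence': 0.8,
     'bounding_box': {'x': 45, 'y': 2, 'width': 50, 'height': 20}},
    {'text': 'Far', 'confidence': 0.95,
     'bounding_box': {'x': 500, 'y': 500, 'width': 30, 'height': 20}},
]
grouped = group_text_boxes(detections, width_growth=20, height_growth=1)
assert [g['text'] for g in grouped] == ['Hello world', 'Far']
assert grouped[0]['bounding_box'] == {'x': 0, 'y': 0, 'width': 95, 'height': 22}
```

The same growth values used in `process_image` (width_growth=20, height_growth=1) bias grouping toward pulling in horizontal neighbors, which is why the two words on one line merge while the distant word stays separate.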

merge_boxes(boxes)

Merge a list of bounding boxes into a single box that encompasses all of them.

Parameters:

- boxes (List[Dict[str, int]], required): List of dictionaries with x, y, width, height

Returns:

- Optional[Dict[str, int]]: Merged bounding box, or None if the input is empty

Source code in src/aimq/tools/ocr/processor.py
def merge_boxes(boxes: List[Dict[str, int]]) -> Optional[Dict[str, int]]:
    """
    Merge a list of bounding boxes into a single box that encompasses all of them.

    Args:
        boxes: List of dictionaries with x, y, width, height

    Returns:
        dict: Merged bounding box or None if input is empty
    """
    if not boxes:
        return None

    min_x = min(box['x'] for box in boxes)
    min_y = min(box['y'] for box in boxes)
    max_x = max(box['x'] + box['width'] for box in boxes)
    max_y = max(box['y'] + box['height'] for box in boxes)

    return {
        'x': int(min_x),
        'y': int(min_y),
        'width': int(max_x - min_x),
        'height': int(max_y - min_y)
    }
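The merged box is simply the union of all input extents. A quick demonstration, with the function copied verbatim from the source above:

```python
def merge_boxes(boxes):
    # Copied from merge_boxes above: smallest rectangle covering all inputs.
    if not boxes:
        return None

    min_x = min(box['x'] for box in boxes)
    min_y = min(box['y'] for box in boxes)
    max_x = max(box['x'] + box['width'] for box in boxes)
    max_y = max(box['y'] + box['height'] for box in boxes)

    return {
        'x': int(min_x),
        'y': int(min_y),
        'width': int(max_x - min_x),
        'height': int(max_y - min_y)
    }


merged = merge_boxes([
    {'x': 0, 'y': 0, 'width': 10, 'height': 10},
    {'x': 20, 'y': 5, 'width': 10, 'height': 10},
])
assert merged == {'x': 0, 'y': 0, 'width': 30, 'height': 15}
assert merge_boxes([]) is None
```

Note the merged box covers the gap between disjoint inputs; the overlap test in `group_text_boxes` is what keeps unrelated boxes from being merged in the first place.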

Storage Tools

Supabase Storage

aimq.tools.supabase.read_file

Tool for reading files from Supabase Storage.

Classes

ReadFile

Bases: BaseTool

Tool for reading files from Supabase Storage.

Attributes

- args_schema = ReadFileInput
- bucket = Field('{{bucket}}', description='The storage bucket template to read the file from')
- description = 'Read a file from Supabase Storage'
- formater = Field('mustache', description='The format to use for the template')
- name = 'read_file'
- path = Field('{{path}}', description='The path template to use for the file')

ReadFileInput

Bases: BaseModel

Input for ReadFile.

Attributes

- bucket = Field('files', description='The storage bucket to read the file from')
- metadata = Field(None, description='Additional metadata to attach to the file')
- path = Field(..., description='The path values to apply to the template path')

aimq.tools.supabase.write_file

Tool for writing files to Supabase Storage.

Classes

WriteFile

Bases: BaseTool

Tool for writing files to Supabase Storage.

Attributes

- args_schema = WriteFileInput
- bucket = Field('{{bucket}}', description='The storage bucket template to write the file to')
- description = 'Write a file to Supabase Storage'
- formater = Field('mustache', description='The format to use for the template')
- name = 'write_file'
- path = Field('{{path}}', description='The path template to use for the file')

WriteFileInput

Bases: BaseModel

Input for WriteFile.

Attributes

- bucket = Field('files', description='The storage bucket to write the file to')
- file = Field(..., description='The file to write')
- metadata = Field(None, description='Additional metadata to attach to the file')
- path = Field(None, description='The path values to apply to the template path')

Supabase Database

aimq.tools.supabase.read_record

Tool for reading records from Supabase.

Classes

ReadRecord

Bases: BaseTool

Tool for reading records from Supabase.

Attributes

- args_schema = ReadRecordInput
- description = 'Read a record from Supabase'
- name = 'read_record'
- select = '*'
- table = 'records'

ReadRecordInput

Bases: BaseModel

Input for ReadRecord.

Attributes

- id = Field(..., description='The ID of the record to read')
- select = Field(None, description='The columns to select')
- table = Field(None, description='The table to read from')

aimq.tools.supabase.write_record

Tool for writing records to Supabase.

Classes

WriteRecord

Bases: BaseTool

Tool for writing records to Supabase.

Attributes

- args_schema = WriteRecordInput
- description = 'Write a record to Supabase. If an ID is provided, updates existing record; otherwise creates new record.'
- name = 'write_record'

WriteRecordInput

Bases: BaseModel

Input for WriteRecord.

Attributes

- data = Field(..., description='The data to write')
- id = Field(..., description='The ID of the record to update (if updating existing record)')
- table = Field(..., description='The table to write to')