Data Formats

This guide explains the data formats that Graphizy accepts and how to structure your data for optimal performance.

Overview

Graphizy accepts data in two primary formats:

  1. Array Format (aspect="array") - NumPy arrays with structured columns

  2. Dictionary Format (aspect="dict") - Python dictionaries with named keys

Both formats represent the same information: a collection of objects with IDs and 2D coordinates.

Array Format (aspect="array")

Structure: The array format uses a 2D NumPy array where each row represents one object and columns contain the object’s attributes.

Required Columns:

  - Column 0: Object ID (numeric)

  - Column 1: X coordinate

  - Column 2: Y coordinate

  - Columns 3+: Additional attributes (optional)

Basic Example:

import numpy as np
from graphizy import Graphing

# Basic format: [id, x, y]
data = np.array([
    [0, 100, 200],    # Object 0 at position (100, 200)
    [1, 300, 400],    # Object 1 at position (300, 400)
    [2, 500, 600],    # Object 2 at position (500, 600)
])

# Create grapher and use the data
grapher = Graphing(dimension=(800, 800), aspect="array")
graph = grapher.make_delaunay(data)

Extended Example with Additional Attributes:

# Extended format: [id, x, y, speed, active, type]
data = np.array([
    [0, 100, 200, 1.5, 1, 0],    # Object 0: speed=1.5, active=True, type=0
    [1, 300, 400, 2.3, 1, 1],    # Object 1: speed=2.3, active=True, type=1
    [2, 500, 600, 0.8, 0, 0],    # Object 2: speed=0.8, active=False, type=0
])

# Graphizy will use columns 0, 1, 2 for id, x, y
# Additional columns are preserved but not used for graph creation
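One consequence of the extended layout is worth spelling out: a NumPy array has a single dtype, so mixing integer IDs with a float column such as speed stores every column as float64. The sketch below uses only NumPy (no Graphizy calls) to show how the ID, position, and attribute columns can be sliced back out. The variable names are illustrative.

```python
import numpy as np

# Extended format: [id, x, y, speed, active, type]
data = np.array([
    [0, 100, 200, 1.5, 1, 0],
    [1, 300, 400, 2.3, 1, 1],
    [2, 500, 600, 0.8, 0, 0],
])

# One float entry is enough to promote the whole array
print(data.dtype)  # float64

ids = data[:, 0].astype(int)             # recover integer IDs
positions = data[:, 1:3]                 # x and y columns
extras = data[:, 3:]                     # speed, active, type
active_mask = extras[:, 1].astype(bool)  # 1/0 flags back to booleans
```

If an attribute cannot be represented as a number (a color name, for example), the dictionary format is the better fit.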

Dictionary Format (aspect="dict")

Structure: The dictionary format uses a Python dictionary with three required keys, each containing a list of values.

Required Keys:

  - "id": List of object IDs (numeric)

  - "x": List of X coordinates

  - "y": List of Y coordinates

Basic Example:

# Dictionary format
data = {
    "id": [0, 1, 2],
    "x": [100, 300, 500],
    "y": [200, 400, 600]
}

# Create grapher and use the data
grapher = Graphing(dimension=(800, 800), aspect="dict")
graph = grapher.make_delaunay(data)

Extended Example with Additional Attributes:

# Dictionary with additional attributes
data = {
    "id": [0, 1, 2, 3],
    "x": [100, 300, 500, 700],
    "y": [200, 400, 600, 800],
    "speed": [1.5, 2.3, 0.8, 1.9],
    "color": ["red", "blue", "green", "yellow"],
    "active": [True, True, False, True],
    "category": ["A", "B", "A", "C"]
}

# Graphizy will use id, x, y for graph creation
# Additional keys are preserved for your use
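Because each key holds a separate list, the lists can drift out of sync, for instance when one attribute is appended without the others. A minimal sketch of a length check follows. check_dict_lengths is a hypothetical helper written for this guide, not part of the Graphizy API.

```python
def check_dict_lengths(data_dict, required=("id", "x", "y")):
    """Hypothetical helper: confirm required keys exist and all lists align."""
    missing = [key for key in required if key not in data_dict]
    if missing:
        raise KeyError(f"Missing required keys: {missing}")

    lengths = {key: len(values) for key, values in data_dict.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Lists have different lengths: {lengths}")

    # Number of objects described by the dictionary
    return len(data_dict["id"])


data = {
    "id": [0, 1, 2, 3],
    "x": [100, 300, 500, 700],
    "y": [200, 400, 600, 800],
    "speed": [1.5, 2.3, 0.8, 1.9],
}
n_objects = check_dict_lengths(data)
```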

Converting Between Formats

Array to Dictionary:

def array_to_dict(data_array):
    """Convert array format to dictionary format"""
    return {
        "id": data_array[:, 0].tolist(),
        "x": data_array[:, 1].tolist(),
        "y": data_array[:, 2].tolist()
    }

# Example usage
array_data = np.array([[0, 100, 200], [1, 300, 400]])
dict_data = array_to_dict(array_data)

Dictionary to Array:

def dict_to_array(data_dict):
    """Convert dictionary format to array format"""
    return np.column_stack([
        data_dict["id"],
        data_dict["x"],
        data_dict["y"]
    ])

# Example usage
dict_data = {"id": [0, 1], "x": [100, 300], "y": [200, 400]}
array_data = dict_to_array(dict_data)
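The two converters above keep only id, x, and y, so any additional columns or keys are dropped. The sketch below shows one way to carry numeric extras through a round trip. The function names and the extra_names parameter are hypothetical, and the approach assumes every extra attribute is numeric, since string values would force the whole array to a string dtype.

```python
import numpy as np


def array_to_dict_with_extras(data_array, extra_names=()):
    """Hypothetical variant: name columns 3+ using extra_names."""
    data_array = np.asarray(data_array)
    names = ["id", "x", "y", *extra_names]
    if data_array.ndim != 2 or data_array.shape[1] != len(names):
        raise ValueError(
            f"Expected {len(names)} columns, got shape {data_array.shape}"
        )
    return {name: data_array[:, i].tolist() for i, name in enumerate(names)}


def dict_to_array_with_extras(data_dict, extra_names=()):
    """Hypothetical variant: stack id, x, y, then the named numeric extras."""
    names = ["id", "x", "y", *extra_names]
    return np.column_stack([data_dict[name] for name in names])


# Round trip with one extra numeric column
original = np.array([[0, 100, 200, 1.5], [1, 300, 400, 2.3]])
as_dict = array_to_dict_with_extras(original, extra_names=["speed"])
restored = dict_to_array_with_extras(as_dict, extra_names=["speed"])
```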

Common Data Sources

From CSV Files:

import pandas as pd

# Read CSV file
df = pd.read_csv("objects.csv")  # columns: object_id, pos_x, pos_y

# Convert to array format (to_numpy() is the recommended replacement for .values)
data_array = df[["object_id", "pos_x", "pos_y"]].to_numpy()

# Or convert to dictionary format
data_dict = {
    "id": df["object_id"].tolist(),
    "x": df["pos_x"].tolist(),
    "y": df["pos_y"].tolist()
}
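When pandas is not available, the same dictionary can be built with the standard library. This is a sketch only: it reads from an in-memory string so that it runs as-is, and it reuses the column names object_id, pos_x, and pos_y assumed in the example above.

```python
import csv
import io

# Stand-in for the contents of objects.csv
csv_text = """object_id,pos_x,pos_y
0,100,200
1,300,400
2,500,600
"""

data_dict = {"id": [], "x": [], "y": []}
for row in csv.DictReader(io.StringIO(csv_text)):
    # csv yields strings, so convert explicitly to numeric types
    data_dict["id"].append(int(row["object_id"]))
    data_dict["x"].append(float(row["pos_x"]))
    data_dict["y"].append(float(row["pos_y"]))
```

For a real file, replace io.StringIO(csv_text) with a file handle opened using open("objects.csv", newline="").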

From Object Detection:

# From YOLO or similar detection systems
def detections_to_graphizy(detections):
    """Convert detection results to graphizy format"""
    data = []
    for i, detection in enumerate(detections):
        x_center, y_center = detection[0], detection[1]
        data.append([i, x_center, y_center])
    return np.array(data)
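The function above expects each detection to start with its center coordinates. Many detectors report corner boxes instead, so the following sketch assumes a hypothetical (x_min, y_min, x_max, y_max) layout and derives the centers in one vectorized step. boxes_to_graphizy is an illustrative name, not a Graphizy function.

```python
import numpy as np


def boxes_to_graphizy(boxes):
    """Convert corner boxes to [id, x_center, y_center] rows.

    Assumes each box is (x_min, y_min, x_max, y_max); adapt the indices
    if your detector uses a different layout.
    """
    # reshape keeps an empty input as shape (0, 4) instead of (0,)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    ids = np.arange(len(boxes))
    x_center = (boxes[:, 0] + boxes[:, 2]) / 2
    y_center = (boxes[:, 1] + boxes[:, 3]) / 2
    return np.column_stack([ids, x_center, y_center])


centers = boxes_to_graphizy([
    (10, 20, 30, 60),
    (100, 100, 200, 300),
])
```

IDs here are simply the detection index within one frame, as in the function above, so they are not stable across frames unless a tracker assigns them.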

From Simulation Systems:

# From particle simulation
def particles_to_graphizy(particles, include_velocity=False):
    """Convert particle objects to graphizy format"""
    if include_velocity:
        return np.array([
            [p.id, p.x, p.y, p.vx, p.vy] for p in particles
        ])
    else:
        return np.array([
            [p.id, p.x, p.y] for p in particles
        ])
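To make the expected particle interface concrete, here is a self-contained sketch with a hypothetical Particle dataclass exposing the attributes used above (id, x, y, vx, vy), together with a compact variant of the conversion that also returns a correctly shaped array when the particle list is empty.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Particle:
    """Hypothetical particle type; any object with these attributes works."""
    id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def particles_to_graphizy(particles, include_velocity=False):
    """Build [id, x, y] rows, optionally followed by [vx, vy]."""
    fields = ("id", "x", "y", "vx", "vy") if include_velocity else ("id", "x", "y")
    rows = [[getattr(p, name) for name in fields] for p in particles]
    # reshape keeps the column count even when there are no particles
    return np.array(rows, dtype=float).reshape(-1, len(fields))


particles = [Particle(0, 100.0, 200.0, 1.0, -0.5), Particle(1, 300.0, 400.0)]
positions_only = particles_to_graphizy(particles)
with_velocity = particles_to_graphizy(particles, include_velocity=True)
```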

Data Validation

Always validate your data before creating graphs:

from graphizy import validate_graphizy_input

# Validate your data
result = validate_graphizy_input(
    data,
    aspect="array",           # or "dict"
    dimension=(800, 800),
    verbose=True
)

if not result["valid"]:
    print("Data issues found:")
    for error in result["errors"]:
        print(f"  - {error}")

For complete validation details, see the Data Validation guide.
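As an illustration of the kinds of problems validation is meant to catch, the sketch below runs a few checks with NumPy alone. precheck_array is a hypothetical helper and does not reproduce what validate_graphizy_input does internally. It assumes that dimension is (width, height), that valid coordinates satisfy 0 <= value < size, and that IDs should be unique.

```python
import numpy as np


def precheck_array(data, dimension):
    """Hypothetical pre-check; returns a list of human-readable issues."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] < 3:
        return [f"expected a 2D array with at least 3 columns, got shape {data.shape}"]
    if not np.issubdtype(data.dtype, np.number):
        return [f"expected a numeric dtype, got {data.dtype}"]

    issues = []
    # Assumption: every object needs its own ID
    if len(np.unique(data[:, 0])) != len(data):
        issues.append("duplicate IDs in column 0")

    # Assumption: dimension is (width, height) with half-open bounds
    width, height = dimension
    x, y = data[:, 1], data[:, 2]
    outside = (x < 0) | (x >= width) | (y < 0) | (y >= height)
    if outside.any():
        issues.append(f"{int(outside.sum())} point(s) outside {dimension}")
    return issues


sample = np.array([[0, 100, 200], [0, 900, 400]])
for issue in precheck_array(sample, (800, 800)):
    print(f"  - {issue}")
```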

Best Practices

  1. Use numeric IDs only - String IDs will cause errors

  2. Ensure coordinates fit within dimensions - Points outside bounds will generate warnings

  3. Choose array format for large datasets - A single contiguous NumPy array uses less memory than Python lists

  4. Choose dictionary format for mixed data types - More readable and flexible

  5. Always validate data before graph creation - Catch issues early

Performance Tips

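# For large datasets, use compact data types and keep IDs unique
n_objects = 10000
ids = np.arange(n_objects, dtype=np.int32)
positions = np.random.randint(0, 1000, (n_objects, 2), dtype=np.int32)
large_data = np.column_stack([ids, positions])  # columns: id, x, y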

# Array format is generally faster for large datasets
grapher = Graphing(aspect="array", dimension=(1000, 1000))
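The effect of the data type on memory can be measured directly with NumPy's nbytes attribute. The sketch below compares a 10,000-row array stored as int32 and as float64. It measures the input arrays only and makes no claim about how Graphizy stores data internally.

```python
import numpy as np

n_objects = 10_000
rng = np.random.default_rng(0)

ids = np.arange(n_objects)
coords = rng.integers(0, 1000, size=(n_objects, 2))

as_int32 = np.column_stack([ids, coords]).astype(np.int32)
as_float64 = as_int32.astype(np.float64)

# 10,000 rows x 3 columns x 4 bytes, then x 8 bytes
print(as_int32.nbytes)    # 120000
print(as_float64.nbytes)  # 240000
```

Integer types only work when every column is integral. Adding a float attribute such as speed promotes the whole array, in which case float32 is a possible middle ground if the reduced precision is acceptable.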