Data Formats
This guide explains the data formats that Graphizy accepts and how to structure your data for optimal performance.
Overview
Graphizy accepts data in two primary formats:
- Array Format (aspect="array") - NumPy arrays with structured columns
- Dictionary Format (aspect="dict") - Python dictionaries with named keys
Both formats represent the same information: a collection of objects with IDs and 2D coordinates.
Array Format (aspect="array")
Structure: The array format uses a 2D NumPy array where each row represents one object and columns contain the object’s attributes.
Required Columns:
- Column 0: Object ID (numeric)
- Column 1: X coordinate
- Column 2: Y coordinate
- Columns 3+: Additional attributes (optional)
Basic Example:
import numpy as np
from graphizy import Graphing
# Basic format: [id, x, y]
data = np.array([
[0, 100, 200], # Object 0 at position (100, 200)
[1, 300, 400], # Object 1 at position (300, 400)
[2, 500, 600], # Object 2 at position (500, 600)
])
# Create grapher and use the data
grapher = Graphing(dimension=(800, 800), aspect="array")
graph = grapher.make_delaunay(data)
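Positions often arrive as separate x and y arrays rather than as ready-made rows. A minimal NumPy sketch, assuming sequential IDs starting at 0 are acceptable for your data (the coordinate values are illustrative), stacks them into the [id, x, y] layout:

```python
import numpy as np

# Illustrative coordinate arrays, e.g. from a tracker or a simulation step
xs = np.array([100.0, 300.0, 500.0])
ys = np.array([200.0, 400.0, 600.0])

# Assign sequential IDs 0..n-1 and stack everything as columns [id, x, y]
ids = np.arange(len(xs))
data = np.column_stack([ids, xs, ys])

print(data.shape)  # (3, 3)
```

np.column_stack turns each 1D input into one column, so the result has one row per object.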
Extended Example with Additional Attributes:
# Extended format: [id, x, y, speed, active, type]
data = np.array([
[0, 100, 200, 1.5, 1, 0], # Object 0: speed=1.5, active=True, type=0
[1, 300, 400, 2.3, 1, 1], # Object 1: speed=2.3, active=True, type=1
[2, 500, 600, 0.8, 0, 0], # Object 2: speed=0.8, active=False, type=0
])
# Graphizy will use columns 0, 1, 2 for id, x, y
# Additional columns are preserved but not used for graph creation
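Because only columns 0 to 2 drive graph construction, the extra columns are handy for choosing which objects to include. A NumPy sketch that keeps only active objects, assuming column 4 holds the active flag as in the extended example:

```python
import numpy as np

# [id, x, y, speed, active, type], same layout as the extended example
data = np.array([
    [0, 100, 200, 1.5, 1, 0],
    [1, 300, 400, 2.3, 1, 1],
    [2, 500, 600, 0.8, 0, 0],
])

# Boolean mask built from the "active" column (index 4)
active_mask = data[:, 4] == 1

# Keep only the active rows, then slice them down to [id, x, y]
active_points = data[active_mask][:, :3]
print(active_points)
```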
Dictionary Format (aspect="dict")
Structure: The dictionary format uses a Python dictionary with three required keys, each containing a list of values.
Required Keys:
- "id": List of object IDs (numeric)
- "x": List of X coordinates
- "y": List of Y coordinates
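Index i of each list describes object i, so the lists must line up. The helper below is hypothetical (not part of Graphizy) and sketches that check; requiring extra keys to match the same length is an assumption of this sketch:

```python
def check_dict_format(data):
    """Hypothetical helper: verify the required keys exist and all lists align."""
    required = ("id", "x", "y")
    missing = [key for key in required if key not in data]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    # Assumption: every key (required or extra) holds one value per object
    lengths = {key: len(values) for key, values in data.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"list lengths differ: {lengths}")
    return lengths["id"]  # number of objects

n = check_dict_format({"id": [0, 1, 2], "x": [100, 300, 500], "y": [200, 400, 600]})
print(n)  # 3
```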
Basic Example:
# Dictionary format
data = {
"id": [0, 1, 2],
"x": [100, 300, 500],
"y": [200, 400, 600]
}
# Create grapher and use the data
grapher = Graphing(dimension=(800, 800), aspect="dict")
graph = grapher.make_delaunay(data)
Extended Example with Additional Attributes:
# Dictionary with additional attributes
data = {
"id": [0, 1, 2, 3],
"x": [100, 300, 500, 700],
"y": [200, 400, 600, 800],
"speed": [1.5, 2.3, 0.8, 1.9],
"color": ["red", "blue", "green", "yellow"],
"active": [True, True, False, True],
"category": ["A", "B", "A", "C"]
}
# Graphizy will use id, x, y for graph creation
# Additional keys are preserved for your use
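Extra keys also make it easy to work on a subset of objects. A plain-Python sketch that keeps only category "A", rebuilding every list from the same positions so id, x, y and the extra keys stay aligned:

```python
data = {
    "id": [0, 1, 2, 3],
    "x": [100, 300, 500, 700],
    "y": [200, 400, 600, 800],
    "category": ["A", "B", "A", "C"],
}

# Positions of the objects whose category is "A"
keep = [i for i, cat in enumerate(data["category"]) if cat == "A"]

# Rebuild every list from the same positions so all keys stay aligned
subset = {key: [values[i] for i in keep] for key, values in data.items()}
print(subset["id"])  # [0, 2]
```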
Converting Between Formats
Array to Dictionary:
def array_to_dict(data_array):
    """Convert array format to dictionary format"""
    return {
        "id": data_array[:, 0].tolist(),
        "x": data_array[:, 1].tolist(),
        "y": data_array[:, 2].tolist()
    }
# Example usage
array_data = np.array([[0, 100, 200], [1, 300, 400]])
dict_data = array_to_dict(array_data)
Dictionary to Array:
def dict_to_array(data_dict):
    """Convert dictionary format to array format"""
    return np.column_stack([
        data_dict["id"],
        data_dict["x"],
        data_dict["y"]
    ])
# Example usage
dict_data = {"id": [0, 1], "x": [100, 300], "y": [200, 400]}
array_data = dict_to_array(dict_data)
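Both helpers above keep only id, x and y, so any extra attributes are lost in the conversion. A sketch that carries numeric extra columns across, assuming you supply the attribute names yourself (the function names and "speed" below are hypothetical):

```python
import numpy as np

def array_to_dict_with_attrs(data_array, extra_names=()):
    """Array -> dict, naming columns 3+ with the given extra_names (hypothetical helper)."""
    result = {
        "id": data_array[:, 0].tolist(),
        "x": data_array[:, 1].tolist(),
        "y": data_array[:, 2].tolist(),
    }
    for offset, name in enumerate(extra_names):
        result[name] = data_array[:, 3 + offset].tolist()
    return result

def dict_to_array_with_attrs(data_dict, extra_names=()):
    """Dict -> array with columns [id, x, y, *extra_names]; values must be numeric."""
    keys = ["id", "x", "y", *extra_names]
    return np.column_stack([data_dict[key] for key in keys])

arr = np.array([[0, 100, 200, 1.5], [1, 300, 400, 2.3]])
as_dict = array_to_dict_with_attrs(arr, extra_names=["speed"])
back = dict_to_array_with_attrs(as_dict, extra_names=["speed"])
print(np.array_equal(arr, back))  # True
```

String-valued attributes such as "color" cannot share a numeric NumPy array, which is one reason to keep them in dictionary format.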
Common Data Sources
From CSV Files:
import pandas as pd
# Read CSV file
df = pd.read_csv("objects.csv") # columns: object_id, pos_x, pos_y
# Convert to array format
data_array = df[["object_id", "pos_x", "pos_y"]].values
# Or convert to dictionary format
data_dict = {
"id": df["object_id"].tolist(),
"x": df["pos_x"].tolist(),
"y": df["pos_y"].tolist()
}
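Graphizy expects numeric IDs (see Best Practices). If a CSV identifies objects with string labels, one option is to map each distinct label to an integer code with np.unique and keep the mapping for lookups. A sketch with illustrative labels:

```python
import numpy as np

labels = np.array(["car_A", "car_B", "bus_1"])  # illustrative string IDs
xs = np.array([10.0, 20.0, 30.0])
ys = np.array([15.0, 25.0, 35.0])

# np.unique returns the sorted distinct labels and, via return_inverse,
# the position of each original label within that sorted list
unique_labels, numeric_ids = np.unique(labels, return_inverse=True)
data = np.column_stack([numeric_ids, xs, ys])

# Keep the mapping so graph node IDs can be translated back to labels
id_to_label = dict(enumerate(unique_labels.tolist()))
print(id_to_label)  # {0: 'bus_1', 1: 'car_A', 2: 'car_B'}
```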
From Object Detection:
# From YOLO or similar detection systems
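def detections_to_graphizy(detections):
    """Convert detection results to graphizy format.

    Assumes each detection starts with its center: [x_center, y_center, ...].
    """
    data = []
    for i, detection in enumerate(detections):
        x_center, y_center = detection[0], detection[1]
        # The detection index serves as the object ID for this frame
        data.append([i, x_center, y_center])
    # reshape keeps a (0, 3) shape when there are no detections
    return np.array(data).reshape(-1, 3)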
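The function above assumes each detection begins with its center coordinates. Many detectors instead report corner boxes [x1, y1, x2, y2, ...]; in that case the centers must be computed first. A vectorized sketch under that assumption (the function name is hypothetical, and any columns after the fourth, such as a confidence score, are ignored):

```python
import numpy as np

def boxes_to_graphizy(boxes):
    """Convert corner-format boxes [x1, y1, x2, y2, ...] to [id, x_center, y_center]."""
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return np.empty((0, 3))
    x_center = (boxes[:, 0] + boxes[:, 2]) / 2.0
    y_center = (boxes[:, 1] + boxes[:, 3]) / 2.0
    ids = np.arange(len(boxes))  # per-frame indices, as in the loop version
    return np.column_stack([ids, x_center, y_center])

boxes = [[10, 20, 30, 60, 0.9], [100, 100, 120, 140, 0.8]]  # last column: score
print(boxes_to_graphizy(boxes))
```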
From Simulation Systems:
# From particle simulation
def particles_to_graphizy(particles, include_velocity=False):
    """Convert particle objects to graphizy format"""
    if include_velocity:
        return np.array([
            [p.id, p.x, p.y, p.vx, p.vy] for p in particles
        ])
    else:
        return np.array([
            [p.id, p.x, p.y] for p in particles
        ])
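Building rows with a per-particle loop is fine for small systems. If your simulation already keeps its state in NumPy arrays, the rows can be stacked directly. A sketch assuming positions and velocities of shape (n, 2) (the function and argument names are hypothetical):

```python
import numpy as np

def state_to_graphizy(ids, positions, velocities=None):
    """Stack simulation state into [id, x, y] or [id, x, y, vx, vy] rows."""
    columns = [np.asarray(ids), np.asarray(positions)]
    if velocities is not None:
        columns.append(np.asarray(velocities))
    # column_stack keeps 2D inputs as columns and turns 1D inputs into one column
    return np.column_stack(columns)

ids = np.arange(3)
positions = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
velocities = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]])
print(state_to_graphizy(ids, positions).shape)              # (3, 3)
print(state_to_graphizy(ids, positions, velocities).shape)  # (3, 5)
```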
Data Validation
Always validate your data before creating graphs:
from graphizy import validate_graphizy_input
# Validate your data
result = validate_graphizy_input(
data,
aspect="array", # or "dict"
dimension=(800, 800),
verbose=True
)
if not result["valid"]:
    print("Data issues found:")
    for error in result["errors"]:
        print(f"  - {error}")
For complete validation details, see the Data Validation guide.
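validate_graphizy_input is the library's own check. When you only need a cheap sanity check inside a loop, a NumPy-only sketch covering a few rules from this guide might look like this (array format only; quick_check is hypothetical and not part of Graphizy):

```python
import numpy as np

def quick_check(data):
    """Return a list of problems found in an [id, x, y, ...] array (hypothetical helper)."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] < 3:
        return [f"expected shape (n, >=3), got {data.shape}"]
    if not np.issubdtype(data.dtype, np.number):
        return ["array must be numeric"]
    problems = []
    if not np.all(np.isfinite(data[:, :3])):
        problems.append("id/x/y contain NaN or inf")
    if len(np.unique(data[:, 0])) != len(data):
        problems.append("duplicate IDs")
    return problems

print(quick_check(np.array([[0, 100, 200], [0, 300, 400]])))  # ['duplicate IDs']
```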
Best Practices
Use numeric IDs only - String IDs will cause errors
Ensure coordinates fit within dimensions - Points outside bounds will generate warnings
Choose array format for large datasets - A single NumPy array stores values compactly, unlike per-key Python lists
Choose dictionary format for mixed data types - Named keys can hold strings, booleans and numbers side by side
Always validate data before graph creation - Catch issues early
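For the coordinate-bounds rule, one option is to drop out-of-range points before building a graph. A sketch assuming dimension=(width, height) and a canvas spanning 0 <= x < width, 0 <= y < height; Graphizy's exact convention for edge values and axis order may differ:

```python
import numpy as np

def drop_out_of_bounds(data, dimension):
    """Keep rows whose x (column 1) and y (column 2) lie inside the canvas.

    Assumes dimension = (width, height) and half-open bounds [0, width) x [0, height).
    """
    width, height = dimension
    x, y = data[:, 1], data[:, 2]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return data[inside]

data = np.array([[0, 100, 200], [1, 850, 400], [2, 500, -5]])
print(drop_out_of_bounds(data, (800, 800)))  # only the row with id 0 remains
```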
Performance Tips
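# For large datasets, use compact data types and keep IDs unique
n = 10_000
rng = np.random.default_rng(seed=0)
ids = np.arange(n, dtype=np.int32)                           # unique IDs 0..n-1
coords = rng.integers(0, 1000, size=(n, 2), dtype=np.int32)  # x, y inside 1000x1000
large_data = np.column_stack([ids, coords])                  # int32, shape (n, 3)
# Array format is generally faster for large datasets
grapher = Graphing(aspect="array", dimension=(1000, 1000))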
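The dtype directly determines the memory your input array occupies, because NumPy stores every element in a fixed number of bytes, which nbytes reports. A quick comparison for 10,000 rows of [id, x, y] (this measures only your input array; Graphizy may convert it internally):

```python
import numpy as np

n = 10_000
as_float64 = np.zeros((n, 3), dtype=np.float64)  # 8 bytes per value
as_float32 = np.zeros((n, 3), dtype=np.float32)  # 4 bytes per value
as_int32 = np.zeros((n, 3), dtype=np.int32)      # 4 bytes per value

print(as_float64.nbytes)  # 240000
print(as_float32.nbytes)  # 120000
print(as_int32.nbytes)    # 120000
```

Note that float32 represents integers exactly only up to 2**24 (16,777,216), so very large IDs need a wider type.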