Datasets

Upload and manage your forecasting data

Datasets are the foundation of your forecasting workflow. They contain the historical data that AI models use to learn patterns and generate accurate predictions.

Dataset Types

The forecast application supports three types of datasets, each serving a specific purpose in the forecasting process:

1. Main Historical Data

Purpose: Primary time series data containing historical demand/sales information.

Structure:

  • Required Columns: time, item_id, value
  • Format: CSV
  • Frequency: Daily, weekly, monthly, quarterly, or yearly

Example:

time,item_id,value
2023-01-01,PROD001,150
2023-01-02,PROD001,165
2023-01-01,PROD002,200
2023-01-02,PROD002,180
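
Before uploading, the structure above can be checked locally. The following is a minimal sketch using only the Python standard library; the function name `load_main_history` is hypothetical and not part of the platform.

```python
import csv
import io

REQUIRED_COLUMNS = {"time", "item_id", "value"}

def load_main_history(text):
    """Parse CSV text into row dicts, raising ValueError on a bad header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # Convert value to a number so non-numeric entries fail early.
    return [
        {"time": r["time"], "item_id": r["item_id"], "value": float(r["value"])}
        for r in reader
    ]

SAMPLE = """time,item_id,value
2023-01-01,PROD001,150
2023-01-02,PROD001,165
2023-01-01,PROD002,200
2023-01-02,PROD002,180
"""

rows = load_main_history(SAMPLE)
print(len(rows), rows[0])
```

A header that lacks any of the three required columns raises `ValueError` before any row is read.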

2. External Data

Purpose: Additional time series data that may influence demand (prices, promotions, inventory levels).

Structure:

  • Required Columns: time, item_id, plus one or more feature columns
  • Common Features: price, promotion_flag, inventory_level, weather_data

Example:

time,item_id,price,promotion_flag,inventory_level
2023-01-01,PROD001,29.99,0,500
2023-01-02,PROD001,24.99,1,450

3. Categorical Data

Purpose: Static information about each item (product category, brand, attributes).

Structure:

  • Required Columns: item_id, plus one or more categorical columns
  • No Time Column: This data doesn't change over time

Example:

item_id,category,brand,size,color
PROD001,Electronics,Apple,Medium,Black
PROD002,Clothing,Nike,Large,Red
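
The three dataset types relate to each other through their key columns: external data aligns with the main data on (time, item_id), and categorical data aligns on item_id alone. The sketch below illustrates that relationship with the example rows from this page. It is an assumption about how the datasets fit together, not a description of the platform's internals, and `combine` is a hypothetical helper.

```python
import csv
import io

MAIN = """time,item_id,value
2023-01-01,PROD001,150
2023-01-02,PROD001,165
2023-01-01,PROD002,200
"""
EXTERNAL = """time,item_id,price,promotion_flag,inventory_level
2023-01-01,PROD001,29.99,0,500
2023-01-02,PROD001,24.99,1,450
"""
CATEGORICAL = """item_id,category,brand,size,color
PROD001,Electronics,Apple,Medium,Black
PROD002,Clothing,Nike,Large,Red
"""

def read_rows(text):
    """Return the CSV rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def combine(main, external, categorical):
    """Attach external and categorical columns to each main row."""
    ext_by_key = {(r["time"], r["item_id"]): r for r in external}
    cat_by_item = {r["item_id"]: r for r in categorical}
    combined = []
    for row in main:
        merged = dict(row)
        # Time-varying features match on both time and item_id.
        merged.update(ext_by_key.get((row["time"], row["item_id"]), {}))
        # Static attributes match on item_id only.
        merged.update(cat_by_item.get(row["item_id"], {}))
        combined.append(merged)
    return combined

combined = combine(read_rows(MAIN), read_rows(EXTERNAL), read_rows(CATEGORICAL))
print(combined[0])
```

Rows for PROD002 receive the categorical attributes but no price or promotion columns, because the example external data covers only PROD001.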

Uploading Datasets

Step 1: Select Dataset Type

  1. Navigate to your workspace dashboard
  2. Click "Upload data" or go to the Datasets section
  3. Choose the appropriate dataset type for your data

Step 2: Upload Your File

  • Supported Formats: CSV
  • Encoding: UTF-8 recommended

Step 3: Configure Dataset

  • Name: Choose a descriptive name for your dataset
  • Frequency: Verify the auto-detected data frequency
  • Preview: Review the first few rows to ensure correct parsing

Data Requirements

Column Requirements

Main Historical Data

  • time: Date/time column (ISO format recommended)
  • item_id: Unique identifier for each item/product
  • value: Numerical demand/sales values

External Data

  • time: Date/time column (must match main dataset frequency)
  • item_id: Must match items in main dataset
  • Additional feature columns (numerical or categorical)

Categorical Data

  • item_id: Must match items in main dataset
  • Categorical columns (text labels as in the example above, or numerical codes)
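
The requirement that item_id values match the main dataset can be checked with a set difference. This is a minimal sketch; `unmatched_item_ids` is a hypothetical helper, and PROD003 is invented here to show a mismatch.

```python
def unmatched_item_ids(main_ids, other_ids):
    """Return item_ids present in other_ids but absent from main_ids."""
    return sorted(set(other_ids) - set(main_ids))

main_ids = ["PROD001", "PROD002"]
external_ids = ["PROD001", "PROD003"]  # PROD003 is not in the main dataset

orphans = unmatched_item_ids(main_ids, external_ids)
print(orphans)
```

An empty result means every item in the external or categorical dataset has a counterpart in the main dataset.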

Data Quality Guidelines

Time Series Requirements

  • Minimum Data: At least 12 periods of historical data; 24 or more is preferable
  • Consistency: Regular intervals (no missing periods)
  • Completeness: Minimal missing values
  • Accuracy: Clean, validated data
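
The minimum-data and consistency guidelines can be tested before upload. The sketch below assumes daily data for a single item; `missing_daily_periods` is a hypothetical helper, and other frequencies would need a different step size.

```python
from datetime import date, timedelta

MIN_PERIODS = 12  # lower bound quoted in this guide

def missing_daily_periods(dates):
    """Return calendar days between the first and last date that have no row."""
    present = set(dates)
    if not present:
        return []
    start, end = min(present), max(present)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in all_days if d not in present]

dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 4)]
gaps = missing_daily_periods(dates)
has_enough_history = len(set(dates)) >= MIN_PERIODS
print(gaps, has_enough_history)
```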

Item Requirements

  • Unique IDs: Each item should have a consistent identifier
  • Consistent Naming: Use the same item_id across all datasets
  • Reasonable Volume: 100-10,000 items recommended for optimal performance

Data Processing

Automatic Analysis

When you upload a dataset, the system automatically:

  1. Detects Data Frequency: Analyzes time intervals to determine frequency
  2. Validates Data Quality: Checks for missing values and outliers
  3. Classifies Items: Identifies new, obsolete, and intermittent items
  4. Generates Insights: Provides data distribution and summary statistics
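
The platform's detection logic is not documented here. As an illustration of how frequency can be inferred from time intervals, the sketch below takes the median gap between consecutive dates and maps it to a label. The day ranges and the name `infer_frequency` are assumptions made for this example.

```python
from datetime import date
from statistics import median

# Hypothetical ranges (in days) for the typical gap between observations.
FREQUENCY_RANGES = [
    ("daily", 1, 1),
    ("weekly", 7, 7),
    ("monthly", 28, 31),
    ("quarterly", 89, 92),
    ("yearly", 365, 366),
]

def infer_frequency(dates):
    """Guess the frequency label from the median gap between sorted dates."""
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        return "unknown"
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    typical = median(gaps)
    for name, low, high in FREQUENCY_RANGES:
        if low <= typical <= high:
            return name
    return "irregular"

monthly_dates = [date(2023, month, 1) for month in range(1, 7)]
print(infer_frequency(monthly_dates))
```

The median is used rather than the mean so that a single long gap does not change the detected frequency.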

Item Classification

The system automatically classifies items based on their demand patterns:

  • New Items: First appeared within the last 3 months
  • Obsolete Items: No demand in the last 3 months
  • Intermittent Items: Irregular demand patterns
  • Regular Items: Consistent demand patterns
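
The exact rules behind these classes are not specified on this page. The following sketch shows one way such a classification could work, under two stated assumptions: "3 months" is approximated as 90 days, and an item counts as intermittent when more than half of its periods have zero demand. `classify_item` and both thresholds are hypothetical.

```python
from datetime import date, timedelta

def classify_item(observations, as_of, window_days=90, zero_share_threshold=0.5):
    """Classify one item from its (date, value) observations."""
    cutoff = as_of - timedelta(days=window_days)
    demand_dates = [d for d, v in observations if v > 0]
    # Obsolete: no positive demand inside the recent window.
    if not demand_dates or max(demand_dates) < cutoff:
        return "obsolete"
    # New: the very first observation falls inside the recent window.
    if min(d for d, _ in observations) >= cutoff:
        return "new"
    # Intermittent: assumed here to mean mostly zero-demand periods.
    zero_share = sum(1 for _, v in observations if v == 0) / len(observations)
    return "intermittent" if zero_share > zero_share_threshold else "regular"

as_of = date(2023, 12, 31)
steady = [(date(2023, month, 1), 10) for month in range(1, 13)]
print(classify_item(steady, as_of))
```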

Dataset Management

Viewing Datasets

  • List View: See all datasets with key information
  • Details View: Comprehensive dataset information and insights
  • Data Preview: Sample of the uploaded data

Dataset Actions

  • Edit: Modify dataset name and description
  • Download: Export dataset in original format
  • Delete: Remove dataset (affects associated forecasts)

Dataset Insights

Data Length Distribution

Shows how the number of data periods varies across items, helping identify:

  • Items with sufficient historical data
  • Items that may need more data for accurate forecasting

Total Value Over Time

Visualizes the overall demand trend across all items, useful for:

  • Identifying seasonal patterns
  • Detecting overall business trends
  • Understanding data quality
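
The series behind this chart is the sum of value across all items for each time period. A minimal sketch, assuming ISO-formatted dates so that sorting the strings gives chronological order; `total_value_over_time` is a hypothetical helper.

```python
from collections import defaultdict

def total_value_over_time(rows):
    """Sum value across all items for each time, in chronological order."""
    totals = defaultdict(float)
    for time, _item_id, value in rows:
        totals[time] += value
    return dict(sorted(totals.items()))

rows = [
    ("2023-01-01", "PROD001", 150),
    ("2023-01-02", "PROD001", 165),
    ("2023-01-01", "PROD002", 200),
    ("2023-01-02", "PROD002", 180),
]

totals = total_value_over_time(rows)
print(totals)
```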

Best Practices

Data Preparation

  1. Clean Your Data: Remove outliers and handle missing values
  2. Standardize Formats: Use consistent date formats and item IDs
  3. Validate Relationships: Ensure item_id consistency
  4. Check Frequency: Verify data is collected at consistent intervals

Performance Optimization

  1. Reasonable File Sizes: Keep files under 100MB
  2. Efficient Formats: Use CSV for large datasets
  3. Clean Structure: Avoid unnecessary columns
  4. Consistent Encoding: Use UTF-8 encoding

Troubleshooting

Common Issues

Upload Failures

  • File Format: Ensure the file is a CSV
  • File Size: Check whether the file exceeds the 100MB limit
  • Encoding: Use UTF-8 encoding for special characters
  • Network: Check internet connection for large files
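
The first three checks can be run locally before retrying an upload. The sketch below is illustrative: `preflight_problems` is a hypothetical helper, and it reads "100MB" as 100 × 1024 × 1024 bytes, which is an assumption about how the limit is measured.

```python
import os
import tempfile

MAX_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB

def preflight_problems(path):
    """Return a list of likely upload problems for the file at path."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("not a .csv file")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("larger than 100MB")
    try:
        with open(path, encoding="utf-8", newline="") as handle:
            for _ in handle:  # decoding happens as lines are read
                pass
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems

# Demonstration with a throwaway file that is deliberately Latin-1 encoded.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "history.csv")
    with open(path, "wb") as handle:
        handle.write("time,item_id,value\n2023-01-01,CAFÉ,1\n".encode("latin-1"))
    problems = preflight_problems(path)

print(problems)
```

Re-saving the file as UTF-8 in the tool that produced it is usually the simplest fix for the encoding case.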

Data Parsing Errors

  • Date Format: Use ISO format (YYYY-MM-DD) for dates
  • Column Names: Ensure required columns are present
  • Data Types: Verify numerical columns contain numbers
  • Missing Values: Handle or remove missing data
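
To locate the rows that trigger these errors, each row can be checked for an ISO date and a numeric value. This is a minimal sketch; `parsing_errors` is a hypothetical helper and covers only the time and value columns of the main dataset.

```python
import csv
import io
from datetime import date

def parsing_errors(text):
    """Return (line_number, column, raw_value) for each unparseable cell."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        try:
            date.fromisoformat(row.get("time"))
        except (TypeError, ValueError):
            errors.append((line_no, "time", row.get("time")))
        try:
            float(row.get("value"))
        except (TypeError, ValueError):
            errors.append((line_no, "value", row.get("value")))
    return errors

SAMPLE = """time,item_id,value
2023-01-01,PROD001,150
01/02/2023,PROD001,165
2023-01-03,PROD001,abc
"""

errors = parsing_errors(SAMPLE)
print(errors)
```

Missing cells are reported as well, since an empty string fails both the date and the number conversion.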

Processing Errors

  • Insufficient Data: Provide at least 12 periods of data
  • Inconsistent Frequencies: Check for irregular time intervals
  • Invalid Item IDs: Verify item_id consistency
  • Encoding Issues: Check for special characters

Proper dataset preparation is crucial for accurate forecasting. Take time to clean and validate your data before uploading.