Datasets
Upload and manage your forecasting data
Datasets are the foundation of your forecasting workflow. They contain the historical data that AI models use to learn patterns and generate accurate predictions.
Dataset Types
The forecast application supports three types of datasets, each serving a specific purpose in the forecasting process:
1. Main Historical Data
Purpose: Primary time series data containing historical demand/sales information.
Structure:
- Required Columns: time, item_id, value
- Format: CSV
- Frequency: Daily, weekly, monthly, quarterly, or yearly
Example:
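An illustrative sample of daily data (the item IDs and values below are made up):

```csv
time,item_id,value
2024-01-01,SKU-001,120
2024-01-01,SKU-002,85
2024-01-02,SKU-001,132
2024-01-02,SKU-002,91
```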
2. External Data
Purpose: Additional time series data that may influence demand (prices, promotions, inventory levels).
Structure:
- Required Columns: time, item_id, plus one or more feature columns
- Common Features: price, promotion_flag, inventory_level, weather_data
Example:
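An illustrative sample with price and promotion features (values are made up):

```csv
time,item_id,price,promotion_flag
2024-01-01,SKU-001,9.99,0
2024-01-01,SKU-002,14.50,1
2024-01-02,SKU-001,9.99,0
2024-01-02,SKU-002,12.99,1
```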
3. Categorical Data
Purpose: Static information about each item (product category, brand, attributes).
Structure:
- Required Columns: item_id, plus one or more categorical columns
- No Time Column: This data doesn't change over time
Example:
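An illustrative sample (the categories and brands are made up):

```csv
item_id,category,brand
SKU-001,Beverages,Acme
SKU-002,Snacks,Globex
```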
Uploading Datasets
Step 1: Select Dataset Type
- Navigate to your workspace dashboard
- Click "Upload data" or go to the Datasets section
- Choose the appropriate dataset type for your data
Step 2: Upload Your File
- Supported Formats: CSV
- Encoding: UTF-8 recommended
Step 3: Configure Dataset
- Name: Choose a descriptive name for your dataset
- Frequency: Verify the data frequency (auto-detected)
- Preview: Review the first few rows to ensure correct parsing
Data Requirements
Column Requirements
Main Historical Data
- time: Date/time column (ISO format recommended)
- item_id: Unique identifier for each item/product
- value: Numerical demand/sales values
External Data
- time: Date/time column (must match main dataset frequency)
- item_id: Must match items in main dataset
- Additional feature columns (numerical or categorical)
Categorical Data
- item_id: Must match items in main dataset
- One or more categorical columns (text or numerical values)
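A minimal pre-upload check along these lines can catch missing columns and mismatched item IDs before the data reaches the application. The file names below are placeholders, and the application's own validation may differ:

```python
import pandas as pd

# Placeholder file names; replace with your own paths.
main = pd.read_csv("main_historical.csv")
external = pd.read_csv("external.csv")
categorical = pd.read_csv("categorical.csv")

# Required columns per dataset type.
assert {"time", "item_id", "value"} <= set(main.columns), "main: missing required columns"
assert {"time", "item_id"} <= set(external.columns), "external: missing required columns"
assert "item_id" in categorical.columns, "categorical: missing item_id"

# Items referenced in external/categorical data should also exist in the main dataset.
main_ids = set(main["item_id"])
print("external item_ids not in main:", set(external["item_id"]) - main_ids)
print("categorical item_ids not in main:", set(categorical["item_id"]) - main_ids)
```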
Data Quality Guidelines
Time Series Requirements
- Minimum Data: At least 12 periods of historical data; 24 or more is recommended
- Consistency: Regular intervals (no missing periods)
- Completeness: Minimal missing values
- Accuracy: Clean, validated data
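As a quick way to spot missing periods before uploading, a sketch like the one below (assuming daily data and pandas; change the frequency string to match your data) lists the gaps for each item:

```python
import pandas as pd

df = pd.read_csv("main_historical.csv", parse_dates=["time"])

# Compare each item's observed dates against a complete daily range.
for item_id, grp in df.groupby("item_id"):
    full_range = pd.date_range(grp["time"].min(), grp["time"].max(), freq="D")
    missing = full_range.difference(grp["time"])
    if len(missing) > 0:
        print(f"{item_id}: {len(missing)} missing periods")
```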
Item Requirements
- Unique IDs: Each item should have a consistent identifier
- Consistent Naming: Use the same item_id across all datasets
- Reasonable Volume: 100-10,000 items recommended for optimal performance
Data Processing
Automatic Analysis
When you upload a dataset, the system automatically:
- Detects Data Frequency: Analyzes time intervals to determine frequency
- Validates Data Quality: Checks for missing values and outliers
- Classifies Items: Identifies new, obsolete, and intermittent items
- Generates Insights: Provides data distribution and summary statistics
Item Classification
The system automatically classifies items based on their demand patterns:
- New Items: Started appearing in the last 3 months
- Obsolete Items: No demand in the last 3 months
- Intermittent Items: Irregular demand patterns
- Regular Items: Consistent demand patterns
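The exact rules the system applies are not documented here, but a rough sketch of this kind of classification, using the 3-month window above and an illustrative zero-demand threshold for intermittency, might look like this:

```python
import pandas as pd

df = pd.read_csv("main_historical.csv", parse_dates=["time"])
cutoff = df["time"].max() - pd.DateOffset(months=3)

def classify(grp: pd.DataFrame) -> str:
    # New: the item only appears within the last 3 months.
    if grp["time"].min() > cutoff:
        return "new"
    # Obsolete: no demand recorded in the last 3 months.
    if (grp.loc[grp["time"] > cutoff, "value"] > 0).sum() == 0:
        return "obsolete"
    # Intermittent: many periods with zero demand (threshold is an assumption).
    if (grp["value"] == 0).mean() > 0.5:
        return "intermittent"
    return "regular"

print(df.groupby("item_id").apply(classify).value_counts())
```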
Dataset Management
Viewing Datasets
- List View: See all datasets with key information
- Details View: Comprehensive dataset information and insights
- Data Preview: Sample of the uploaded data
Dataset Actions
- Edit: Modify dataset name and description
- Download: Export dataset in original format
- Delete: Remove dataset (affects associated forecasts)
Dataset Insights
Data Length Distribution
Shows the distribution of how many periods of data each item has, helping identify:
- Items with sufficient historical data
- Items that may need more data for accurate forecasting
Total Value Over Time
Visualizes the overall demand trend across all items, useful for:
- Identifying seasonal patterns
- Detecting overall business trends
- Understanding data quality
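A similar view can be reproduced locally from the main dataset, which can help when deciding whether seasonal patterns are worth modelling (file name as in the earlier sketches; requires matplotlib):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("main_historical.csv", parse_dates=["time"])

# Total demand per period across all items, mirroring the "Total Value Over Time" chart.
total = df.groupby("time")["value"].sum()
total.plot(title="Total value over time")
plt.show()
```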
Best Practices
Data Preparation
- Clean Your Data: Remove outliers and handle missing values
- Standardize Formats: Use consistent date formats and item IDs
- Validate Relationships: Ensure item_id consistency
- Check Frequency: Verify data is collected at consistent intervals
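As one way to apply these preparation steps, the sketch below (pandas; the input file name and the outlier threshold are assumptions) standardizes dates to ISO format, normalizes item IDs, and handles missing values and extreme outliers:

```python
import pandas as pd

df = pd.read_csv("raw_sales.csv")  # hypothetical input file

# Standardize the date format to ISO (YYYY-MM-DD).
df["time"] = pd.to_datetime(df["time"]).dt.strftime("%Y-%m-%d")

# Normalize item IDs so they match across all datasets.
df["item_id"] = df["item_id"].astype(str).str.strip()

# Handle missing values: drop rows without a demand value.
df = df.dropna(subset=["value"])

# Clip extreme outliers at the 99th percentile (illustrative choice).
df["value"] = df["value"].clip(upper=df["value"].quantile(0.99))

df.to_csv("main_historical.csv", index=False)
```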
Performance Optimization
- Reasonable File Sizes: Keep files under 100MB
- Efficient Formats: Use CSV for large datasets
- Clean Structure: Avoid unnecessary columns
- Consistent Encoding: Use UTF-8 encoding
Troubleshooting
Common Issues
Upload Failures
- File Format: Ensure file is CSV
- File Size: Check if file exceeds 100MB limit
- Encoding: Use UTF-8 encoding for special characters
- Network: Check your internet connection when uploading large files
Data Parsing Errors
- Date Format: Use ISO format (YYYY-MM-DD) for dates
- Column Names: Ensure required columns are present
- Data Types: Verify numerical columns contain numbers
- Missing Values: Handle or remove missing data
Processing Errors
- Insufficient Data: Ensure at least 12 periods of data
- Inconsistent Frequencies: Check for irregular time intervals
- Invalid Item IDs: Verify item_id consistency
- Encoding Issues: Check for special characters
Proper dataset preparation is crucial for accurate forecasting. Take time to clean and validate your data before uploading.