Gene: Data Operations
Description
Data handling and manipulation capabilities. Combines Hugging Face `datasets` patterns with file operations to cover loading, cleaning, transforming, analyzing, and exporting data.
Trigger Conditions
- User needs to work with datasets
- Data cleaning or transformation
- File import/export operations
- Data analysis requests
Capabilities
Dataset Operations
- Load datasets (Hugging Face, CSV, JSON, Parquet)
- Filter and select data
- Transform columns
- Handle missing values
- Train/test splits
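
The dataset operations above can be sketched with plain Python lists of dictionaries. This is a minimal illustration rather than a prescribed implementation: the sample rows and the helper names `fill_missing` and `train_test_split` are hypothetical, and mean imputation is only one possible strategy for missing values. In Hugging Face `datasets`, the corresponding methods are `Dataset.filter`, `Dataset.map`, and `Dataset.train_test_split`.

```python
import random
from statistics import mean


def fill_missing(rows, column):
    """Return new rows where None in `column` is replaced by the observed mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fallback = mean(observed)
    return [{**r, column: fallback if r[column] is None else r[column]} for r in rows]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it; the input list is not modified."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample data, used only for illustration.
rows = [
    {"name": "a", "age": 30, "score": 0.5},
    {"name": "b", "age": None, "score": 0.9},
    {"name": "c", "age": 50, "score": 0.1},
    {"name": "d", "age": 40, "score": 0.7},
]

filled = fill_missing(rows, "age")                                  # handle missing values
older = [r for r in filled if r["age"] >= 35]                       # filter rows
selected = [{"name": r["name"], "age": r["age"]} for r in older]    # select columns
scaled = [{**r, "score_pct": r["score"] * 100} for r in filled]     # transform a column
train, test = train_test_split(filled, test_fraction=0.25, seed=0)  # train/test split
```

Every step returns new rows instead of mutating `rows`, which keeps the original data intact.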
File Operations
- Read/write various formats
- Batch processing
- File transformation
- Directory traversal
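
As a rough sketch of batch processing combined with directory traversal, the following converts every CSV file under a directory to JSON Lines while mirroring the folder layout. The function names `csv_to_jsonl` and `convert_tree`, the directory names, and the choice of CSV to JSONL are assumptions made for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_to_jsonl(src: Path, dst: Path) -> int:
    """Convert one CSV file to JSON Lines and return the number of rows written."""
    count = 0
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", encoding="utf-8") as fout:
        for row in csv.DictReader(fin):
            fout.write(json.dumps(row) + "\n")
            count += 1
    return count


def convert_tree(in_dir: Path, out_dir: Path) -> dict:
    """Recursively convert every .csv under in_dir, mirroring the layout in out_dir."""
    results = {}
    for src in sorted(in_dir.rglob("*.csv")):
        relative = src.relative_to(in_dir)
        dst = out_dir / relative.with_suffix(".jsonl")
        dst.parent.mkdir(parents=True, exist_ok=True)
        results[relative.as_posix()] = csv_to_jsonl(src, dst)
    return results


# Usage on a throwaway directory with two small hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "in" / "nested").mkdir(parents=True)
    (root / "in" / "a.csv").write_text("x,y\n1,2\n3,4\n", encoding="utf-8")
    (root / "in" / "nested" / "b.csv").write_text("x\n5\n", encoding="utf-8")
    results = convert_tree(root / "in", root / "out")
    first_line = (root / "out" / "a.jsonl").read_text(encoding="utf-8").splitlines()[0]
```

Writing into a separate output directory means the source files are never touched.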
Data Analysis
- Summary statistics
- Distribution analysis
- Correlation analysis
- Data visualization prep
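
The analysis capabilities above can be illustrated with the standard library alone. This sketch assumes small, purely numeric columns; `summarize`, `pearson`, and `histogram` are hypothetical helper names, and Pearson's coefficient is only one of several correlation measures.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Summary statistics for one numeric column (stdev is the sample deviation)."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def histogram(values, bins):
    """Counts per equal-width bin, usable as input for a bar chart."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:  # all values identical: everything lands in the first bin
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return counts


# Hypothetical sample columns.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

summary = summarize(hours)
r = pearson(hours, scores)
bins = histogram(hours, 2)
```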
Supported Formats
- CSV/TSV
- JSON/JSONL
- Parquet
- Excel (.xlsx)
- HDF5 (for ML)
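
One way to handle the supported formats is to dispatch on the file extension. The sketch below covers the text formats with the standard library; for Parquet, Excel, and HDF5 it assumes a third-party reader would be used (pandas provides `read_parquet`, `read_excel`, and `read_hdf`) and only reports which one. The function name `load_rows` and the extension table are illustrative assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path

# Formats the standard library cannot read; values name a pandas reader as a hint.
NEEDS_THIRD_PARTY = {
    ".parquet": "pandas.read_parquet",
    ".xlsx": "pandas.read_excel",
    ".h5": "pandas.read_hdf",
    ".hdf5": "pandas.read_hdf",
}


def load_rows(path):
    """Load a file into a list of records, choosing the parser by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in NEEDS_THIRD_PARTY:
        raise NotImplementedError(
            f"{suffix} needs a third-party reader, e.g. {NEEDS_THIRD_PARTY[suffix]}"
        )
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f, delimiter=delimiter))
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".jsonl":
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]  # skip blank lines
    raise ValueError(f"unsupported format: {suffix!r}")


# Usage with small hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "t.tsv").write_text("id\tlabel\n1\tcat\n", encoding="utf-8")
    (root / "r.jsonl").write_text('{"id": 1}\n\n{"id": 2}\n', encoding="utf-8")
    tsv_rows = load_rows(root / "t.tsv")
    jsonl_rows = load_rows(root / "r.jsonl")
    try:
        load_rows(root / "data.parquet")
        parquet_error = None
    except NotImplementedError as exc:
        parquet_error = str(exc)
```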
Execution Protocol
Step 1: Data Discovery
- Identify data sources
- Determine file formats
- Assess data quality
- Plan transformation pipeline
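
A discovery pass might look like the following sketch, which inventories files by extension and measures how much of each column is missing. The helper names `inventory` and `missing_rates`, and the rule that `None` and the empty string count as missing, are assumptions for illustration.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count the files under root by lowercase extension."""
    return Counter(
        p.suffix.lower() or "<none>"
        for p in Path(root).rglob("*")
        if p.is_file()
    )


def missing_rates(rows):
    """Fraction of rows in which each column is absent, None, or an empty string."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    return {
        c: sum(1 for row in rows if row.get(c) in (None, "")) / len(rows)
        for c in columns
    }


# Usage with a throwaway directory and two hypothetical records.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "raw" / "a.csv").write_text("id\n1\n", encoding="utf-8")
    (root / "raw" / "b.CSV").write_text("id\n2\n", encoding="utf-8")
    (root / "notes.json").write_text("{}", encoding="utf-8")
    formats = inventory(root)

quality = missing_rates([{"id": 1, "city": ""}, {"id": None, "city": "Oslo"}])
```

The inventory shows which loaders are needed, and the missing rates indicate which columns need cleaning in the transformation step.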
Step 2: Data Loading
- Load from appropriate source
- Parse format correctly
- Handle encoding issues
- Validate structure
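
Encoding handling and structure validation could be sketched as below. The fallback order (UTF-8 first, then Windows-1252, then Latin-1) is an assumption that suits many Western European files but not all data; `decode_with_fallback` and `validate_structure` are hypothetical helper names.

```python
import csv
import io


def decode_with_fallback(raw: bytes, encodings=("utf-8-sig", "cp1252")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this last resort cannot fail,
    # although the resulting text may be wrong.
    return raw.decode("latin-1"), "latin-1"


def validate_structure(rows, required_columns):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not rows:
        problems.append("no rows loaded")
    for i, row in enumerate(rows):
        missing = sorted(set(required_columns) - set(row))
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    return problems


# Hypothetical input: a CSV exported as Windows-1252 rather than UTF-8.
raw = "name,city\nZo\u00eb,Krak\u00f3w\n".encode("cp1252")
text, used = decode_with_fallback(raw)
rows = list(csv.DictReader(io.StringIO(text)))

ok = validate_structure(rows, ["name", "city"])
bad = validate_structure(rows, ["name", "age"])
```

Returning the encoding that was actually used makes it possible to report it as a data quality note instead of guessing silently.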
Step 3: Transformation
- Clean missing values
- Encode categorical data
- Normalize/standardize
- Feature engineering
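
The encoding and scaling steps can be written out in a few lines. This is a minimal sketch with hypothetical helper names: one-hot encoding is only one way to encode categories, and in a machine learning setting the scaling statistics would normally be computed on the training split alone.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Encode categories as 0/1 vectors; columns follow the sorted category order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def min_max(values):
    """Normalize to the range [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def z_score(values):
    """Standardize to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical sample columns.
colors = ["red", "blue", "red"]
prices = [10, 20, 30]

encoded, categories = one_hot(colors)
normalized = min_max(prices)
standardized = z_score(prices)
```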
Step 4: Output
- Export to desired format
- Validate output
- Document transformations
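
The output step might be sketched as follows: write the file, read it back to confirm nothing was lost, and store a small log of the transformations next to it. The function name `export_jsonl`, the `.log.json` naming, and the log fields are assumptions, not an established convention.

```python
import json
import tempfile
from pathlib import Path


def export_jsonl(rows, path, transformations):
    """Write rows as JSON Lines, verify the round trip, and save a transformation log."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    # Validate the output by reading it back and comparing with the input.
    with path.open(encoding="utf-8") as f:
        written = [json.loads(line) for line in f]
    if written != rows:
        raise ValueError(f"round-trip mismatch in {path.name}")

    # Document what was done in a sidecar file next to the output.
    log_path = path.with_name(path.stem + ".log.json")
    log = {"output": path.name, "rows": len(rows), "transformations": transformations}
    log_path.write_text(json.dumps(log, indent=2), encoding="utf-8")
    return log_path


# Usage with hypothetical rows and transformation notes.
with tempfile.TemporaryDirectory() as tmp:
    rows = [{"id": 1, "score": 0.5}, {"id": 2, "score": 1.0}]
    steps = ["dropped rows with missing id", "min-max scaled score"]
    log_path = export_jsonl(rows, Path(tmp) / "clean.jsonl", steps)
    log_name = log_path.name
    log = json.loads(log_path.read_text(encoding="utf-8"))
```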
Best Practices
- Preserve original data
- Document all transformations
- Validate at each step
- Handle errors gracefully
Guardrails
- Don't overwrite original data without backup
- Handle sensitive data appropriately
- Report data quality issues
- Validate output completeness
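
The first guardrail can be enforced in code. The sketch below copies an existing file to a `.bak` sibling before replacing it, and writes the new content to a temporary file first so that a failed write does not leave a half-written target. The name `safe_overwrite` and the `.bak`/`.tmp` suffixes are illustrative assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def safe_overwrite(path, new_text):
    """Replace the contents of path, keeping a .bak copy of any existing file."""
    path = Path(path)
    backup = None
    if path.exists():
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)  # copies the content and file metadata
    staging = path.with_name(path.name + ".tmp")
    staging.write_text(new_text, encoding="utf-8")
    staging.replace(path)  # swap the finished file into place
    return backup


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.csv"
    target.write_text("id\n1\n", encoding="utf-8")
    backup_path = safe_overwrite(target, "id\n1\n2\n")
    backup_name = backup_path.name
    backup_text = backup_path.read_text(encoding="utf-8")
    current_text = target.read_text(encoding="utf-8")
```

A repeated call overwrites the previous `.bak` file, so a timestamped backup name would be the safer choice if several generations need to be kept.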
Integration
- Works with: code-execution, ml-pipeline, document-generation
- Essential for data-driven tasks