Introduction
DataFlowMapper is a lightweight yet powerful data transformation tool designed to bridge the gap between expensive enterprise ETL solutions and simple data cleaners. It's built for implementation teams and data migration specialists working with CSV, Excel, and JSON files.
Our platform enables you to create repeatable transformations with an intuitive interface, powerful logic builder, and AI-assisted mapping capabilities.
Getting Started
Get up and running with DataFlowMapper quickly. Learn how to run your first data transformation.
Account Setup
Schedule a demo to get your account set up.
Note: DataFlowMapper currently offers a 30-day trial for all new teams and users.
Quick Start Guide
This guide will help you complete your first data transformation in minutes:
Select your file format and upload your CSV, Excel, or JSON file to begin the process.
Define how your source data maps to your desired output format.
Execute the transformation and download your transformed data.
For a more detailed walkthrough, continue to the Basic Workflow section below.
Basic Workflow
The standard DataFlowMapper workflow consists of three main steps:
1. Upload Source File
Start by uploading your source data file. DataFlowMapper supports:
- CSV files (with custom delimiters)
- Excel spreadsheets
- JSON files
You can specify header rows, concatenate multiple files, and configure other import settings.
2. Create or Upload Mapping
You have two options:
- Upload an existing mapping file - If you've previously created a mapping for similar data, you can reuse it.
- Create a new mapping - Use the Mapping Editor to define how your source data maps to your destination format.
- Import headers from a template file or manually define the header fields. The system automatically detects the file format and pulls the first row (or first JSON record).
- Create direct field mappings by selecting from source field dropdown or dragging and dropping from the sidebar
- Add custom transformation logic as needed by clicking Add Custom Logic
- Add validation logic as needed by clicking Add Validation
- Specify output format and save mapping
3. Transform and Download
Once your mapping is complete:
- Execute the transformation
- Transformed data is automatically downloaded in your chosen format
- Review the results and any validations
Pro Tip: Accelerate with AI
Use the AI-Copilot features to speed up your mapping process. Try "Suggest Mappings" to automatically map fields or "Map All" to generate a complete mapping from a plain English description.
Transformation Engine
Overview
The Transformation Engine is a high-performance, Python-based processing pipeline designed for reliability and repeatability. It operates on a row-by-row streaming architecture, ensuring memory efficiency even for large datasets. Upon signing in, you will be on the Transformation Engine screen. Key principles include:
- Stateless Processing: Each row is transformed independently (unless stateful functions like GroupCounter are used), ensuring consistent results regardless of chunk size or execution order.
- Repeatability: The engine is deterministic; the same input + mapping always yields the same output.
- Unified Pipeline: All data, whether from CSV, Excel, JSON, API, or DB, is normalized and read as strings into a standard DataFrame structure before processing.
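A rough sketch of what those principles imply in practice (illustrative pseudocode only, not the engine's actual source; transform_row and field_logic are hypothetical names):

# Illustrative sketch of stateless, deterministic row-by-row processing (not engine source).
def transform_row(row, field_logic):
    # row: dict of source values (all read as strings)
    # field_logic: dict of destination field -> function(row) returning the output value
    return {field: logic(row) for field, logic in field_logic.items()}

def run_pipeline(rows, field_logic):
    # Rows are independent, so the same input + mapping always yields the same output,
    # regardless of chunk size or execution order.
    return [transform_row(row, field_logic) for row in rows]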
Source Configuration
Source Upload
The file upload system is built on a file upload dropzone and handles the initial ingestion of data. Users can drop files in the area or manually click to browse.
- Supported File Types:
- CSV/Text: Flexible support for standard CSV, TSV, pipe-delimited files, and custom delimited text files.
- JSON: Supports both flat arrays of objects and nested structures (auto-detects record paths). When uploading JSON files, the system will automatically detect potential record paths and prompt selection. The data will then be recursively expanded and flattened to effectively be mapped in a tabular fashion.
- Excel: Supports .xlsx and .xls formats.
JSON Path Syntax
When working with JSON files, DataFlowMapper uses a consistent path syntax to reference fields within nested structures. Understanding this syntax is essential for accurate mapping and transformation.
- Record Path: When parsing files in the UI (source files, lookup tables, or header-field templates), the system automatically generates this syntax for you.
- Consistent Field Names: Field names in the mapping interface directly correspond to their paths in the JSON structure.
- Specific Array Indices: For arrays outside the record path, specific indices (like [0], [1]) are used rather than wildcards.
- Record Path Handling: The record path (selected in the UI) is the only place where wildcards ([*]) are used.
- Deep Nesting: The system handles JSON structures of any depth (e.g., employees[0].projects[0].description).
- Array Expansion: Arrays are automatically expanded into individual fields (e.g., skills[0], skills[1]).
Example Structure:
{ "employees": [ { "name": "John", "skills": ["JS", "Python"] } ] }
If employees is the record path, you can map:
- employees[*].name
- employees[*].skills[0]
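A minimal sketch of how that expansion behaves for the example above (illustrative only; in the UI, each flattened key is shown prefixed with the record path, e.g. employees[*].name):

data = {"employees": [{"name": "John", "skills": ["JS", "Python"]}]}

def flatten(record, prefix=""):
    # Expand nested dicts/lists into path-style keys like skills[0], skills[1]
    flat = {}
    if isinstance(record, dict):
        for key, value in record.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = record
    return flat

rows = [flatten(record) for record in data["employees"]]
# rows == [{'name': 'John', 'skills[0]': 'JS', 'skills[1]': 'Python'}]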
Constraints:
- Max File Size: Supports any file size.
- Encoding: Automatic detection and standardization to UTF-8 before processing.
- Drag-and-Drop: Visual indicators for valid/invalid file types based on the selected mode (CSV vs. JSON vs. Excel).
- Resetting State: Users can remove uploaded source files or the mapping file directly from the Transformation Form by clicking the "X" button next to the file name in the bottom portion of the screen. This resets the respective configuration, allowing you to upload a new file or start fresh.
Input Types
Near the upload area for source data, there is a selector to change the source mode between File, API, DB, and Cloud. API, DB, and Cloud connections can be configured by clicking on the source in the selector, then clicking the gear icon on the respective source area.
- File: Supported formats: CSV, Excel, JSON. Upload files directly.
- API:
- Connection Manager: A centralized hub accessible via the settings (gear) icon. It organizes your API landscape into three tabs: Connections (Authentication), Sources (Read endpoints), and Destinations (Write endpoints).
- Authentication: Supports Basic Auth, API Key (Header or Query Parameter), Bearer Token, and OAuth2 Client Credentials.
- Security: All credentials are encrypted at rest using Fernet encryption.
- Source Configuration:
- Methods: Supports GET and POST. For POST requests, you can define payloads using JSON or Form Data templates.
- Response Parsing (Record Path): The system automatically scans JSON responses to detect arrays (e.g., data.items). The Record Path Selector allows you to choose exactly which nested array to extract rows from, or use the root array.
- Pagination: Implements multiple strategies configurable in the source settings:
- Offset-based: Standard limit and offset parameters.
- Page-based: Uses page and per_page parameters.
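The two strategies behave roughly like the following sketch using the requests library (the endpoint URL and JSON response shape are assumptions; the real behavior is driven by the source settings):

import requests

def fetch_offset_based(url, limit=100):
    # Offset-based: request pages using limit/offset until a short page is returned.
    rows, offset = [], 0
    while True:
        batch = requests.get(url, params={"limit": limit, "offset": offset}).json()
        rows.extend(batch)
        if len(batch) < limit:
            return rows
        offset += limit

def fetch_page_based(url, per_page=100):
    # Page-based: request pages using page/per_page until an empty page is returned.
    rows, page = [], 1
    while True:
        batch = requests.get(url, params={"page": page, "per_page": per_page}).json()
        if not batch:
            return rows
        rows.extend(batch)
        page += 1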
Destination Configuration (Push Data)
The system supports pushing transformed data to external systems. Destinations are configured within the same Connection Manager used for sources.
- API Destinations:
- Methods: Supports POST and PUT operations.
- Body Format: Configurable as JSON or Form Data. The transformed data becomes the request payload.
- Batching: Users can define a Batch Size to chunk large datasets into multiple requests (see the sketch after this list).
- Validation: You can define a list of "Expected Response Codes" (default: 200, 201, 202). If the API returns a code outside this list, the batch is marked as failed.
- Response Viewer: Upon completion, a modal displays the full API response payload, separated into "Our Response" (internal status) and "Destination Response" (external API body) for debugging.
- Database Destinations:
- Operations:
- Insert: Adds new rows to a table. Identity/auto-increment columns are automatically detected and excluded from the insert payload to prevent errors.
- Update: Modifies existing rows. Requires explicit selection of Key Columns (Primary Keys) to construct the WHERE clause.
- Stored Procedure: Passes row data as parameters to a procedure.
- Mapping:
- By Name: Auto-matches source fields to destination columns by name.
- By Position: Maps fields based on their order index.
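The batching and response-code validation described for API Destinations can be pictured roughly as follows (a sketch using the requests library; the endpoint and payload shape are assumptions):

import requests

def push_in_batches(url, rows, batch_size=500, expected_codes=(200, 201, 202)):
    # Chunk transformed rows into batches and POST each batch as a JSON payload.
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        response = requests.post(url, json=batch)
        # A batch is marked as failed if the status code is not in the expected list.
        results.append({
            "batch_start": start,
            "status": response.status_code,
            "ok": response.status_code in expected_codes,
        })
    return results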
File Settings Panel
Located next to the source selection toggle, accessed via the Gear Icon.
- Header Management:
- Header Row Input: Allows specifying the row number containing headers. Note: Uses 0-based index logic in UI tooltips ("Row 0 = No header"), but defaults to 1 (standard 1-based count) for user intuition.
- Refresh Headers: A Refresh button may appear upon loading a mapping file. It triggers a re-parse of the already uploaded file with new settings (delimiter, header row), so you don't need to re-upload the file just to change parsing options.
- Drift Warning: If settings are changed manually (e.g., changing the delimiter from comma to pipe), a warning appears indicating that the current preview data may be out of sync with the new settings. Users are given the option to refresh headers using the same Refresh button described above.
- Multi-File:
- Concatenation: Enabling "Multiple files" allows uploading multiple source files. The backend concatenates them into a single dataset before processing. Headers are taken from the first file. Column layout should be the same between each file.
- Source Filename Injection: An optional checkbox ("Add file name column") injects a new column source_filename into the dataset, allowing you to track the origin of each row.
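Conceptually, multi-file concatenation with filename injection is equivalent to something like this pandas sketch (file paths and reader options are illustrative):

import pandas as pd

def load_multiple_files(paths, add_filename_column=False):
    # Read every file as strings, optionally tag rows with their source file,
    # then concatenate into a single dataset (all files share the same column layout).
    frames = []
    for path in paths:
        frame = pd.read_csv(path, dtype=str)
        if add_filename_column:
            frame["source_filename"] = path
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)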
Mapping Configuration
Loading Mappings
- From Library: Users can load mapping templates directly from the Template Library.
- Access: Click "Load from Library" in the Transformation Form.
- Browsing: The library displays both Personal templates (created by you) and Shared templates (created by team members).
- Process: Selecting a template fetches the encrypted content, decrypts it, and parses it into a virtual mapping file. This triggers the standard reconciliation process for any associated lookup tables.
- From File: Users can upload a previously saved .dfm or .csv mapping file.
- File Structure: The .dfm file is technically a CSV file containing metadata headers (source config), destination fields, and transformation logic.
- Upload: Drag and drop the file into the "Mapping File" zone.
- Validation:
- Structure Check: The system verifies the file follows the expected CSV structure.
- Reference Reconciliation: (See "Reference Checks" below) The system verifies that all Lookups/Reference Tables used in the mapping exist in your library.
- Source Field Validation: When entering the Mapping Editor or running a transformation, the system validates that the source fields defined in the mapping actually exist in your currently uploaded Source File. If fields are missing, you will be warned but allowed to proceed to fix the mapping.
Reference Checks (Technical Deep Dive)
When loading a mapping file (.dfm or .csv), the system performs a rigorous check to reconcile any included lookup tables with the user's personal Library. This ensures data integrity while preventing accidental overwrites. The logic categorizes incoming tables into four "buckets":
- Bucket A: Exact Match (Hash Exists)
- Scenario: The incoming table's content hash matches a table already in the user's library.
- Action: The system automatically links to the existing library table. No user action is required.
- Bucket B: Name Reconciliation
- Scenario: The table exists in the library (same content hash), but the name in the mapping file differs from the name in the library (e.g., "CustData" vs "Customer_Data").
- Action: The system auto-updates the mapping configuration to use the Library's name. It effectively "heals" the mapping to match the local environment.
- Bucket C: Hash Conflict (Critical)
- Scenario: A table with the same name exists in the library, but the content hash is different. This indicates a drift between the mapping's expected data and the user's local copy.
- Action: A Hash Reconciliation Modal appears, forcing the user to resolve the conflict via one of three paths:
- Override: Replace the local library table with the version from the mapping file.
- Rename: Import the mapping's version as a new, separate table with a different name.
- Use Library Version: Ignore the mapping's version and use the existing local table (useful if the local version is newer).
- Bucket D: New Table
- Scenario: The table does not exist in the library (neither name nor hash matches).
- Action: The system prompts the user to import the new table into their library via the New Tables Found Modal.
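The bucketing logic can be summarized in pseudocode (a simplified sketch; the real fingerprint and library structures are internal):

import hashlib
import json

def content_hash(table_rows):
    # Fingerprint the table by content (order-insensitive), not by filename.
    canonical = json.dumps(sorted(json.dumps(row, sort_keys=True) for row in table_rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def classify_incoming_table(name, rows, library):
    # library: dict of {table_name: content_hash} representing the user's Library
    incoming_hash = content_hash(rows)
    if library.get(name) == incoming_hash:
        return "Bucket A: exact match"          # link automatically
    if incoming_hash in library.values():
        return "Bucket B: name reconciliation"  # same content, different name
    if name in library:
        return "Bucket C: hash conflict"        # same name, different content -> modal
    return "Bucket D: new table"                # prompt to import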
Execution
Job Execution & Configuration
Located in the main Transformation Form, the execution controls allow you to fine-tune how your data is processed.
- Transform Data Button:
- Pre-Flight Check: Initiates a secure WebSocket connection.
- Handshake Protocol: Enforces a strict "ready_check" -> "ready_ack" handshake to ensure channel stability before processing begins.
- Mapping Validation: The system scans the source file and mapping file to confirm all fields that are referenced exist in the source file.
- Run Configuration (Settings):
Accessed via the Sliders Icon next to the history button, this popover allows you to override default behaviors for specific runs.
- Run Validations: (Default: Checked) Toggles whether the system performs validation checks against your rules. Unchecking this skips validation logic for faster processing during initial testing.
- Row Limit: Allows you to process a specific number of rows (e.g., 50) from the start of your file. This is ideal for rapidly testing logic changes without waiting for large datasets to process.
- Override Indicator: An amber indicator appears on the settings icon if any defaults are changed (e.g., Row Limit is active), providing a visual cue that the job will run with custom parameters.
Transformation History
Accessed via the History Button (Clock icon), the Run History panel provides a secure audit trail of your recent transformation jobs.
- Session Continuity: View the real-time status of all jobs (Queued, Processing, Completed, Failed) from your recent sessions.
- Ephemeral Retention: For enterprise security, job data and download links are strictly retained for 24 hours. After this period, data is automatically purged from the system to ensure compliance.
- Actions:
- Restore: Click any job to load its results, logs, and download links into the main view.
- Reprocess: Quickly re-run a past job with the exact same settings or modified parameters. Key Benefit: This allows you to iterate on your mapping logic without needing to re-upload the source file each time, significantly speeding up the development cycle.
- Status Indicators: Visual cues provide instant feedback on job health, including breakdown counts for validation errors vs. system errors.
Error Handling & Reliability
DataFlowMapper is engineered to be resilient. Data transformation is messy, and source data is often imperfect. Our engine ensures that a single bad character, unexpected null value, or divide-by-zero error does not crash your entire job.
Isolation Architecture
Cell-Level Isolation
The engine is granular. If a specific cell triggers a transformation error (e.g., trying to divide by zero in the "Price" field):
- Catches the error for that specific cell.
- Logs the issue with a user-friendly explanation.
- Leaves that cell empty in the output.
- Continues processing other fields in the same row.
- Continues processing subsequent rows.
This means a single calculation error doesn't discard the entire row's data. You keep the valid data (Name, Date, etc.) while only losing the specific value that failed. The row will be flagged as "Failed" in the summary statistics to alert you to the partial data loss.
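In effect, the engine wraps every cell-level operation in its own guard, along the lines of this sketch (illustrative only, not the actual implementation):

def transform_row_isolated(row, field_logic, log):
    # Each field's logic runs in its own try/except, so one bad cell never
    # discards the rest of the row or stops the job.
    output, row_failed = {}, False
    for field, logic in field_logic.items():
        try:
            output[field] = logic(row)
        except Exception as exc:
            log.append(f"{field}: {exc}")  # mapped to a friendly message in the real log
            output[field] = ""             # leave the failing cell empty
            row_failed = True
    return output, row_failed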
Field-Level Compilation
Logic for each field is compiled independently. If you make a syntax error in the custom logic for one field (e.g., missing a closing parenthesis):
- That specific field will be flagged and skipped (resulting in empty values).
- All other fields will continue to process normally.
- You will receive a warning log identifying exactly which field has the syntax error.
Intelligent Error Reporting
Instead of cryptic Python tracebacks, the system maps technical errors to helpful, actionable feedback via our Error Catalog. This feedback can be found in the log post-transformation.
System Reliability
- Proactive Type Safety: Before saving results, the system scans for mixed data types (e.g., a column with both Numbers and Text) and automatically standardizes them to prevent file corruption or schema errors. This is only done when required, so as to prevent unexpected data corruption or interference.
- Timeouts: To protect system resources, infinite loops or extremely complex logic will trigger a safety timeout. This is one of the few "Fatal" errors that will stop a job to prevent hanging.
Transformation Log Summary
A detailed report card displayed after execution in the transformation summary card.
- Metrics: Visual bars showing:
- Processed: Total rows handled.
- Pre-Filtered: Rows removed before transformation logic.
- Post-Filtered: Rows removed after transformation logic.
- Errors: Rows that failed transformation.
- Warnings: Non-fatal issues.
- View Log: Opens a modal with the raw text log for deep debugging.
- Push Data: Dropdown to send results directly to API, Database, or Cloud Storage. Hovering over the Push Data button lets you choose which destination type to upload to. An expandable section then appears with a destination selector for that type, followed by modals to configure where and how to push the data.
- Download: A secure download link for the transformed file. Security Note: This link has a strict 24-hour expiration timer for data security.
Data Viewer
An interactive preview of the transformation results and data validation.
- Error Navigation: A "Find error" button jumps the view directly to the next row containing a validation error.
- Filtering:
- Show Errors Only: Toggle to hide all successful rows and focus on fixes.
- Column Filtering: Searchable multi-select checkboxes for filtering specific column values or specific errors.
- Export: "Export Report" button allows downloading the current view (respecting active filters) for offline analysis with validations and messages highlighted in the cells.
- Row Deletion: Users can select and "Delete Selected" rows from the dataset before final export.
AI-Powered Data Onboarding (Agents)
Auto-Generate
The AI-Powered Data Onboarding system (Async AI) automates the creation of complex mapping configurations by using an iterative, agentic approach. It runs as a background job, allowing users to continue working while the AI analyzes, maps, and refines the data transformation logic. (The Async AI agent doesn't create validation rules)
- Entry Point: Accessed via the AI button on the main Transformation Form, inside the Mapping Configuration card. Options include "Generate Mapping", "View Jobs", and "Configure Instructions".
Instruction Profile
Instruction Profiles are reusable blueprints that guide the AI agent. They define what the output should look like and how to handle the data.
- Creation Flow:
- Basic Info: Name and optional description.
- Template Selection: Choose a mapping file from the library. This defines the target schema (fields) and validation rules by taking them from an existing template. The validation rules from this template serve as a guardrail for the AI's logic.
- Instructions: A free-text system prompt (e.g., "Map price data from source to destination. Ensure all prices are USD. Validate positive numbers.").
- Review: Summary of settings and advanced configuration (Max Attempts, Sample Size).
- Advanced Settings:
- Max Refinement Attempts: (Default: 3, Max: 5) How many times the AI should try to fix errors by analyzing logs and adjusting logic.
- Sample Size: (Default: 1000) Number of rows to use for validation sampling during the refinement process.
Job Kickoff
The process of starting a new AI generation task.
- Prerequisites: A source file must be uploaded in the Transformation Form.
- Steps:
- Source Info: Confirms the file name, type (CSV/JSON/Excel), and parsing settings.
- Reference Tables (Optional): Users can upload new reference files or select existing lookup tables from their library. These tables are made available to the AI context for lookups and enrichment.
- Review: Displays a summary of the profile, source file, and credit estimation before starting.
- Credit Check: The system estimates credit usage based on complexity and blocks execution if the user has insufficient credits.
Job List (Monitoring)
A centralized dashboard to track the status of all AI jobs.
- Statuses:
- QUEUED: Waiting to be processed.
- GENERATING_MAPPING: AI is creating the initial field-to-field logic.
- TRANSFORMING: Running the generated logic against the sample data.
- VALIDATING: Checking results against template validation rules.
- ANALYZING: AI is reviewing error logs to understand root causes.
- REFINING: AI is rewriting logic to fix identified errors (looping back to transform).
- COMPLETED: Successfully finished.
- COMPLETED_WITH_WARNINGS/ERRORS: Finished, but some rows failed or raised warnings.
- FAILED: System error or max attempts reached without success.
- Actions:
- View Details: Opens the detailed Job Report.
- Cancel: Stops a running job (revokes the Celery task).
- Delete: Removes the job record and associated temporary files.
Job Detail & Results
Clicking a completed job opens the Job Detail Modal, which serves as the final report and integration point.
- Metrics: Displays a summary of rows processed, success rate, and error counts.
- Root Cause Analysis: If errors occurred, the system categorizes them into:
- BAD_LOGIC: The AI generated incorrect code (e.g., syntax error, wrong logic). The agent attempts to fix these automatically.
- BAD_DATA: The source data itself is flawed (e.g., "N/A" in a numeric field). The agent flags these and does not retry, as code cannot fix bad data.
- Apply Mapping: A primary action that loads the generated mapping configuration directly into the Mapping Editor, allowing the user to review and finalize the setup manually.
- Download Output: Provides a link to download the transformed sample data.
Technical Architecture (Invisible Logic)
- Celery Pipeline: Jobs are executed as asynchronous tasks.
- Iterative Refinement:
- Attempt 1: The AI generates an initial mapping for all fields based on the profile instructions and source statistics.
- Error Analysis: If validation or transformation errors occur, the system parses the logs and categorizes errors.
- Targeted Refinement: For subsequent attempts (2 to N), the AI only re-generates logic for fields identified as having "BAD_LOGIC". It preserves the working logic for other fields.
- Fresh Template Data: At runtime, the job fetches the latest version of the mapping file to ensure it uses the most up-to-date fields and validation rules, even if the profile was created weeks ago.
- Cleanup: A scheduled job runs to remove expired temporary files (transformed outputs, source files) from storage to manage security.
Mapping Editor
The Mapping Editor is the central workspace where users define how source data is transformed into the desired output format by creating or editing mapping files.
New Mapping Field Configuration
The Mapping Editor requires you to define destination fields which will determine the fields your transformed data has. Upon opening the Mapping Editor to create a new mapping, you'll be prompted to choose how you want to start the mapping.
- Start from Scratch: Each field is created manually. After selecting this option, the overlay closes and you're left with an empty Mapping Editor. Clicking Add Field in the top-left corner of the Mapping Editor creates additional fields (represented by field mapping cards).
- Get Fields from File: If you are starting a new mapping from scratch you can use the "Get Fields from File" option in the initial overlay. This allows you to upload a sample file (CSV/Excel/JSON) solely to extract the header names (first row only), populating the field mapping cards for each field so you can begin mapping immediately. When uploading a JSON file, the user will be prompted to select the record path and the system will automatically determine the proper syntax to create the transformed JSON in that format.
AI Copilot Tools
Suggest Mappings
This feature uses a backend algorithm to analyze field names and sample data to propose 1-to-1 connections between source and destination fields.
- Confidence Scores: Suggestions are color-coded based on confidence:
- Green (90-100%): Exact or near-exact matches (e.g., "Email" -> "email_address").
- Yellow (70-89%): Strong semantic matches (e.g., "DOB" -> "Birth Date").
- Red (<50%): Low confidence or speculative matches.
- Behavior: Users can "Select All" high-confidence matches or manually toggle individual proposals before applying them. The AI provides a rationale for each suggested field, explaining why it made the match.
Map All (Orchestrator)
A powerful, multi-step wizard that converts plain English instructions into a complete mapping configuration.
- Analyze: The AI scans user requirements and categorizes them into Direct Mappings, Transformations, Validations, and Filters.
- Prime: A shared context is established, including sample data and available functions.
- Generate: The system executes parallel logic generation tasks for every field.
- Review: A detailed review table appears, allowing users to inspect the generated code and rationale for each field before final approval. Users can selectively approve or decline each piece of the configuration.
Field Mappings
Each card in the editor represents a single destination field and its transformation logic.
- Structure:
- Destination Field: The target column name (e.g., Full Name).
- Source Field: The input column (e.g., first_name) for direct 1:1 mapping.
- Custom Logic: Python code for complex transformations (e.g., row['first'] + ' ' + row['last']); see the sketch after this list.
- Validation: Logic to check data integrity (e.g., len(value) > 0).
- Drag-and-Drop: Fields can be reordered via drag-and-drop by clicking and dragging the grip in the top-left corner of each card.
- Layouts: Users can toggle between a "Card" view (Horizontal) and a "List" view (Vertical) depending on their screen real estate. This can be done by clicking the Switch to Vertical/Horizontal button in the bottom left corner of the Mapping Editor.
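For example, the Custom Logic for a Full Name card might look like this sketch, where result is the reserved output variable described under Code Generation later in this document (field names are illustrative):

# Custom Logic sketch for a "Full Name" destination field
first = str(row['first']).strip()
last = str(row['last']).strip()
result = (first + ' ' + last).strip()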
Rules & Constraints
To ensure data integrity, the system enforces the following rules for field mappings:
- Mandatory Naming: Every destination field must have a non-empty name.
- Unique Names: Destination field names must be unique across the entire mapping. Duplicate names will be flagged as errors.
- Mutually Exclusive Logic: A field can be mapped via Direct Source Mapping (selecting a source column) OR Custom Logic (using Python code), but not both.
- Priority: If Custom Logic is present, it takes precedence.
- UI Behavior: The system prevents simultaneous selection to avoid ambiguity.
Saving Mappings
- Save to Library:
- Ownership Check:
- Owned Template: If you own a template with the same name, the system updates it to the next version upon saving, effectively overwriting.
- Shared Template: If the template is owned by someone else, the system forces a "Save as Copy" to prevent overwriting shared resources.
- Metadata: Users can tag templates and add metadata (Customer, Source Vendor) to organize their library.
- Ownership Check:
- Save Local:
- File System Access: If supported by the browser, the system uses the native "Save As" dialog to write directly to the user's disk. Otherwise, it triggers a standard browser download of the .dfm (CSV) file.
- Outbound File Type
- Configure Outbound File Type: Above the save buttons is a selector with options of CSV, Excel, or JSON. Whichever is selected determines what the output of the transformation will be.
Advanced Features
- Pre-Filters:
- Logic that runs before any row transformations.
- Use Case: Skipping bad data, ignoring empty rows, or filtering out data based on raw source values.
- Post-Filters:
- Logic that runs after all transformations are complete.
- Use Case: Validating the final output structure or removing rows where calculations resulted in specific values (e.g., "Remove rows where Total < 0").
- Importing Validations:
- Users can import validation rules from other mapping files (.dfm or .csv). The system matches rules to destination fields by name. Validations can be pulled from templates in the library or from .dfm mapping files via upload.
- Conflict Resolution: If a field already has validation logic, the system warns the user that it will be overwritten.
Logic Builder
The Logic Builder is a dual-mode interface (Visual + Code) that allows users to define custom transformation logic for a specific field. It bridges the gap between drag-and-drop simplicity and full Python flexibility.
AI Logic Assist
The AI Logic Assist modal allows users to describe their desired transformation in plain English, and the system generates the corresponding Python code.
- Access: Click the "AI Logic Assist" button (Wand icon) in the bottom right corner next to the Save button.
- Functionality:
- Prompting: Users type a description (e.g., "If the status is 'active', return 'Yes', otherwise 'No'"). The existing code is passed as context to the AI so it can be used for troubleshooting and refinement.
- Generation: The system generates Python code using standard library functions where possible, falling back to arbitrary Python for complex scenarios.
- Preview: The generated code is displayed with an explanation.
- Apply: Clicking "Apply" inserts the code into the Manual Tab, replacing any existing code. Click Save or Parse to UI to retain the changes.
- Modes: It supports generation for Transformation Logic, Validation Logic, and Filter Logic depending on the context.
Progress Checklist & Execution Order
The right-hand panel of the Logic Builder displays the Progress Checklist, which serves a dual purpose as both a visual history and an execution controller.
- Execution Order: The checklist strictly defines the Python execution order (top-to-bottom). The generated code will execute operations exactly as they appear in this list.
- Visual History: It lists all variables, logic blocks, and function calls created for the current field.
- Drag-and-Drop Reordering: Users can drag items in this list to reorder them. Reordering items in the checklist directly rewrites the underlying code sequence to match the new order.
- Smart Grouping: Complex logic structures, such as If/Else If/Else chains, move as a single unit to maintain logic integrity. You cannot drag an Else block away from its parent If block.
- Editing: Clicking the pencil icon on any item in the checklist (e.g., a Variable or Function) instantly switches the main view to the relevant tab and populates the form with that item's details for editing.
- Deletion: Clicking the trash icon on any item in the checklist (e.g., a Variable or Function) deletes it from the generated code and the Logic Builder state.
- Unparsable Blocks: Code sections that cannot be represented visually (e.g., complex loops) appear as "Unparsable" items. Clicking the "Edit" pencil on these items automatically switches you to the Manual Tab, highlighting the specific lines of code for manual review.
Core Interface
The Logic Builder interface is divided into 5 main tabs, each representing a stage in the logic construction process. It uses a state synchronization engine to ensure that changes in visual blocks are instantly reflected in the underlying Python code, and vice versa.
- Initialization (Auto-Parsing): When you open the Logic Builder for a field that already has logic, the system automatically sends the existing Python code to the backend parser. It attempts to reconstruct the visual state (Variables, Logic Blocks, Functions). If parts of the code are too complex to be represented visually (e.g., custom loops), they are flagged as "Unparsable" but preserved safely in the execution flow.
1. Variables Tab
The Variables Tab is the starting point for defining intermediate values, calculations, or clean references to source data. Variables created here are local to the transformation function for the specific row being processed.
- UI Components:
- "Add Variable" Button: Located at the bottom of the variables list. Clicking it adds a new row to the definition table.
- Name Input: A text field to define the variable name (e.g., clean_price). The system validates that names are valid Python identifiers.
- Value Input: A Switchable Input field (see below) that allows you to assign a value.
- Delete Button (Trash Icon): Located on the right of each variable row.
- Workflow:
- Click Add Variable.
- Define a Name and a Value.
- Click Apply Variables to save the variable to the session state and add it to the Progress Checklist.
- Scope: Variables declared here are local to the transformation function for this specific row.
- Inputs: Users define a Name and a Value.
- Value Types:
- Static: Hardcoded values (strings, numbers).
- Source: References to source fields (e.g., row['Field A']).
- Function: Result of a function call.
- Switchable Inputs: Most value inputs allow toggling between Static Value (hardcoded text/number) and Source/Variable/Function (dynamic reference) modes. This is indicated by a toggle icon inside the input field.
2. If/Then Tab
The If/Then Tab is a visual block builder for creating conditional logic trees. It allows you to define complex branching execution paths without writing code.
- UI Components:
- "Add If Block" Button: Located at the top of the canvas. Creates a new root-level conditional block.
- Condition Builder: Inside each block, you can add conditions.
- + Condition: Adds a single comparison (e.g., Field A == 'Value').
- + Group: Adds a nested group of conditions (e.g., (A==1 AND B==2)).
- Action Definition: Each block has an "Actions" section defining what happens if the condition is TRUE.
- Set Variable: Assign a value to a pre-defined variable.
- Set Result: Immediately return a value and exit the function.
- Run Function: Execute a side-effect function (like ADD_TO_LIST).
- Nesting Controls: Buttons to add Else If or Else branches to an existing If block, at root level or on nested blocks.
- Workflow:
- Click Add If Block.
- Build your conditions (e.g., Field A == 'Value').
- Define actions to take if the condition is true.
- Click Apply Logic to commit the block to the Progress Checklist.
- Structure: Supports If, Else If, and Else blocks.
3. Functions Tab
The Functions Tab provides access to a comprehensive library of pre-built Python functions for data manipulation (String, Math, Date, Utility, Conversion).
- UI Components:
- Search Bar: Real-time filter to find functions by name or description.
- Category Filter: Chips to filter by type (String, Math, Date, etc.).
- Function List: Clickable cards for each available function.
- Configuration Form: Appears when a function is selected.
- Parameter Inputs: Dynamic inputs based on the function's requirements.
- Variable Name (Output): If the function returns a value, you must assign it to a variable name.
- Workflow:
- Give the function a descriptive name. This will be used to reference the function in other inputs and to track it in the Progress Checklist.
- Select a function from the library (Search or Browse).
- Configure the required Parameters.
- Click Apply Function to save the function call.
- Categories: String, Math, Date, Utility, Conversion.
- Configuration: Selecting a function opens a configuration form. Users must give each function they define a name that doesn't conflict with reserved words.
- Parameters: Inputs can be static values, source fields, or variables.
- Switchable Inputs: Some inputs allow toggling between "Static Value" and "Source" modes. "Source" mode allows for an input with dropdowns of either variables, source fields, functions, or a mix of one or more.
- Output: The result of a function is typically assigned to a variable or used as the final result.
4. Return Result
The Return Result Tab defines the final output value of the transformation logic. This is mandatory for "Transform Mode".
- UI Components:
- Result Type Selector: Dropdown to choose between Static Value, Variable, or Function Result.
- Value Selector: Context-aware input to select the specific value or variable to return.
- Test Console: A collapsible panel at the bottom of the screen.
- "Run Test" Button: Executes the current logic against the first 500 rows of sample data.
- Output Preview: Shows the Input (Source) and Output (Result) side-by-side for verification.
- Workflow:
- Select the return type (Static, Variable, or Function).
- Choose the specific value/item.
- Click Apply Result.
- Types:
- Static: A fixed value.
- Variable: The value of a variable (potentially modified by If/Then logic).
- Function: The direct result of a function call.
- Test Console: A built-in tester that allows running the current logic against sample rows (1-500) to verify the output without saving.
5. Manual / Python Code
The Manual Tab features a full-featured Monaco code editor, providing raw access to the underlying Python logic. This tab acts as the ultimate source of truth.
- UI Components:
- Code Editor: Syntax-highlighted Python editor with auto-completion.
- "Parse to UI" Button (Checkmark): Located in the top-right of the editor toolbar. This attempts to convert your manual code back into visual blocks.
- Syncing Direction:
- Visual → Code: As you build logic in the other 4 tabs, this code updates in real-time.
- Code → Visual: If you write code manually, you must click the "Parse to UI" button (Checkmark icon) to sync it back to the visual blocks.
- Unparsable Blocks:
- Definition: Valid Python code that doesn't map to a specific UI component (e.g., for loops, try/except blocks, or complex class definitions).
- Behavior: These blocks are preserved exactly as written. In the visual tabs and checklist, they appear as "Unparsable Elements". You can still edit them here in the Manual Tab.
- Execution: Crucially, unparsable code is still valid. It will execute correctly during transformation. The "Unparsable" label refers only to the UI's inability to render a drag-and-drop block for it.
- Deletion: To remove unparsable logic, you can either:
- Delete the code directly in this editor and click Save or Parse to UI
- Click the trash icon on the corresponding "Unparsable" item in the Progress Checklist in most scenarios.
- Environment: The code runs in a sandboxed Python environment where row is a dictionary of the current row's data. Import statements are prohibited in the sandbox, but the following are available as injected context: pandas as pd, re, math, and Decimal.
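For example, the following manual code stays within those sandbox rules, using only the injected context (the field name and the 10% uplift are illustrative):

# Clean a price string and apply a 10% uplift using only the injected context (no imports).
raw = str(row['unit_price'])
digits = re.sub(r'[^0-9.\-]', '', raw)
if digits:
    result = Decimal(digits) * Decimal('1.1')
else:
    result = None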
Global Actions
- Save Logic: Validates the code and saves it to the mapping configuration. If errors exist (syntax errors, missing variables), a warning is shown before saving. Code from the Manual tab is always saved over the read-only Python Preview.
- X Button: The Logic Builder can be closed by clicking the X in the top-right corner. The user will be prompted before leaving with unsaved changes.
Technical Behaviors (Reference)
Input Handling (Auto-Casting)
To ensure a smooth user experience ("Pit of Success"), the system employs smart input handling for Variable and If/Then inputs to interpret user intent without explicit type dropdowns. Function inputs depend on parameter types and are formatted accordingly; for text parameters, the smart quoting logic described below still applies.
- Quotes = String: If a value is wrapped in matching quotes ("123" or '123'), it is treated strictly as a string.
- Number = Number: If a value is a valid raw number (123, 10.5), it is treated as a number (Integer or Float/Decimal).
- Boolean = Boolean: True and False are treated as Python booleans.
- List = List: If a value is wrapped in matching brackets ([ and ]), it is treated as a list literal.
- Fallback = String: Any other value (e.g., 0123, ABC) is automatically wrapped in quotes and treated as a string.
- Safety: This prevents 0123 from being interpreted as an invalid octal number in Python, or ABC from being seen as an undefined variable.
- None: None is treated as Python None and is not wrapped in quotes.
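A simplified sketch of that casting heuristic (illustrative only, not the exact implementation):

def auto_cast(raw):
    # Return the literal text that will be embedded in the generated Python code.
    text = raw.strip()
    if (text.startswith('"') and text.endswith('"')) or \
       (text.startswith("'") and text.endswith("'")):
        return text                                  # quoted: treat strictly as a string
    if text in ("True", "False", "None"):
        return text                                  # Python boolean / None literals
    if text.startswith("[") and text.endswith("]"):
        return text                                  # list literal
    try:
        float(text)
        leading_zero = text.startswith("0") and len(text) > 1 and "." not in text
        if not leading_zero:
            return text                              # valid raw number
    except ValueError:
        pass
    return repr(text)                                # fallback: wrap in quotes as a string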
Code Generation
The visual blocks are transpiled into a Python function body:
- Variables: Generated as var_name = value.
- Logic: Visual blocks become standard Python if/elif/else structures.
- Functions: Functions are given a user-defined name and are formatted in the code as userDefinedFunctionName(row), where row is passed as a parameter. See the function definitions below for examples.
- Result: The final output is assigned to a reserved variable, result.
- Row Context: All source field references are compiled to row['Field Name'].
Logic Builder Modes
The Logic Builder adapts its interface and expected output based on the context in which it is invoked.
- Transform Mode (Default):
- Goal: Produce a final value for a destination field.
- Context: Accessed via the "Custom Logic" code icon on a field card in the Mapping Editor.
- Interface: All 5 tabs are available. The "Return Result" tab is active by default. Result can be set in "Return Result" or dynamically in "If/Then" tab.
- Expected Output: Any output from static (String, Number, Boolean, List, etc.), variable, or function.
- Result Handling: The value returned by the logic is written directly to the destination field for that row.
- Validation Mode:
- Goal: Check if a row's data conforms to specific rules.
- Context: Accessed via the "Validation Logic" shield icon on a field card.
- Interface: Similar to Transform mode, but focused on boolean logic.
- Expected Output (see the validation sketch after this list of modes):
- True: The row is valid.
- [False, "Error Message"]: The row is invalid. The string provided is logged as the error reason.
- Result Handling: If invalid, the row is flagged in the transformation report and can be reviewed in the Data Viewer after running the transformation. The "Return Result" tab is not available; the result is set dynamically based on the validation conditions in the "If/Then" tab. By default, or if not specified, all rows default to True.
- Filter Mode (Pre/Post):
- Goal: Decide whether a row should be included in the processing pipeline.
- Context: Accessed via the "Add Pre-Processing Filter" or "Add Post-Processing Filter" buttons in the Mapping Editor. Note that unlike validation logic and custom logic, the filters are not attached to field mappings; they are two distinct buttons below the field mapping scroll area.
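As referenced above, a Validation Mode result built in the If/Then tab ends up behaving like this sketch (the field name and error message are illustrative):

# Validation sketch: flag rows whose email is missing the "@" symbol
if '@' in str(row['email']):
    result = True
else:
    result = [False, "Email is missing the @ symbol"]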
Lookup Panel (Lookup/Reference Tables)
The Lookup Panel is located in the Left Sidebar, accessible by clicking the Lookup tab. It serves as the central hub for managing reference data used in your mappings. Lookup tables are used with the LocalLookup and Fuzzy_LocalLookup functions to pull data in from tables.
Management
- Library vs. Workspace:
- Library (Global): Your permanent, private collection of lookup tables. This is your "source of truth".
- Access: Click the Library button (Database icon) in the sidebar.
- Workspace (Local): The specific tables active in your current mapping project.
- Access: The main view of the Lookup Panel shows "filled sockets" for each active table in the mapping.
- Behavior: Removing a table from the Workspace (via the Trash icon on the socket) does not delete it from your Library.
- Adding New Tables:
- Click the New button in the sidebar to open the Add Lookup Table Modal.
- Upload: Drag and drop your CSV, Excel, or JSON file into the upload zone.
- Safety: The system checks your library for duplicate names to prevent confusion. It will also check the table's data to confirm another table with the exact same data doesn't already exist.
- Replacing Data (Atomic Update):
- Navigation: Open Library -> Find Table -> Click Edit (Pencil icon) -> Click Manage Data & Keys (Zap icon).
- Atomic Safety: Replacing a table is an "all-or-nothing" operation. If the upload is interrupted, your original data remains untouched.
- Schema Intelligence: The system automatically compares the new file against the old one. It displays a Schema Comparison summary (Added/Removed columns) and warns you if you have removed columns that are critical for your existing mappings.
Key Editor (Computed Keys)
- Concept: Computed Keys allow you to create new, virtual columns (e.g., combining First Name + Last Name -> Full Name) without manually editing your source Excel/CSV files. This process uses the same Logic Builder you're familiar with to generate these computed keys.
- Navigation:
- In the Add/Replace Modal, after uploading a file, click the Create Keys (or Edit Keys) button.
- This opens the Key Manager, where you can add new keys or edit existing ones using the Logic Builder.
- Materialization:
- Once defined, click Generate Keys (Zap icon).
- The system runs a transformation to "materialize" these keys—saving them as actual, searchable columns in the lookup table.
- Automatic Regeneration (Smart Refresh):
- When you replace a table's data with a new file (e.g., next month's price list), you do not need to rebuild your keys.
- The system detects your existing Computed Key logic.
- Upon uploading the new file, it automatically runs the key generation process against the new data, ensuring your Full Name or Price Code keys are instantly available and up-to-date.
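Materialization amounts to running the key logic over every row of the lookup table and saving it as a real, searchable column, roughly like this pandas sketch (column names are illustrative):

import pandas as pd

def materialize_full_name_key(lookup_df: pd.DataFrame) -> pd.DataFrame:
    # Run the computed-key logic for each row and persist it as a searchable column.
    materialized = lookup_df.copy()
    materialized['Full Name'] = (
        materialized['First Name'].astype(str).str.strip()
        + ' '
        + materialized['Last Name'].astype(str).str.strip()
    )
    return materialized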
Conflict Resolution
- Smart Deduplication:
- The system calculates a unique "fingerprint" (hash) for every table based on its content, not just its filename.
- It can detect if you upload a file that is identical to one already in your library, even if the rows are sorted differently or the filename has changed.
- Import Handling:
- When you load a mapping file (.dfm) from an export or shared by a colleague, it may contain references to tables you don't have.
- Match Found: If you already have the exact same table (same content fingerprint), the system automatically links to your local copy.
- New Table: If the table is missing, you are prompted with a confirmation to import it into your library from the mapping file.
- Name Conflict: If a table has the same name but different data, you will be prompted to resolve the conflict (see the Reference Checks section for more information):
- Override: Replace your local version with the one from the mapping file.
- Keep Existing: Ignore the import and use your version (useful if you have a newer version locally).
- Rename Incoming: Rename the incoming table and save it as something else.
Fields Panel
The Fields Panel (first tab in the sidebar) is context-aware, displaying the relevant fields available for mapping or logic construction based on the current mode.
- Context Switching:
- Source Fields (Default): Displays fields from your uploaded source file (CSV/Excel/JSON). This is the standard view when mapping fields or writing Pre-Filter logic.
- Destination Fields: Automatically switches to show Target fields when you are writing Validation Logic or Post-Filter Logic. This allows you to validate or filter based on the final transformed values rather than the raw input.
- Lookup Table Fields: When using the Computed Key builder for a lookup table, this panel displays columns from that specific lookup table, ensuring you build keys using the correct data.
- Interactions:
- Drag-and-Drop: Fields can be dragged from the sidebar directly into the Logic Builder inputs or onto the Mapping Editor canvas.
- Search: A real-time search bar filters the list, useful for datasets with hundreds of columns.
- Preview: An "Eye" icon next to the header allows you to preview the raw data for the currently selected context (e.g., viewing the first 500 rows of the source file).
Variables & Functions Panels
These panels appear dynamically in the sidebar only when the Logic Builder is open. They provide quick access to local resources created within the current session.
- Variables Panel:
- Scope: Displays variables defined in the current Logic Builder session. These are local variables specific to the field transformation logic being edited; they do not persist globally across different fields.
- Drag-and-Drop: Variables can be dragged from this panel directly into function parameters or logic block inputs.
- Functions Panel:
- Scope: Displays functions defined in the current Logic Builder session. Like variables, these are local to the specific transformation logic being edited.
- Drag-and-Drop: Functions can be dragged from this panel into other inputs to chain operations (e.g., using the result of getCleanString() as an input for getUpperCase()).
User Interface Workflows
The application follows a "Hub-and-Spoke" navigation model, with the Transformation Engine serving as the central hub.
Transformation Form ↔ Mapping Editor
- Entry: Users enter the Mapping Editor by clicking "Edit Mapping" (if a mapping file is loaded) or "Create Mapping" (to start fresh) on the Transformation Form.
- State Preservation: When transitioning to the editor, the current state (Source File, Delimiter Settings, Header Row) is passed forward. This ensures the editor always works with the live context of your source data. The Mapping Editor's own state, however, is not preserved: navigating away without saving will result in the loss of unsaved work or changes.
- Return: Clicking "Back to Transform" in the top-left corner of the Mapping Editor returns you to the hub.
- Unsaved Changes: If you have unsaved changes in the editor, a confirmation modal ("Unsaved Changes") appears to prevent accidental data loss.
Mapping Editor ↔ Logic Builder
The Logic Builder operates as a transient overlay on top of the Mapping Editor, keeping you in context.
- Entry: Click the Add Custom Logic/Code icon or Add/Edit Validation on any field mapping card. For filtering, click the respective Filter Buttons (Pre/Post) found underneath the field mapping scroll area. The only exception is Key Logic Builder used for computed keys in Lookup tables. This is launched from the Lookup table library as outlined in the Computed Key Mode section.
- Context Awareness: The overlay opens specifically for that field. The background (Mapping Editor) is dimmed but visible, maintaining context.
- Exit:
- Save Logic: Commits the code to the specific field and closes the overlay.
- Close (X): Discards changes and returns to the Mapping Editor. A warning appears if changes were made but not saved.
Mapping Editor ↔ Lookup Table Workspace
Lookup Table management is integrated directly into the Mapping Editor's sidebar to allow "flow-of-thought" updates without leaving your work.
- Workspace Sidebar: The "Lookups" tab in the sidebar shows tables currently used in your mapping.
- Library Modal: Clicking "Library" opens a modal stack to browse your global collection.
- Add/Edit Modal: Adding a new table or editing an existing one opens a secondary modal on top of the Library modal.
- Atomic Updates: Changes made to lookup tables (like adding a key) are immediately reflected in the Mapping Editor's "Lookup Table Fields" panel once the modal closes.
Account Settings
Account settings are accessed by clicking the icon in the top right corner of the header.
Usage & Billing
AI Credits
The AI-Powered features (Logic Assist, Suggestions, Map All, Async AI) are powered by a consumption-based credit system.
- Monthly Pool: Each subscription plan comes with a monthly allowance of AI Credits (e.g., 2,000 for Professional, 12,500 for Team). This pool resets automatically on the 1st of every month.
- Usage Tracking: Users can monitor their credit usage in real-time via the Usage & Billing tab. The system provides a visual progress bar showing the percentage of the monthly limit used.
- Purchased Credits: If you run out of monthly credits, you can purchase one-time "Credit Packs" (e.g., 1,000 credits) that do not expire. The system intelligently deducts from your monthly pool first, and only dips into your purchased balance once the monthly limit is exhausted.
Subscription Plans
- Professional Plan:
- Designed for power users and solo consultants.
- Seats: Includes 1 User License.
- AI Credits: 2,000 credits/month.
- Support: Standard Email Support.
- Team Plan:
- Designed for growing organizations requiring collaboration.
- Seats: Includes 5 User Licenses (expandable). Additional seats can be added for a monthly fee.
- AI Credits: 12,500 credits/month.
- Collaboration: Includes Shared Template Library.
- Support: Priority Support.
- Billing Only Users:
- Team Owners can designate members as "Billing Only." These users do not consume a paid seat license.
- Capabilities: They can access Account Settings, manage subscriptions, view invoices, and manage team members.
- Restrictions: They cannot access the Transformation Engine, Mapping Editor, or execute jobs.
Team Management
Roles & Permissions
- Owner:
- Full control over the team, billing, and subscription.
- Can assign/revoke Admin roles.
- Can designate "Billing Only" status.
- Cannot be removed from the team.
- Admin:
- Can invite new members and remove existing members.
- Can view pending invitations.
- Cannot manage billing or change the team name.
- Member:
- Standard access to all features (Transform, Map, Logic).
- Can view team members but cannot invite/remove.
Invitations
- Invite Flow: Admins/Owners send invitations via email from the Team Management tab.
- New Users: If the email is not associated with an existing account, the user receives an email to sign up. Upon registration, they are automatically added to the team.
- Existing Users: If the user already has an account, they receive a notification and can accept the invite to join the team. They will be prompted to visit the account settings panel to accept the invite.
- Zombie Seat Prevention: To ensure data security and clean license management, if a user accepts an invitation to a new team, they are automatically removed from their previous team. This prevents "zombie" accounts lingering in old workspaces.
Security & Sessions
- Single Session Enforcement: To prevent account sharing and ensure security, a user can only have one active session at a time. Signing in on a new device or browser automatically invalidates the previous session, logging the user out of the old device.
- Session Management: Users can view a list of their active sessions (including device type, IP address, and last active time) in the Sessions tab and manually terminate suspicious sessions.
Function Definitions (Reference)
DataFlowMapper includes a comprehensive library of built-in functions. All functions can be dynamically constructed using the Logic Builder.
# Example function structure with generated Python code showing the user-defined function name and a body that follows the function definition's pythonGen patterns.

# UPPER - Converts text to uppercase
def myUpperFunction(row):
    return str(row['field1']).upper()

# CONCAT - Joins text values together
def myConcatFunction(row):
    return ''.join([str(x) for x in [row['first_name'], row['last_name']]])
String Functions
Manipulate text data with functions like UPPER, LOWER, CONCAT, SUBSTRING, REPLACE, and more.
CONCAT
Combines multiple text values into one string.
Parameters:
- Text (required) - Values to join
Returns:
string
# CONCAT - Combines multiple text values into one string.
''.join([str(x) for x in [row['first_name'], " ", row['last_name']]])
Math Functions
Perform calculations with functions like SUM, MULTIPLY, DIVIDE, ROUND, MAX, MIN, and AVERAGE.
ABS
Returns the absolute (positive) value of a number.
Parameters:
- Value (required) - Number to convert
Returns:
number
# ABS - Returns the absolute (positive) value of a number.
(abs(Decimal(row['amount'])) if row['amount'] is not None else None)
Date Functions
Work with dates using functions like FORMAT_DATE, DATE_NOW, TODAY, and ADD_TO_DATE.
ADD_TO_DATE
Add a specified amount of time to a date. Returns a datetime object.
Parameters:
- Date (required) - Starting date
- Amount (required) - Amount to add
- Unit (required) - Time unit
Returns:
string
# ADD_TO_DATE - Add a specified amount of time to a date. Returns a datetime object.
pd.to_datetime(row['start_date']) + pd.Timedelta(weeks=float(row['amount']))
Conversion Functions
Convert between data types with functions like TO_TEXT, TO_DATE, TO_BOOLEAN, TO_INT, and TO_FLOAT.
CLEAN_NUMBER
Removes non-numeric characters and converts to number.
Parameters:
- Text (required) - Text to clean and convert
Returns:
number
# CLEAN_NUMBER - Removes non-numeric characters and converts to number.
(lambda x: Decimal(x) if x else None)(re.sub(r'[^0-9.-]', '', str(row['text'])))
Utility Functions
Additional utility functions like LOOKUP, VALUECOUNT, and IS_EMPTY help with common data transformation tasks and validations.
ADD_TO_LIST
Appends an item to the end of a list.
Parameters:
- List (required) - Target list
- Item (required) - Value to add to the list
Returns:
any
# ADD_TO_LIST - Appends an item to the end of a list.
my_list.append(row['my_item'])
APPLY
Applies the transformation logic defined in another destination field. This will return the transformed value of the referenced destination field for the given row.
Parameters:
- Field (required) - Field to apply transformation from
Returns:
any