✅ Evaluating Jobs

An Evaluate Job performs comprehensive validations on a single data source to ensure data quality and integrity.

Validations Performed

  • Row Count Verification – Confirms the dataset contains the expected number of records.

  • Column Type Verification – Validates that each column's data type matches expectations.

  • Non-Null Columns Verification – Ensures specified columns contain no null values.

  • Duplicate Columns Verification – Detects duplicate records based on selected columns.

  • Regex Verification – Validates column values against regular expression patterns.

  • User-Function Verification – Applies custom business logic using Python functions (lambda or custom functions).

Note: Functions must return True or False. Only Python and its built-in packages are supported.
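To make the first four checks concrete, here is a minimal sketch in plain Python. It assumes the data source can be viewed as a list of dictionaries, one per row. The sample rows and the helper names (`row_count_ok`, `column_type_ok`, `non_null_ok`, `find_duplicates`) are hypothetical illustrations, not the product's API.

```python
# Hypothetical sample data; the real input format is an assumption.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},
]

def row_count_ok(rows, expected):
    """Row count verification: the dataset has exactly `expected` rows."""
    return len(rows) == expected

def column_type_ok(rows, column, expected_type):
    """Column type verification; nulls are skipped and checked separately."""
    return all(
        isinstance(r[column], expected_type)
        for r in rows
        if r[column] is not None
    )

def non_null_ok(rows, column):
    """Non-null verification: no row has None in `column`."""
    return all(r[column] is not None for r in rows)

def find_duplicates(rows, columns):
    """Duplicate verification: return key tuples seen more than once."""
    seen, dupes = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes

print(row_count_ok(rows, 3))           # True
print(column_type_ok(rows, "id", int)) # True
print(non_null_ok(rows, "email"))      # False
print(find_duplicates(rows, ["id"]))   # [(2,)]
```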

Common Regex Examples

  • Email validation: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

  • US phone number: ^\(\d{3}\) \d{3}-\d{4}$

  • Credit card number: ^\d{4} \d{4} \d{4} \d{4}$
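The patterns above can be tried out locally with Python's standard `re` module before they are used in a job. This is a minimal sketch: the `PATTERNS` dictionary and the `matches` helper are hypothetical names, and the sample values are made up.

```python
import re

# Patterns copied from the list above.
PATTERNS = {
    "email": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    "us_phone": r"^\(\d{3}\) \d{3}-\d{4}$",
    "credit_card": r"^\d{4} \d{4} \d{4} \d{4}$",
}

def matches(kind, value):
    """Return True if `value` matches the pattern registered under `kind`."""
    return re.match(PATTERNS[kind], value) is not None

print(matches("email", "jane.doe@example.com"))       # True
print(matches("us_phone", "(555) 123-4567"))          # True
print(matches("us_phone", "555-123-4567"))            # False
print(matches("credit_card", "1234 5678 9012 3456"))  # True
```

Note that these patterns check format only: the phone pattern accepts just the `(NNN) NNN-NNNN` layout, and the credit card pattern accepts only four space-separated groups of four digits.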

Example: Custom Function

def validate_complex_data(data):
    """Return True if data is a 64-character hexadecimal string."""
    # Standard-library import kept inside the function so it is self-contained
    import re

    # Check if data is a string
    if not isinstance(data, str):
        return False

    # Validate length and format together (assuming hash length of 64
    # characters); fullmatch requires the whole string to match
    return re.fullmatch(r'[0-9a-fA-F]{64}', data) is not None

Example: Lambda Function
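
A minimal sketch of lambda-based checks, assuming the function receives one column value at a time, as in the custom function example. The names `is_non_negative_int` and `is_hex_hash` are hypothetical, and each lambda returns True or False as required.

```python
import re  # standard library only

# Hypothetical rule: the value must be a non-negative integer.
# bool is excluded explicitly because it is a subclass of int in Python.
is_non_negative_int = lambda value: (
    isinstance(value, int) and not isinstance(value, bool) and value >= 0
)

# The same 64-character hexadecimal check as the custom function, as a lambda.
is_hex_hash = lambda value: (
    isinstance(value, str)
    and re.fullmatch(r"[0-9a-fA-F]{64}", value) is not None
)

print(is_non_negative_int(5))   # True
print(is_non_negative_int(-1))  # False
print(is_hex_hash("a" * 64))    # True
print(is_hex_hash("g" * 64))    # False
```

A lambda suits a rule that fits in a single expression; a named function is easier to read once the logic needs several steps.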
