💾 Blob Storage & File Connectors

Configure blob or file storage connectors to validate file-based datasets stored in cloud platforms or on-premises systems.

Overview

Blob storage and file systems provide scalable, cost-effective storage for large datasets. Data Testing supports Amazon S3, Azure Data Lake Storage Gen2, and SFTP for file-based data validation.

File-based connectors support various file formats (CSV, JSON, Parquet, etc.) and enable validation of data stored in cloud storage or on file servers.


Available Storage Connectors

Cloud-native object storage services:

| Connector | Platform | Use Case |
| --- | --- | --- |
| Amazon S3 | AWS | AWS data lake storage |
| Azure Data Lake Storage Gen2 | Microsoft Azure | Azure cloud storage |

Ideal for:

  • ✅ Scalable file storage

  • ✅ Data lake architectures

  • ✅ Cost-effective storage

  • ✅ Multi-region replication

AWS S3 → | Azure ADLS Gen2 →


Common File Connector Configuration

Connection Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Host/Endpoint | Storage endpoint or SFTP server address | ✅ Yes |
| Port | Service port (S3: 443, SFTP: 22) | ✅ Yes |
| Authentication | API keys, credentials, or certificates | ✅ Yes |
| Bucket/Path | Storage location or directory path | ✅ Yes |
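
The four required connection parameters can be collected in a small settings object that reports missing values before any job is created. This is a minimal sketch, not the product's configuration API: the `ConnectionSettings` class, its field names, the `kind` labels, and the `validate` method are all hypothetical. Only the default ports (S3: 443, SFTP: 22) come from the parameter list above.

```python
from dataclasses import dataclass

# Default service ports taken from the parameter list (S3: 443, SFTP: 22).
DEFAULT_PORTS = {"s3": 443, "sftp": 22}


@dataclass
class ConnectionSettings:
    """Hypothetical container for the four required connection parameters."""
    kind: str        # "s3" or "sftp" (assumed labels, not product values)
    host: str        # storage endpoint or SFTP server address
    auth: str        # reference to stored credentials, not the secret itself
    location: str    # bucket name or directory path
    port: int = 0    # 0 means "use the default port for this kind"

    def validate(self) -> list:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        for name in ("host", "auth", "location"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is required")
        if self.port == 0:
            # Fall back to the well-known port when none was given.
            self.port = DEFAULT_PORTS.get(self.kind, 0)
        if not 1 <= self.port <= 65535:
            problems.append("port is required")
        return problems
```

For example, `ConnectionSettings(kind="sftp", host="files.example.com", auth="key-ref", location="/exports").validate()` returns an empty list and sets `port` to 22. The host name here is a placeholder.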

File Configuration

| Parameter | Description |
| --- | --- |
| File Path/Prefix | Location of data files |
| File Format | CSV, JSON, Parquet, XML, etc. |
| Delimiter | Field separator for structured formats |
| Header Row | Whether file includes headers |
| Encoding | Character encoding (UTF-8, etc.) |
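
To show how the delimiter, header row, and encoding settings interact, here is a minimal sketch using only Python's standard `csv` module. The `read_delimited` function and its parameter names are illustrative assumptions. They mirror the options listed above but are not the connector's actual implementation.

```python
import csv
import io


def read_delimited(raw: bytes, delimiter: str = ",", has_header: bool = True,
                   encoding: str = "utf-8") -> list:
    """Decode raw file bytes and return the rows as dictionaries."""
    # A wrong encoding raises UnicodeDecodeError here, before any parsing.
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # No header row: generate positional column names (col_1, col_2, ...).
        header = [f"col_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]
```

Calling `read_delimited(b"id;name\n1;Ana\n", delimiter=";")` yields one row keyed by `id` and `name`. With `has_header=False`, the first line is treated as data and the columns are named by position.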


📊 Amazon S3

Cloud-native object storage from AWS:

Amazon S3 is ideal for:

  • AWS cloud data lakes

  • Large-scale file storage

  • Multi-region data distribution

  • Integration with AWS analytics services

View Amazon S3 Connector →


🔷 Azure Data Lake Storage Gen2

Enterprise data lake on Azure:

ADLS Gen2 is ideal for:

  • Azure cloud deployments

  • Enterprise data lakes

  • Hadoop file system compatibility

  • Integration with Azure Synapse and Power BI

View ADLS Gen2 Connector →


πŸ” SFTP

Secure file transfer protocol:

SFTP is ideal for:

  • On-premises file servers

  • Legacy system integration

  • Secure file transfer

  • Remote team collaboration

View SFTP Connector →


πŸ” Security Best Practices

Supported File Formats

| Format | Extension | Best For | Support |
| --- | --- | --- | --- |
| CSV | .csv | Tabular data | ✅ Full |
| JSON | .json | Semi-structured data | ✅ Full |
| Parquet | .parquet | Columnar storage | ✅ Full |
| XML | .xml | Hierarchical data | ⚠️ Limited |
| Text | .txt | Plain text | ✅ Full |
| Excel | .xlsx/.xls | Spreadsheets | ⚠️ Limited |


Performance Considerations

| Factor | Impact | Optimization |
| --- | --- | --- |
| File Size | Memory usage | Use compression, split large files |
| File Count | Processing time | Batch files, use prefixes |
| Network | Transfer speed | Use regional endpoints |
| Format | Parse time | Use efficient formats (Parquet) |
| Encoding | Processing | Ensure consistent encoding |
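
The "batch files, use prefixes" advice can be illustrated with a short sketch that groups object keys by their leading path segment and splits each group into fixed-size batches. The `batch_by_prefix` function, its parameters, and the sample keys are assumptions made for illustration. Data Testing may batch files differently.

```python
from collections import defaultdict
from itertools import islice


def batch_by_prefix(keys, batch_size=100, depth=1):
    """Group keys by their first `depth` path segments, then yield batches."""
    groups = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        # Keys with no directory part are grouped under the empty prefix.
        prefix = "/".join(parts[:depth]) if len(parts) > depth else ""
        groups[prefix].append(key)
    for prefix in sorted(groups):
        remaining = iter(sorted(groups[prefix]))
        # Take batch_size keys at a time until the group is exhausted.
        while batch := list(islice(remaining, batch_size)):
            yield prefix, batch
```

With `batch_size=2`, the keys `sales/a.csv`, `sales/b.csv`, and `sales/c.csv` produce two batches under the `sales` prefix, so each batch can be validated as one unit instead of one file at a time.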


Storage Connector Comparison

| Feature | Amazon S3 | ADLS Gen2 | SFTP |
| --- | --- | --- | --- |
| Cloud Provider | AWS | Azure | Any |
| Authentication | IAM/Access Keys | Azure AD/SAS | SSH Keys/Password |
| Scalability | Unlimited | Unlimited | Server-dependent |
| Encryption | ✅ Yes | ✅ Yes | ✅ Yes (SSH) |
| Versioning | ✅ Yes | ✅ Yes | Manual |
| Cost | Low | Low | Varies |
| Setup | Cloud-native | Cloud-native | Quick |


🚀 Quick Start

  1. Choose your storage platform (S3, ADLS Gen2, or SFTP)

  2. Configure access credentials with minimal required permissions

  3. Enable encryption for data security

  4. Identify target files and their location/path

  5. Configure file format and parsing options

  6. Test the connection before creating jobs

  7. Create validation jobs for data quality checks
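
Before running the connection test in step 6, a basic network check can rule out firewall or DNS problems. The sketch below only verifies that a TCP connection can be opened to the endpoint. It does not check credentials or permissions, and it is not the product's built-in connection test. The `endpoint_reachable` function is a hypothetical helper.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For an SFTP server this would be called with port 22, and for an S3 endpoint with port 443, matching the default ports listed under Connection Parameters.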

