# Blob Storage & File Connectors

> Configure blob or file storage connectors to validate file-based datasets stored in cloud platforms or on-premises systems.

## Overview

Blob storage and file systems provide scalable, cost-effective storage for large datasets. Data Testing connects to Amazon S3, Azure Data Lake Storage Gen2, and SFTP servers so you can validate the files stored there.

{% hint style="info" %}
File-based connectors support various file formats (CSV, JSON, Parquet, etc.) and enable validation of data stored in cloud storage or on file servers.
{% endhint %}

***

## Available Storage Connectors

{% tabs %}
{% tab title="Cloud Storage" %}
**Cloud-native object storage services:**

| Connector                        | Platform        | Use Case              |
| -------------------------------- | --------------- | --------------------- |
| **Amazon S3**                    | AWS             | AWS data lake storage |
| **Azure Data Lake Storage Gen2** | Microsoft Azure | Azure cloud storage   |

Ideal for:

* ✅ Scalable file storage
* ✅ Data lake architectures
* ✅ Cost-effective storage
* ✅ Multi-region replication

[AWS S3 →](/data-testing/blob-storage-and-file-systems/index/amazon-s3.md) [Azure ADLS Gen2 →](/data-testing/blob-storage-and-file-systems/index/adls-gen2.md)
{% endtab %}

{% tab title="File Transfer" %}
**Network file transfer and on-premises systems:**

| Connector | Protocol          | Use Case            |
| --------- | ----------------- | ------------------- |
| **SFTP**  | SSH File Transfer | Remote file systems |

Ideal for:

* ✅ On-premises file servers
* ✅ Legacy system integration
* ✅ Secure file transfer
* ✅ Password & key authentication

[SFTP →](/data-testing/blob-storage-and-file-systems/index/sftp.md)
{% endtab %}
{% endtabs %}

***

## Common File Connector Configuration

### Connection Parameters

| Parameter          | Description                             | Required |
| ------------------ | --------------------------------------- | -------- |
| **Host/Endpoint**  | Storage endpoint or SFTP server address | ✅ Yes    |
| **Port**           | Service port (S3: 443, SFTP: 22)        | ✅ Yes    |
| **Authentication** | API keys, credentials, or certificates  | ✅ Yes    |
| **Bucket/Path**    | Storage location or directory path      | ✅ Yes    |
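
As a rough illustration of the table above, the sketch below checks that all four required parameters are present before a connection is attempted. The `ConnectionSettings` class and its field names are hypothetical and are not Data Testing's actual configuration keys; the host and credential reference are placeholder values.

```python
from dataclasses import dataclass


# Hypothetical container for the four required parameters in the table above.
# Field names are illustrative, not Data Testing's actual configuration keys.
@dataclass(frozen=True)
class ConnectionSettings:
    host: str            # Host/Endpoint
    port: int            # Port (443 for S3 over HTTPS, 22 for SFTP)
    authentication: str  # Reference to a stored credential, never the secret itself
    location: str        # Bucket/Path

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means nothing is missing."""
        found = []
        if not self.host.strip():
            found.append("host is empty")
        if not 1 <= self.port <= 65535:
            found.append(f"port {self.port} is outside 1-65535")
        if not self.authentication.strip():
            found.append("authentication is empty")
        if not self.location.strip():
            found.append("location is empty")
        return found


sftp = ConnectionSettings("sftp.example.com", 22, "ssh-key:data-testing", "/exports/daily")
print(sftp.problems())  # []
```

Returning a list of problems rather than raising on the first one lets a user fix every missing field in a single pass.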

### File Configuration

| Parameter            | Description                            |
| -------------------- | -------------------------------------- |
| **File Path/Prefix** | Location of data files                 |
| **File Format**      | CSV, JSON, Parquet, XML, etc.          |
| **Delimiter**        | Field separator for delimited text     |
| **Header Row**       | Whether file includes headers          |
| **Encoding**         | Character encoding (UTF-8, etc.)       |
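
To show how the delimiter, header row, and encoding settings interact, here is a minimal sketch that parses a pipe-delimited file with Python's standard `csv` module. The `read_delimited` helper and the sample data are assumptions made for illustration; this is not how Data Testing parses files internally.

```python
import csv
import io


def read_delimited(raw: bytes, delimiter: str = ",", has_header: bool = True,
                   encoding: str = "utf-8"):
    """Decode raw bytes and parse them as delimited text.

    Returns (header, rows); header is None when has_header is False.
    """
    text = raw.decode(encoding)  # Encoding: a wrong value fails here or garbles text
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)
    if has_header and rows:
        return rows[0], rows[1:]  # Header Row: first line holds column names
    return None, rows


sample = "id|name\n1|Zoë\n2|Ana\n".encode("utf-8")
header, rows = read_delimited(sample, delimiter="|")
print(header)  # ['id', 'name']
print(rows)    # [['1', 'Zoë'], ['2', 'Ana']]
```

If the header setting is wrong, the column names are treated as a data row (or the first data row is treated as column names), which usually surfaces as a row-count mismatch of one.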

***

## 📊 Amazon S3

Cloud-native object storage from AWS:

{% hint style="info" %}
Amazon S3 is ideal for:

* AWS cloud data lakes
* Large-scale file storage
* Multi-region data distribution
* Integration with AWS analytics services
{% endhint %}

[**View Amazon S3 Connector →**](/data-testing/blob-storage-and-file-systems/index/amazon-s3.md)

***

## 🔷 Azure Data Lake Storage Gen2

Enterprise data lake on Azure:

{% hint style="info" %}
ADLS Gen2 is ideal for:

* Azure cloud deployments
* Enterprise data lakes
* Hadoop file system compatibility
* Integration with Azure Synapse and Power BI
{% endhint %}

[**View ADLS Gen2 Connector →**](/data-testing/blob-storage-and-file-systems/index/adls-gen2.md)

***

## 🔐 SFTP

File transfer over SSH:

{% hint style="info" %}
SFTP is ideal for:

* On-premises file servers
* Legacy system integration
* Secure file transfer
* Remote team collaboration
{% endhint %}

[**View SFTP Connector →**](/data-testing/blob-storage-and-file-systems/index/sftp.md)

***

## 🔐 Security Best Practices

{% hint style="warning" %}
**Essential Security Practices:**

1. ✅ Use IAM roles instead of static credentials
2. ✅ Enable bucket policies for least-privilege access
3. ✅ Use SSH keys for SFTP instead of passwords
4. ✅ Enable encryption at rest and in transit
5. ✅ Enable versioning for data protection
6. ✅ Configure MFA delete protection
7. ✅ Audit access logs regularly
8. ✅ Implement network isolation
{% endhint %}
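
For the least-privilege item above, the following sketch builds a read-only AWS IAM policy document scoped to a single bucket prefix. It assumes that listing and reading objects is all a validation job needs; the permissions the S3 connector actually requires may differ, so check the connector page. The `read_only_policy` helper, bucket name, and prefix are hypothetical.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy that allows listing and reading objects under one prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket but restricted to the prefix.
                "Sid": "ListUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is granted on objects under the prefix; no write or delete.
                "Sid": "ReadUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = read_only_policy("example-data-lake", "/landing/orders/")
print(json.dumps(policy, indent=2))
```

Scoping both statements to the prefix means credentials issued for one dataset cannot list or read anything else in the bucket.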

***

## Supported File Formats

| Format      | Extension  | Best For             | Support    |
| ----------- | ---------- | -------------------- | ---------- |
| **CSV**     | .csv       | Tabular data         | ✅ Full     |
| **JSON**    | .json      | Semi-structured data | ✅ Full     |
| **Parquet** | .parquet   | Columnar storage     | ✅ Full     |
| **XML**     | .xml       | Hierarchical data    | ⚠️ Limited |
| **Text**    | .txt       | Plain text           | ✅ Full     |
| **Excel**   | .xlsx/.xls | Spreadsheets         | ⚠️ Limited |
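
The table above can be expressed as a simple lookup, which is handy for flagging files with limited support before a job is configured. This is a sketch only: the `classify` helper is hypothetical, and it assumes format is inferred from the file extension, which may not match how Data Testing detects formats.

```python
from pathlib import PurePosixPath

# Extension-to-format mapping copied from the table above.
FORMAT_BY_EXTENSION = {
    ".csv": ("CSV", "Full"),
    ".json": ("JSON", "Full"),
    ".parquet": ("Parquet", "Full"),
    ".xml": ("XML", "Limited"),
    ".txt": ("Text", "Full"),
    ".xlsx": ("Excel", "Limited"),
    ".xls": ("Excel", "Limited"),
}


def classify(path: str):
    """Return (format, support_level) for a path, or None if the extension is not in the table."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive match on the last extension
    return FORMAT_BY_EXTENSION.get(suffix)


print(classify("landing/2024/orders.PARQUET"))  # ('Parquet', 'Full')
print(classify("reports/summary.xlsx"))         # ('Excel', 'Limited')
print(classify("archive/backup.tar.gz"))        # None
```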

***

## Performance Considerations

| Factor         | Impact          | Optimization                       |
| -------------- | --------------- | ---------------------------------- |
| **File Size**  | Memory usage    | Use compression, split large files |
| **File Count** | Processing time | Batch files, use prefixes          |
| **Network**    | Transfer speed  | Use regional endpoints             |
| **Format**     | Parse time      | Use efficient formats (Parquet)    |
| **Encoding**   | Processing      | Ensure consistent encoding         |
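
The "batch files, use prefixes" advice can be sketched as follows: group object keys by their parent prefix, then split each group into fixed-size batches so that no single job has to process thousands of files. The `batch_by_prefix` helper and the sample keys are hypothetical; how Data Testing schedules files internally is not described on this page.

```python
from collections import defaultdict
from itertools import islice


def batch_by_prefix(keys, batch_size):
    """Group keys by parent prefix, then split each group into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    groups = defaultdict(list)
    for key in keys:
        prefix, _, _ = key.rpartition("/")  # everything before the last slash
        groups[prefix].append(key)
    batches = []
    for prefix in sorted(groups):
        remaining = iter(sorted(groups[prefix]))
        while chunk := list(islice(remaining, batch_size)):
            batches.append((prefix, chunk))
    return batches


keys = ["orders/a.csv", "orders/b.csv", "orders/c.csv", "customers/a.csv"]
for prefix, chunk in batch_by_prefix(keys, batch_size=2):
    print(prefix, chunk)
# customers ['customers/a.csv']
# orders ['orders/a.csv', 'orders/b.csv']
# orders ['orders/c.csv']
```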

***

## Storage Connector Comparison

| Feature            | Amazon S3       | ADLS Gen2    | SFTP              |
| ------------------ | --------------- | ------------ | ----------------- |
| **Cloud Provider** | AWS             | Azure        | Any               |
| **Authentication** | IAM/Access Keys | Azure AD/SAS | SSH Keys/Password |
| **Scalability**    | Unlimited       | Unlimited    | Server-dependent  |
| **Encryption**     | ✅ Yes           | ✅ Yes        | ✅ Yes (SSH)       |
| **Versioning**     | ✅ Yes           | ✅ Yes        | Manual            |
| **Cost**           | Low             | Low          | Varies            |
| **Setup**          | Cloud-native    | Cloud-native | Quick             |

***

## 🚀 Quick Start

1. **Choose your storage platform** (S3, ADLS Gen2, or SFTP)
2. **Configure access credentials** with minimal required permissions
3. **Enable encryption** for data security
4. **Identify target files** and their location/path
5. **Configure file format** and parsing options
6. **Test the connection** before creating jobs
7. **Create validation jobs** for data quality checks

***

## Related Documentation

* [Amazon S3 Connector](/data-testing/blob-storage-and-file-systems/index/amazon-s3.md)
* [Azure ADLS Gen2 Connector](/data-testing/blob-storage-and-file-systems/index/adls-gen2.md)
* [SFTP Connector](/data-testing/blob-storage-and-file-systems/index/sftp.md)
* [Data Source Overview](/data-testing/data-sources/index.md)
* [Create Compare Job](/data-testing/jobs-and-workflows/index/compare-job.md)


