--- name: azure-storage-blob-py description: | Azure Blob Storage SDK for Python. Use for uploading, downloading, listing blobs, managing containers, and blob lifecycle. Triggers: "blob storage", "BlobServiceClient", "ContainerClient", "BlobClient", "upload blob", "download blob". package: azure-storage-blob --- # Azure Blob Storage SDK for Python Client library for Azure Blob Storage — object storage for unstructured data. ## Installation ```bash pip install azure-storage-blob azure-identity ``` ## Environment Variables ```bash AZURE_STORAGE_ACCOUNT_NAME= # Or use full URL AZURE_STORAGE_ACCOUNT_URL=https://.blob.core.windows.net ``` ## Authentication ```python from azure.identity import DefaultAzureCredential from azure.storage.blob import BlobServiceClient credential = DefaultAzureCredential() account_url = "https://.blob.core.windows.net" blob_service_client = BlobServiceClient(account_url, credential=credential) ``` ## Client Hierarchy | Client | Purpose | Get From | |--------|---------|----------| | `BlobServiceClient` | Account-level operations | Direct instantiation | | `ContainerClient` | Container operations | `blob_service_client.get_container_client()` | | `BlobClient` | Single blob operations | `container_client.get_blob_client()` | ## Core Workflow ### Create Container ```python container_client = blob_service_client.get_container_client("mycontainer") container_client.create_container() ``` ### Upload Blob ```python # From file path blob_client = blob_service_client.get_blob_client( container="mycontainer", blob="sample.txt" ) with open("./local-file.txt", "rb") as data: blob_client.upload_blob(data, overwrite=True) # From bytes/string blob_client.upload_blob(b"Hello, World!", overwrite=True) # From stream import io stream = io.BytesIO(b"Stream content") blob_client.upload_blob(stream, overwrite=True) ``` ### Download Blob ```python blob_client = blob_service_client.get_blob_client( container="mycontainer", blob="sample.txt" ) # To file with open("./downloaded.txt", "wb") as file: download_stream = blob_client.download_blob() file.write(download_stream.readall()) # To memory download_stream = blob_client.download_blob() content = download_stream.readall() # bytes # Read into existing buffer stream = io.BytesIO() num_bytes = blob_client.download_blob().readinto(stream) ``` ### List Blobs ```python container_client = blob_service_client.get_container_client("mycontainer") # List all blobs for blob in container_client.list_blobs(): print(f"{blob.name} - {blob.size} bytes") # List with prefix (folder-like) for blob in container_client.list_blobs(name_starts_with="logs/"): print(blob.name) # Walk blob hierarchy (virtual directories) for item in container_client.walk_blobs(delimiter="/"): if item.get("prefix"): print(f"Directory: {item['prefix']}") else: print(f"Blob: {item.name}") ``` ### Delete Blob ```python blob_client.delete_blob() # Delete with snapshots blob_client.delete_blob(delete_snapshots="include") ``` ## Performance Tuning ```python # Configure chunk sizes for large uploads/downloads blob_client = BlobClient( account_url=account_url, container_name="mycontainer", blob_name="large-file.zip", credential=credential, max_block_size=4 * 1024 * 1024, # 4 MiB blocks max_single_put_size=64 * 1024 * 1024 # 64 MiB single upload limit ) # Parallel upload blob_client.upload_blob(data, max_concurrency=4) # Parallel download download_stream = blob_client.download_blob(max_concurrency=4) ``` ## SAS Tokens ```python from datetime import datetime, timedelta, timezone from azure.storage.blob import generate_blob_sas, BlobSasPermissions sas_token = generate_blob_sas( account_name="", container_name="mycontainer", blob_name="sample.txt", account_key="", # Or use user delegation key permission=BlobSasPermissions(read=True), expiry=datetime.now(timezone.utc) + timedelta(hours=1) ) # Use SAS token blob_url = f"https://.blob.core.windows.net/mycontainer/sample.txt?{sas_token}" ``` ## Blob Properties and Metadata ```python # Get properties properties = blob_client.get_blob_properties() print(f"Size: {properties.size}") print(f"Content-Type: {properties.content_settings.content_type}") print(f"Last modified: {properties.last_modified}") # Set metadata blob_client.set_blob_metadata(metadata={"category": "logs", "year": "2024"}) # Set content type from azure.storage.blob import ContentSettings blob_client.set_http_headers( content_settings=ContentSettings(content_type="application/json") ) ``` ## Async Client ```python from azure.identity.aio import DefaultAzureCredential from azure.storage.blob.aio import BlobServiceClient async def upload_async(): credential = DefaultAzureCredential() async with BlobServiceClient(account_url, credential=credential) as client: blob_client = client.get_blob_client("mycontainer", "sample.txt") with open("./file.txt", "rb") as data: await blob_client.upload_blob(data, overwrite=True) # Download async async def download_async(): async with BlobServiceClient(account_url, credential=credential) as client: blob_client = client.get_blob_client("mycontainer", "sample.txt") stream = await blob_client.download_blob() data = await stream.readall() ``` ## Best Practices 1. **Use DefaultAzureCredential** instead of connection strings 2. **Use context managers** for async clients 3. **Set `overwrite=True`** explicitly when re-uploading 4. **Use `max_concurrency`** for large file transfers 5. **Prefer `readinto()`** over `readall()` for memory efficiency 6. **Use `walk_blobs()`** for hierarchical listing 7. **Set appropriate content types** for web-served blobs