Glue Iceberg REST Catalog
Introduction
AWS Glue exposes an Iceberg REST endpoint that lets external query engines read and write Iceberg tables managed in the Glue Data Catalog. When the Glue catalog is federated to S3 Tables, the REST endpoint serves Iceberg tables stored in S3 Tables buckets.
The Snowflake emulator can connect to this Glue Iceberg REST endpoint through a CATALOG INTEGRATION of source ICEBERG_REST with CATALOG_API_TYPE = AWS_GLUE. The integration uses AWS SigV4 to sign catalog requests and VENDED_CREDENTIALS to obtain scoped credentials for the underlying S3 Tables data files, so you can query the same tables from Snowflake and from PyIceberg without duplicating data.
Getting started
This guide walks through creating an Iceberg table in S3 Tables through the Glue Iceberg REST endpoint, registering that table with the Snowflake emulator through a Glue catalog integration, and querying it with SQL. It assumes basic knowledge of the AWS CLI, our awslocal wrapper, and Snowflake.
In this guide, you will:
- Create an S3 Tables bucket and namespace
- Register a federated `s3tablescatalog` catalog in Glue
- Create and populate an Iceberg table through the Glue Iceberg REST endpoint with PyIceberg
- Create a Snowflake catalog integration that points at the Glue REST endpoint
- Create an Iceberg table in Snowflake that references the remote Glue table and query it
Start your Snowflake emulator and connect to it with a SQL client in order to execute the queries below. Make sure Python and the `pyiceberg[s3fs,pyarrow]` package are installed before starting.
Create an S3 Tables bucket and namespace
The Glue Iceberg REST endpoint serves tables stored in S3 Tables. Create a table bucket:
```bash
awslocal s3tables create-table-bucket --name my-table-bucket
```

```json
{
    "arn": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket"
}
```

Now create a namespace to hold the Iceberg table:
```bash
awslocal s3tables create-namespace \
    --table-bucket-arn arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket \
    --namespace my_namespace
```

```json
{
    "tableBucketARN": "arn:aws:s3tables:us-east-1:000000000000:bucket/my-table-bucket",
    "namespace": [
        "my_namespace"
    ]
}
```

Register the Glue federated catalog
Glue exposes S3 Tables buckets through a federated catalog. Register a catalog named s3tablescatalog that federates to all S3 Tables buckets in the account:
```bash
awslocal glue create-catalog \
    --name s3tablescatalog \
    --catalog-input '{
        "FederatedCatalog": {
            "Identifier": "arn:aws:s3tables:us-east-1:000000000000:bucket/*",
            "ConnectionName": "aws:s3tables"
        },
        "CreateTableDefaultPermissions": [],
        "CreateDatabaseDefaultPermissions": []
    }'
```

Confirm the catalog was registered:
```bash
awslocal glue get-catalogs
```

The response includes a `CatalogList` entry with `Name: s3tablescatalog` and a `FederatedCatalog` block pointing at S3 Tables. Snowflake will reference this catalog through its `WAREHOUSE` identifier in the form `<account-id>:s3tablescatalog/<table-bucket-name>`.
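The warehouse identifier is plain string composition from the account ID and table bucket name. As an illustration (the helper name here is hypothetical, not part of any SDK):

```python
def glue_s3tables_warehouse(account_id: str, table_bucket_name: str) -> str:
    """Build the Glue Iceberg REST warehouse identifier for an S3 Tables bucket.

    The format is <account-id>:s3tablescatalog/<table-bucket-name>.
    """
    return f"{account_id}:s3tablescatalog/{table_bucket_name}"

print(glue_s3tables_warehouse("000000000000", "my-table-bucket"))
# → 000000000000:s3tablescatalog/my-table-bucket
```

The same string is used below as the `warehouse` property in PyIceberg and the `WAREHOUSE` field of the Snowflake catalog integration.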
Create and populate the table
Use PyIceberg to talk to the Glue Iceberg REST endpoint at http://glue.localhost.localstack.cloud:4566/iceberg. The same endpoint is later used by the Snowflake catalog integration, so creating the table through PyIceberg first lets you confirm that signing and federation are configured correctly.
Save the script as setup_glue_iceberg.py:
```python
import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, LongType

LOCALSTACK_URL = "http://localhost.localstack.cloud:4566"
GLUE_URL = "http://glue.localhost.localstack.cloud:4566"
ACCOUNT_ID = "000000000000"
TABLE_BUCKET_NAME = "my-table-bucket"
NAMESPACE = "my_namespace"
TABLE_NAME = "customer_orders"
REGION = "us-east-1"

catalog = RestCatalog(
    name="glue_catalog",
    uri=f"{GLUE_URL}/iceberg",
    warehouse=f"{ACCOUNT_ID}:s3tablescatalog/{TABLE_BUCKET_NAME}",
    **{
        "s3.region": REGION,
        "s3.endpoint": LOCALSTACK_URL,
        "client.access-key-id": ACCOUNT_ID,
        "client.secret-access-key": "test",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": REGION,
    },
)

schema = Schema(
    NestedField(field_id=1, name="order_id", field_type=StringType(), required=False),
    NestedField(field_id=2, name="customer_name", field_type=StringType(), required=False),
    NestedField(field_id=3, name="amount", field_type=LongType(), required=False),
)

catalog.create_table(identifier=(NAMESPACE, TABLE_NAME), schema=schema)
table = catalog.load_table((NAMESPACE, TABLE_NAME))

table.append(pa.table({
    "order_id": ["ORD001", "ORD002", "ORD003"],
    "customer_name": ["Alice", "Bob", "Charlie"],
    "amount": [100, 250, 175],
}))

print(f"Tables in {NAMESPACE}: {catalog.list_tables(NAMESPACE)}")
```

Run the script:
```bash
python setup_glue_iceberg.py
```

```
Tables in my_namespace: [('my_namespace', 'customer_orders')]
```

Create the catalog integration
Connect to the Snowflake emulator with your SQL client of choice and create the catalog integration. The REST_CONFIG block declares the Glue REST endpoint and warehouse, and the REST_AUTHENTICATION block configures AWS SigV4 signing against the glue service.
```sql
CREATE OR REPLACE CATALOG INTEGRATION glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'my_namespace'
  REST_CONFIG = (
    CATALOG_URI = 'http://glue.localhost.localstack.cloud:4566/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    WAREHOUSE = '000000000000:s3tablescatalog/my-table-bucket'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = AWS_SIGV4
    AWS_ACCESS_KEY_ID = '000000000000'
    AWS_SECRET_ACCESS_KEY = 'test'
    AWS_REGION = 'us-east-1'
    AWS_SERVICE = 'glue'
  )
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60
  COMMENT = 'Glue Iceberg REST catalog integration';
```

The key fields are:
- `CATALOG_API_TYPE = AWS_GLUE` selects the Glue dialect of the Iceberg REST protocol. This skips the `/api/catalog` suffix that Polaris-style endpoints expect.
- `WAREHOUSE` is the Glue catalog identifier in the form `<account-id>:s3tablescatalog/<table-bucket-name>` and points at the federated catalog created above.
- `ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS` instructs the catalog to return scoped S3 credentials so Snowflake can read the underlying data files without a separate external volume.
- `REST_AUTHENTICATION` uses `AWS_SIGV4` with `AWS_SERVICE = 'glue'`. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` accept any LocalStack-compatible credentials.
You can verify the integration with:
```sql
SHOW CATALOG INTEGRATIONS;
```

Create an Iceberg table referencing the Glue catalog
Reference the existing Glue table by its fully-qualified name and let Snowflake infer the schema from the catalog metadata:
```sql
CREATE OR REPLACE ICEBERG TABLE iceberg_customer_orders
  CATALOG = 'glue_rest_catalog_int'
  CATALOG_TABLE_NAME = 'my_namespace.customer_orders'
  AUTO_REFRESH = TRUE;
```

`CATALOG_TABLE_NAME` uses the `<namespace>.<table>` format from the Glue catalog. With `AUTO_REFRESH = TRUE`, Snowflake re-reads the table metadata on the schedule defined by the integration's `REFRESH_INTERVAL_SECONDS`.
Query the table
Query the table like any other Snowflake table:
```sql
SELECT * FROM iceberg_customer_orders;
```

```
+----------+---------------+--------+
| ORDER_ID | CUSTOMER_NAME | AMOUNT |
+----------+---------------+--------+
| ORD001   | Alice         |    100 |
| ORD002   | Bob           |    250 |
| ORD003   | Charlie       |    175 |
+----------+---------------+--------+
```

Rows appended through PyIceberg are visible to Snowflake on the next metadata refresh, and any further changes you make on the Glue side propagate through the same `glue_rest_catalog_int` integration.