Querying Metadata for Staged Files

Snowflake automatically generates metadata for files in internal (i.e. Snowflake) stages or external (i.e. S3 or Azure) stages. This metadata is “stored” in virtual columns that can be:

  • Queried using a standard SELECT statement.
  • Loaded into a table, along with the regular data columns, using COPY INTO table. For general information about querying staged data files, see Querying Staged Data.

In this Topic:

Metadata Columns

Currently, the following metadata columns can be queried or copied into tables:

METADATA$FILENAME
Name of the staged data file the current row belongs to. Includes the path to the data file in the stage.
METADATA$FILE_ROW_NUMBER
Row number for each record in the container staged data file.

Query Limitations

Query Examples

Example 1: Querying the Metadata Columns for a CSV File

The following example illustrates staging multiple CSV data files (with the same file format) and then querying the metadata columns, as well as the regular data columns, in the files.

This example assumes the files have the following names and are located in the root directory in a Mac OS or Linux environment:

  • /tmp/data1.csv contains two records:

    a|b
    c|d
    
  • /tmp/data2.csv contains two records:

    e|f
    g|h
    

To stage and query the files:

-- Create a file format
CREATE OR REPLACE FILE FORMAT myformat
  TYPE = 'csv' FIELD_DELIMITER = '|';

-- Create an internal stage
CREATE OR REPLACE STAGE mystage1;

-- Stage a data file
PUT file:///tmp/data*.csv @mystage1;

-- Query the filename and row number metadata columns and the regular data columns in the staged file
-- Note that the table alias is provided to make the statement easier to read and is not required
SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, t.$1, t.$2 FROM @mystage1 (file_format => myformat) t;

+-------------------+--------------------------+----+----+
| METADATA$FILENAME | METADATA$FILE_ROW_NUMBER | $1 | $2 |
|-------------------+--------------------------+----+----|
| data2.csv.gz      |                        1 | e  | f  |
| data2.csv.gz      |                        2 | g  | h  |
| data1.csv.gz      |                        1 | a  | b  |
| data1.csv.gz      |                        2 | c  | d  |
+-------------------+--------------------------+----+----+

SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, t.$1, t.$2 FROM @mystage1 t;

+-------------------+--------------------------+-----+------+
| METADATA$FILENAME | METADATA$FILE_ROW_NUMBER | $1  | $2   |
|-------------------+--------------------------+-----+------|
| data2.csv.gz      |                        1 | e|f | NULL |
| data2.csv.gz      |                        2 | g|h | NULL |
| data1.csv.gz      |                        1 | a|b | NULL |
| data1.csv.gz      |                        2 | c|d | NULL |
+-------------------+--------------------------+-----+------+

Note

The file format is required in this example to correctly parse the fields in the staged files. In the second query, the file format is omitted, causing the | field delimiter to be ignored and resulting in the values returned for $1 and $2.

However, if the file format is included in the stage definition, you can omit it from the SELECT statement. See the next example for details.

Example 2: Querying the Metadata Columns for a JSON File

This example illustrates staging a JSON data file containing the following objects and then querying the metadata columns, as well as the objects, in the file:

{"a": {"b": "x1","c": "y1"}},
{"a": {"b": "x2","c": "y2"}}

This example assumes the file is named /tmp/data1.json and is located in the root directory in a Mac OS or Linux environment.

To stage and query the file:

-- Create a file format
CREATE OR REPLACE FILE FORMAT my_json_format
  TYPE = 'json';

-- Create an internal stage
CREATE OR REPLACE STAGE mystage2
  FILE_FORMAT = my_json_format;

-- Stage a data file
PUT file:///tmp/data1.json @mystage2;

-- Query the filename and row number metadata columns and the regular data columns in the staged file
SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, parse_json($1) FROM @mystage2/data1.json.gz;

+-------------------+--------------------------+----------------+
| METADATA$FILENAME | METADATA$FILE_ROW_NUMBER | PARSE_JSON($1) |
|-------------------+--------------------------+----------------|
| data1.json.gz     |                        1 | {              |
|                   |                          |   "a": {       |
|                   |                          |     "b": "x1", |
|                   |                          |     "c": "y1"  |
|                   |                          |   }            |
|                   |                          | }              |
| data1.json.gz     |                        2 | {              |
|                   |                          |   "a": {       |
|                   |                          |     "b": "x2", |
|                   |                          |     "c": "y2"  |
|                   |                          |   }            |
|                   |                          | }              |
+-------------------+--------------------------+----------------+

Example 3: Loading Metadata Columns into a Table

The COPY INTO table command supports copying metadata from staged data files into a target table. Use the data transformation syntax (i.e. a SELECT list) in your COPY statement. For more information about transforming data using a COPY statement, see Transforming Data During a Load.

The following example loads the metadata columns and regular data columns from Example 1: Querying the Metadata Columns for a CSV File into a table:

CREATE OR REPLACE TABLE table1 (
  filename varchar,
  file_row_number varchar,
  col1 varchar,
  col2 varchar
);

COPY INTO table1(filename, file_row_number, col1, col2)
  FROM (SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, t.$1, t.$2 FROM @mystage1/data1.csv.gz (file_format => myformat) t);

SELECT * FROM table1;

+--------------+-----------------+------+------+
| FILENAME     | FILE_ROW_NUMBER | COL1 | COL2 |
|--------------+-----------------+------+------|
| data1.csv.gz | 1               | a    | b    |
| data1.csv.gz | 2               | d    | e    |
+--------------+-----------------+------+------+