ParquetMetadata
Description
A special format for reading Parquet file metadata (https://parquet.apache.org/docs/file-format/metadata/). It always outputs one row with the following structure:
- num_columns - the number of columns
- num_rows - the total number of rows
- num_row_groups - the total number of row groups
- format_version - Parquet format version, always 1.0 or 2.6
- total_uncompressed_size - total uncompressed size of the data in bytes, calculated as the sum of total_byte_size from all row groups
- total_compressed_size - total compressed size of the data in bytes, calculated as the sum of total_compressed_size from all row groups
- columns - the list of column metadata, with the following structure:
    - name - column name
    - path - column path (differs from name for nested columns)
    - max_definition_level - maximum definition level
    - max_repetition_level - maximum repetition level
    - physical_type - column physical type
    - logical_type - column logical type
    - compression - compression used for this column
    - total_uncompressed_size - total uncompressed size of the column in bytes, calculated as the sum of total_uncompressed_size of the column from all row groups
    - total_compressed_size - total compressed size of the column in bytes, calculated as the sum of total_compressed_size of the column from all row groups
    - space_saved - percent of space saved by compression, calculated as (1 - total_compressed_size/total_uncompressed_size)
    - encodings - the list of encodings used for this column
- row_groups - the list of row group metadata, with the following structure:
    - num_columns - the number of columns in the row group
    - num_rows - the number of rows in the row group
    - total_uncompressed_size - total uncompressed size of the row group in bytes
    - total_compressed_size - total compressed size of the row group in bytes
    - columns - the list of column chunk metadata, with the following structure:
        - name - column name
        - path - column path
        - total_compressed_size - total compressed size of the column chunk in bytes
        - total_uncompressed_size - total uncompressed size of the column chunk in bytes
        - have_statistics - boolean flag that indicates whether the column chunk metadata contains column statistics
        - statistics - column chunk statistics (all fields are NULL if have_statistics = false), with the following structure:
            - num_values - the number of non-null values in the column chunk
            - null_count - the number of NULL values in the column chunk
            - distinct_count - the number of distinct values in the column chunk
            - min - the minimum value of the column chunk
            - max - the maximum value of the column chunk
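The per-column totals and space_saved described above are derived from the column chunks of each row group. The following Python sketch illustrates that relationship under stated assumptions: the row_groups data is a made-up sample that only mirrors the field names listed here, and the helper functions column_totals and space_saved_percent are hypothetical names, not part of any library. The formula as written yields a fraction, so the sketch multiplies it by 100 to express it as a percent.

```python
# Hypothetical sample shaped like the row_groups field described above.
# The numbers are invented for illustration only.
row_groups = [
    {
        "num_rows": 3,
        "columns": [
            {"name": "id", "total_compressed_size": 80, "total_uncompressed_size": 100},
            {"name": "text", "total_compressed_size": 150, "total_uncompressed_size": 300},
        ],
    },
    {
        "num_rows": 2,
        "columns": [
            {"name": "id", "total_compressed_size": 40, "total_uncompressed_size": 50},
            {"name": "text", "total_compressed_size": 90, "total_uncompressed_size": 150},
        ],
    },
]


def column_totals(groups):
    """Sum each column's chunk sizes across all row groups."""
    totals = {}
    for group in groups:
        for chunk in group["columns"]:
            entry = totals.setdefault(
                chunk["name"],
                {"total_compressed_size": 0, "total_uncompressed_size": 0},
            )
            entry["total_compressed_size"] += chunk["total_compressed_size"]
            entry["total_uncompressed_size"] += chunk["total_uncompressed_size"]
    return totals


def space_saved_percent(compressed, uncompressed):
    """Compute (1 - compressed / uncompressed), scaled to a percentage."""
    if uncompressed == 0:
        return 0.0  # assumption: an empty column is reported as saving nothing
    return (1 - compressed / uncompressed) * 100


totals = column_totals(row_groups)
for name, sizes in totals.items():
    saved = space_saved_percent(
        sizes["total_compressed_size"], sizes["total_uncompressed_size"]
    )
    print(f"{name}: {sizes}, space_saved={saved:.1f}%")
```

A negative result is possible when the compressed size exceeds the uncompressed size, which can happen for very small columns where compression overhead dominates.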
Example Usage
Example: