
Comments (6)

cmdlineluser commented on June 17, 2024

Can reproduce.

It seems to be specific to n_unique()?

If I use .unique().len() instead, it runs as fast as the others.

from polars.

kszlim commented on June 17, 2024

Interestingly, if you do:

ds = pl.read_parquet("weird2.parquet", rechunk=True)

It makes it fast again, so it has something to do with the layout of the Parquet file. Using parquet-layout, I see:

{
  "row_groups": [
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4,
              "compressed_bytes": 12313,
              "uncompressed_bytes": 21688,
              "header_bytes": 18,
              "num_values": 5422
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 12335,
              "compressed_bytes": 150331,
              "uncompressed_bytes": 366078,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 162752,
              "compressed_bytes": 480,
              "uncompressed_bytes": 1272,
              "header_bytes": 16,
              "num_values": 159
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 163248,
              "compressed_bytes": 64231,
              "uncompressed_bytes": 216222,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 227568,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 227600,
              "compressed_bytes": 11921,
              "uncompressed_bytes": 24011,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 239598,
              "compressed_bytes": 11534,
              "uncompressed_bytes": 20328,
              "header_bytes": 18,
              "num_values": 5082
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 251150,
              "compressed_bytes": 86553,
              "uncompressed_bytes": 206159,
              "header_bytes": 39,
              "num_values": 148837
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 337791,
              "compressed_bytes": 460,
              "uncompressed_bytes": 1216,
              "header_bytes": 16,
              "num_values": 152
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 338267,
              "compressed_bytes": 36050,
              "uncompressed_bytes": 122588,
              "header_bytes": 39,
              "num_values": 148837
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 374406,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 374438,
              "compressed_bytes": 6897,
              "uncompressed_bytes": 13421,
              "header_bytes": 32,
              "num_values": 148837
            }
          ]
        }
      ],
      "row_count": 148837
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 381410,
              "compressed_bytes": 12348,
              "uncompressed_bytes": 21744,
              "header_bytes": 18,
              "num_values": 5436
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 393776,
              "compressed_bytes": 150910,
              "uncompressed_bytes": 367061,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 544774,
              "compressed_bytes": 476,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 545266,
              "compressed_bytes": 64053,
              "uncompressed_bytes": 216534,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 609408,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 609440,
              "compressed_bytes": 11997,
              "uncompressed_bytes": 23708,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 621514,
              "compressed_bytes": 11664,
              "uncompressed_bytes": 20560,
              "header_bytes": 18,
              "num_values": 5140
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 633196,
              "compressed_bytes": 86374,
              "uncompressed_bytes": 205668,
              "header_bytes": 39,
              "num_values": 149237
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 719658,
              "compressed_bytes": 472,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 720146,
              "compressed_bytes": 36122,
              "uncompressed_bytes": 120994,
              "header_bytes": 39,
              "num_values": 149237
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 756357,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 756389,
              "compressed_bytes": 6919,
              "uncompressed_bytes": 13176,
              "header_bytes": 32,
              "num_values": 149237
            }
          ]
        }
      ],
      "row_count": 149237
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 763383,
              "compressed_bytes": 12389,
              "uncompressed_bytes": 21816,
              "header_bytes": 18,
              "num_values": 5454
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 775790,
              "compressed_bytes": 150689,
              "uncompressed_bytes": 366531,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 926567,
              "compressed_bytes": 478,
              "uncompressed_bytes": 1272,
              "header_bytes": 16,
              "num_values": 159
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 927061,
              "compressed_bytes": 64301,
              "uncompressed_bytes": 217659,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 991451,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 991483,
              "compressed_bytes": 11933,
              "uncompressed_bytes": 23998,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1003493,
              "compressed_bytes": 11623,
              "uncompressed_bytes": 20472,
              "header_bytes": 18,
              "num_values": 5118
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1015134,
              "compressed_bytes": 86669,
              "uncompressed_bytes": 207107,
              "header_bytes": 39,
              "num_values": 149043
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1101891,
              "compressed_bytes": 474,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1102381,
              "compressed_bytes": 35985,
              "uncompressed_bytes": 121520,
              "header_bytes": 39,
              "num_values": 149043
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1138456,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1138488,
              "compressed_bytes": 6950,
              "uncompressed_bytes": 13438,
              "header_bytes": 32,
              "num_values": 149043
            }
          ]
        }
      ],
      "row_count": 149043
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1145514,
              "compressed_bytes": 12295,
              "uncompressed_bytes": 21660,
              "header_bytes": 18,
              "num_values": 5415
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1157827,
              "compressed_bytes": 150064,
              "uncompressed_bytes": 367783,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1307980,
              "compressed_bytes": 472,
              "uncompressed_bytes": 1256,
              "header_bytes": 16,
              "num_values": 157
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1308468,
              "compressed_bytes": 64144,
              "uncompressed_bytes": 218101,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1372702,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1372734,
              "compressed_bytes": 12070,
              "uncompressed_bytes": 24020,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1384882,
              "compressed_bytes": 11563,
              "uncompressed_bytes": 20384,
              "header_bytes": 18,
              "num_values": 5096
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1396463,
              "compressed_bytes": 86456,
              "uncompressed_bytes": 205516,
              "header_bytes": 39,
              "num_values": 148331
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1483008,
              "compressed_bytes": 454,
              "uncompressed_bytes": 1208,
              "header_bytes": 16,
              "num_values": 151
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1483478,
              "compressed_bytes": 35664,
              "uncompressed_bytes": 121053,
              "header_bytes": 39,
              "num_values": 148331
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1519232,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1519264,
              "compressed_bytes": 6891,
              "uncompressed_bytes": 13397,
              "header_bytes": 32,
              "num_values": 148331
            }
          ]
        }
      ],
      "row_count": 148331
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1526231,
              "compressed_bytes": 12348,
              "uncompressed_bytes": 21748,
              "header_bytes": 18,
              "num_values": 5437
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1538597,
              "compressed_bytes": 150050,
              "uncompressed_bytes": 364520,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1688736,
              "compressed_bytes": 481,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1689233,
              "compressed_bytes": 64151,
              "uncompressed_bytes": 215372,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1753474,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1753506,
              "compressed_bytes": 12060,
              "uncompressed_bytes": 24007,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1765644,
              "compressed_bytes": 11570,
              "uncompressed_bytes": 20380,
              "header_bytes": 18,
              "num_values": 5095
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1777232,
              "compressed_bytes": 86628,
              "uncompressed_bytes": 206800,
              "header_bytes": 39,
              "num_values": 149113
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1863949,
              "compressed_bytes": 454,
              "uncompressed_bytes": 1200,
              "header_bytes": 16,
              "num_values": 150
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1864419,
              "compressed_bytes": 35929,
              "uncompressed_bytes": 121826,
              "header_bytes": 39,
              "num_values": 149113
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1900438,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1900470,
              "compressed_bytes": 6946,
              "uncompressed_bytes": 13543,
              "header_bytes": 32,
              "num_values": 149113
            }
          ]
        }
      ],
      "row_count": 149113
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 1907492,
              "compressed_bytes": 12350,
              "uncompressed_bytes": 21752,
              "header_bytes": 18,
              "num_values": 5438
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 1919860,
              "compressed_bytes": 150814,
              "uncompressed_bytes": 368460,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2070763,
              "compressed_bytes": 481,
              "uncompressed_bytes": 1272,
              "header_bytes": 16,
              "num_values": 159
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2071260,
              "compressed_bytes": 64398,
              "uncompressed_bytes": 216883,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2135748,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2135780,
              "compressed_bytes": 12138,
              "uncompressed_bytes": 24077,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2147996,
              "compressed_bytes": 11573,
              "uncompressed_bytes": 20392,
              "header_bytes": 18,
              "num_values": 5098
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2159587,
              "compressed_bytes": 86817,
              "uncompressed_bytes": 208011,
              "header_bytes": 39,
              "num_values": 148878
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2246493,
              "compressed_bytes": 472,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2246981,
              "compressed_bytes": 36123,
              "uncompressed_bytes": 122951,
              "header_bytes": 39,
              "num_values": 148878
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2283194,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2283226,
              "compressed_bytes": 7013,
              "uncompressed_bytes": 13481,
              "header_bytes": 32,
              "num_values": 148878
            }
          ]
        }
      ],
      "row_count": 148878
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2290315,
              "compressed_bytes": 12306,
              "uncompressed_bytes": 21680,
              "header_bytes": 18,
              "num_values": 5420
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2302639,
              "compressed_bytes": 150092,
              "uncompressed_bytes": 365903,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2452820,
              "compressed_bytes": 478,
              "uncompressed_bytes": 1272,
              "header_bytes": 16,
              "num_values": 159
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2453314,
              "compressed_bytes": 64259,
              "uncompressed_bytes": 216406,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2517663,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2517695,
              "compressed_bytes": 12002,
              "uncompressed_bytes": 24012,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2529775,
              "compressed_bytes": 11633,
              "uncompressed_bytes": 20504,
              "header_bytes": 18,
              "num_values": 5126
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2541426,
              "compressed_bytes": 86793,
              "uncompressed_bytes": 207327,
              "header_bytes": 39,
              "num_values": 148495
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2628308,
              "compressed_bytes": 481,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2628805,
              "compressed_bytes": 35995,
              "uncompressed_bytes": 122310,
              "header_bytes": 39,
              "num_values": 148495
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2664890,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2664922,
              "compressed_bytes": 6935,
              "uncompressed_bytes": 13371,
              "header_bytes": 32,
              "num_values": 148495
            }
          ]
        }
      ],
      "row_count": 148495
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2671933,
              "compressed_bytes": 12341,
              "uncompressed_bytes": 21736,
              "header_bytes": 18,
              "num_values": 5434
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2684292,
              "compressed_bytes": 150442,
              "uncompressed_bytes": 366688,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2834823,
              "compressed_bytes": 473,
              "uncompressed_bytes": 1256,
              "header_bytes": 16,
              "num_values": 157
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2835312,
              "compressed_bytes": 64048,
              "uncompressed_bytes": 217513,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2899450,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2899482,
              "compressed_bytes": 12077,
              "uncompressed_bytes": 23943,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 2911637,
              "compressed_bytes": 11637,
              "uncompressed_bytes": 20496,
              "header_bytes": 18,
              "num_values": 5124
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 2923292,
              "compressed_bytes": 86157,
              "uncompressed_bytes": 205770,
              "header_bytes": 39,
              "num_values": 148701
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3009538,
              "compressed_bytes": 467,
              "uncompressed_bytes": 1240,
              "header_bytes": 16,
              "num_values": 155
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3010021,
              "compressed_bytes": 35974,
              "uncompressed_bytes": 121958,
              "header_bytes": 39,
              "num_values": 148701
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3046085,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3046117,
              "compressed_bytes": 6903,
              "uncompressed_bytes": 13441,
              "header_bytes": 32,
              "num_values": 148701
            }
          ]
        }
      ],
      "row_count": 148701
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3053096,
              "compressed_bytes": 12341,
              "uncompressed_bytes": 21736,
              "header_bytes": 18,
              "num_values": 5434
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3065455,
              "compressed_bytes": 150138,
              "uncompressed_bytes": 366283,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3215682,
              "compressed_bytes": 475,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3216173,
              "compressed_bytes": 64166,
              "uncompressed_bytes": 215673,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3280429,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3280461,
              "compressed_bytes": 12054,
              "uncompressed_bytes": 24018,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3292593,
              "compressed_bytes": 11575,
              "uncompressed_bytes": 20396,
              "header_bytes": 18,
              "num_values": 5099
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3304186,
              "compressed_bytes": 86428,
              "uncompressed_bytes": 205403,
              "header_bytes": 39,
              "num_values": 148790
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3390703,
              "compressed_bytes": 470,
              "uncompressed_bytes": 1240,
              "header_bytes": 16,
              "num_values": 155
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3391189,
              "compressed_bytes": 35934,
              "uncompressed_bytes": 121449,
              "header_bytes": 39,
              "num_values": 148790
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3427213,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3427245,
              "compressed_bytes": 6976,
              "uncompressed_bytes": 13590,
              "header_bytes": 32,
              "num_values": 148790
            }
          ]
        }
      ],
      "row_count": 148790
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3434297,
              "compressed_bytes": 12320,
              "uncompressed_bytes": 21704,
              "header_bytes": 18,
              "num_values": 5426
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3446635,
              "compressed_bytes": 150885,
              "uncompressed_bytes": 366781,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3597609,
              "compressed_bytes": 478,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3598103,
              "compressed_bytes": 64308,
              "uncompressed_bytes": 216793,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3662501,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3662533,
              "compressed_bytes": 11976,
              "uncompressed_bytes": 23852,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3674587,
              "compressed_bytes": 11545,
              "uncompressed_bytes": 20344,
              "header_bytes": 18,
              "num_values": 5086
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3686150,
              "compressed_bytes": 86843,
              "uncompressed_bytes": 206726,
              "header_bytes": 39,
              "num_values": 148943
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3773082,
              "compressed_bytes": 462,
              "uncompressed_bytes": 1232,
              "header_bytes": 16,
              "num_values": 154
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3773560,
              "compressed_bytes": 36254,
              "uncompressed_bytes": 122803,
              "header_bytes": 39,
              "num_values": 148943
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3809904,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3809936,
              "compressed_bytes": 7026,
              "uncompressed_bytes": 13490,
              "header_bytes": 32,
              "num_values": 148943
            }
          ]
        }
      ],
      "row_count": 148943
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3817038,
              "compressed_bytes": 12326,
              "uncompressed_bytes": 21708,
              "header_bytes": 18,
              "num_values": 5427
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3829382,
              "compressed_bytes": 150643,
              "uncompressed_bytes": 366520,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 3980114,
              "compressed_bytes": 477,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 3980607,
              "compressed_bytes": 64207,
              "uncompressed_bytes": 216712,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4044904,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4044936,
              "compressed_bytes": 12013,
              "uncompressed_bytes": 23955,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4057027,
              "compressed_bytes": 11581,
              "uncompressed_bytes": 20408,
              "header_bytes": 18,
              "num_values": 5102
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4068626,
              "compressed_bytes": 86745,
              "uncompressed_bytes": 206617,
              "header_bytes": 39,
              "num_values": 148856
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4155460,
              "compressed_bytes": 470,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4155946,
              "compressed_bytes": 35966,
              "uncompressed_bytes": 121804,
              "header_bytes": 39,
              "num_values": 148856
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4192002,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4192034,
              "compressed_bytes": 6978,
              "uncompressed_bytes": 13480,
              "header_bytes": 32,
              "num_values": 148856
            }
          ]
        }
      ],
      "row_count": 148856
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4199088,
              "compressed_bytes": 12336,
              "uncompressed_bytes": 21728,
              "header_bytes": 18,
              "num_values": 5432
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4211442,
              "compressed_bytes": 150475,
              "uncompressed_bytes": 366875,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4362006,
              "compressed_bytes": 477,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4362499,
              "compressed_bytes": 64285,
              "uncompressed_bytes": 217222,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4426874,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4426906,
              "compressed_bytes": 11938,
              "uncompressed_bytes": 23863,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4438922,
              "compressed_bytes": 11687,
              "uncompressed_bytes": 20596,
              "header_bytes": 18,
              "num_values": 5149
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4450627,
              "compressed_bytes": 86696,
              "uncompressed_bytes": 205907,
              "header_bytes": 39,
              "num_values": 148628
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4537412,
              "compressed_bytes": 465,
              "uncompressed_bytes": 1224,
              "header_bytes": 16,
              "num_values": 153
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4537893,
              "compressed_bytes": 35854,
              "uncompressed_bytes": 121727,
              "header_bytes": 39,
              "num_values": 148628
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4573837,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4573869,
              "compressed_bytes": 6955,
              "uncompressed_bytes": 13471,
              "header_bytes": 32,
              "num_values": 148628
            }
          ]
        }
      ],
      "row_count": 148628
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4580900,
              "compressed_bytes": 12336,
              "uncompressed_bytes": 21728,
              "header_bytes": 18,
              "num_values": 5432
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4593254,
              "compressed_bytes": 149892,
              "uncompressed_bytes": 365900,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4743235,
              "compressed_bytes": 480,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4743731,
              "compressed_bytes": 64287,
              "uncompressed_bytes": 215980,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4808108,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4808140,
              "compressed_bytes": 11952,
              "uncompressed_bytes": 23948,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4820170,
              "compressed_bytes": 11589,
              "uncompressed_bytes": 20428,
              "header_bytes": 18,
              "num_values": 5107
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4831777,
              "compressed_bytes": 87168,
              "uncompressed_bytes": 208270,
              "header_bytes": 39,
              "num_values": 149092
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4919034,
              "compressed_bytes": 474,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4919524,
              "compressed_bytes": 36175,
              "uncompressed_bytes": 122168,
              "header_bytes": 39,
              "num_values": 149092
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4955789,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4955821,
              "compressed_bytes": 7092,
              "uncompressed_bytes": 13760,
              "header_bytes": 32,
              "num_values": 149092
            }
          ]
        }
      ],
      "row_count": 149092
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 4962989,
              "compressed_bytes": 12351,
              "uncompressed_bytes": 21752,
              "header_bytes": 18,
              "num_values": 5438
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 4975358,
              "compressed_bytes": 150592,
              "uncompressed_bytes": 367334,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5126039,
              "compressed_bytes": 473,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5126528,
              "compressed_bytes": 64222,
              "uncompressed_bytes": 215959,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5190840,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5190872,
              "compressed_bytes": 12050,
              "uncompressed_bytes": 24039,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5203000,
              "compressed_bytes": 11534,
              "uncompressed_bytes": 20320,
              "header_bytes": 18,
              "num_values": 5080
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5214552,
              "compressed_bytes": 86708,
              "uncompressed_bytes": 205898,
              "header_bytes": 39,
              "num_values": 148708
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5301349,
              "compressed_bytes": 476,
              "uncompressed_bytes": 1256,
              "header_bytes": 16,
              "num_values": 157
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5301841,
              "compressed_bytes": 36081,
              "uncompressed_bytes": 121193,
              "header_bytes": 39,
              "num_values": 148708
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5338012,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5338044,
              "compressed_bytes": 7060,
              "uncompressed_bytes": 13588,
              "header_bytes": 32,
              "num_values": 148708
            }
          ]
        }
      ],
      "row_count": 148708
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5345180,
              "compressed_bytes": 12334,
              "uncompressed_bytes": 21720,
              "header_bytes": 18,
              "num_values": 5430
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5357532,
              "compressed_bytes": 150163,
              "uncompressed_bytes": 364784,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5507784,
              "compressed_bytes": 478,
              "uncompressed_bytes": 1264,
              "header_bytes": 16,
              "num_values": 158
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5508278,
              "compressed_bytes": 64180,
              "uncompressed_bytes": 215922,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5572548,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5572580,
              "compressed_bytes": 11912,
              "uncompressed_bytes": 23712,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5584570,
              "compressed_bytes": 11579,
              "uncompressed_bytes": 20396,
              "header_bytes": 18,
              "num_values": 5099
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5596167,
              "compressed_bytes": 86735,
              "uncompressed_bytes": 206633,
              "header_bytes": 39,
              "num_values": 148709
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5682991,
              "compressed_bytes": 471,
              "uncompressed_bytes": 1248,
              "header_bytes": 16,
              "num_values": 156
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5683478,
              "compressed_bytes": 35796,
              "uncompressed_bytes": 121758,
              "header_bytes": 39,
              "num_values": 148709
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5719364,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5719396,
              "compressed_bytes": 6882,
              "uncompressed_bytes": 13292,
              "header_bytes": 32,
              "num_values": 148709
            }
          ]
        }
      ],
      "row_count": 148709
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5726354,
              "compressed_bytes": 12368,
              "uncompressed_bytes": 21784,
              "header_bytes": 18,
              "num_values": 5446
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5738740,
              "compressed_bytes": 150123,
              "uncompressed_bytes": 366118,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5888952,
              "compressed_bytes": 474,
              "uncompressed_bytes": 1256,
              "header_bytes": 16,
              "num_values": 157
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5889442,
              "compressed_bytes": 64205,
              "uncompressed_bytes": 217537,
              "header_bytes": 39,
              "num_values": 264562
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5953737,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5953769,
              "compressed_bytes": 11949,
              "uncompressed_bytes": 23691,
              "header_bytes": 33,
              "num_values": 264562
            }
          ]
        }
      ],
      "row_count": 264562
    },
    {
      "columns": [
        {
          "path": "date",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 5965796,
              "compressed_bytes": 11459,
              "uncompressed_bytes": 20184,
              "header_bytes": 18,
              "num_values": 5046
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 5977273,
              "compressed_bytes": 85714,
              "uncompressed_bytes": 203487,
              "header_bytes": 39,
              "num_values": 148705
            }
          ]
        },
        {
          "path": "ident",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 6063076,
              "compressed_bytes": 462,
              "uncompressed_bytes": 1224,
              "header_bytes": 16,
              "num_values": 153
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 6063554,
              "compressed_bytes": 35758,
              "uncompressed_bytes": 120171,
              "header_bytes": 39,
              "num_values": 148705
            }
          ]
        },
        {
          "path": "label",
          "has_offset_index": true,
          "has_column_index": true,
          "has_bloom_filter": false,
          "pages": [
            {
              "compression": "zstd",
              "encoding": "plain",
              "page_type": "dictionary",
              "offset": 6099402,
              "compressed_bytes": 19,
              "uncompressed_bytes": 10,
              "header_bytes": 13,
              "num_values": 2
            },
            {
              "compression": "zstd",
              "encoding": "rle_dictionary",
              "page_type": "data_page_v1",
              "offset": 6099434,
              "compressed_bytes": 7044,
              "uncompressed_bytes": 13615,
              "header_bytes": 32,
              "num_values": 148705
            }
          ]
        }
      ],
      "row_count": 148705
    }
  ]
}
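
For skimming a dump like this, a short script can reduce it to per-row-group totals. This is only a minimal sketch: it assumes the JSON has the shape shown above (`row_groups`, `columns`, `pages`, `row_count`), `summarize_layout` is a hypothetical helper name, and the sample is abbreviated to the `label` column of the last row group.

```python
import json


def summarize_layout(layout: dict) -> list[dict]:
    """Return one summary per row group: row count and compressed bytes per column."""
    summary = []
    for index, group in enumerate(layout["row_groups"]):
        # Sum the compressed size of every page (dictionary + data) per column.
        per_column = {
            column["path"]: sum(page["compressed_bytes"] for page in column["pages"])
            for column in group["columns"]
        }
        summary.append(
            {
                "row_group": index,
                "row_count": group["row_count"],
                "compressed_bytes": per_column,
            }
        )
    return summary


# Abbreviated sample: only the "label" column of the last row group above.
sample = json.loads(
    """
    {
      "row_groups": [
        {
          "columns": [
            {
              "path": "label",
              "pages": [
                {"page_type": "dictionary", "compressed_bytes": 19, "num_values": 2},
                {"page_type": "data_page_v1", "compressed_bytes": 7044, "num_values": 148705}
              ]
            }
          ],
          "row_count": 148705
        }
      ]
    }
    """
)

summary = summarize_layout(sample)
print(summary)
```

Run against the full output, this would give one line per row group, which makes the alternating row counts (264562 and roughly 148700) easier to spot.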

kevinli1993 avatar kevinli1993 commented on June 17, 2024 1

Interesting!

This happened in the middle of my data pipeline (i.e. not directly after reading from a parquet file). The pattern is:

ds = pl.read_parquet(....)

(
    ds
    .lazy()
    .with_columns(...)
    .join(....)
    .with_columns(...)
    .with_columns(...) # adding this line suddenly makes the collect() call slow!
    .collect()
)

In particular, there is no place in the pipeline to call .rechunk(), since LazyFrames have no .rechunk() method!

(It seems this behavior is not particular to n_unique() either; other functions could trigger it as well.)

orlp avatar orlp commented on June 17, 2024 1

@kszlim I marked your comment as spam not because it is irrelevant, but because its pages and pages of JSON made the issue unreadable to scroll through. Please post that as an attachment, a link to a gist, or similar.

Regarding this snippet:

# This does not happen with all input dataframes.  
# I managed to produce a parquet file showcasing this behavior. 
# The file `weird2.parquet` is attached: https://github.com/pola-rs/polars/files/15424892/weird2.parquet.zip
ds = pl.read_parquet("weird2.parquet")

# This takes around 4 minutes to compute (!!)
print(ds.select(pl.col("label").n_unique().over("date", "ident")))

I can't reproduce on 0.20.28 on Apple M1 (query finishes instantly), but can on 0.20.29 so it's a very recent regression.

orlp avatar orlp commented on June 17, 2024 1

The problem is that we were rechunking the entire data for each group in the aggregation. This wasn't exposed before because in the past we always rechunked by default when reading a parquet file.

PaulRudin avatar PaulRudin commented on June 17, 2024

Just to add - I seem to have hit this with .over() using one of rolling_mean, len, shift or rank("ordinal") - not sure which yet.
