parquet-maria

Current status: parsing an example Parquet file generated by the Python script below.

Source: https://arrow.apache.org/docs/python/parquet.html#reading-and-writing-single-files

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

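# Three columns (float with a NaN, string, bool) and a string index 'a', 'b', 'c'.
df = pd.DataFrame({'one': [-1, np.nan, 2.5],
                   'two': ['foo', 'bar', 'baz'],
                   'three': [True, False, True]},
                  index=list('abc'))
# Convert to an Arrow table and write it out as a single Parquet file.
table = pa.Table.from_pandas(df)
pq.write_table(table, 'example.parquet')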

Hexdump of the generated example.parquet file:

00000000: 5041 5231 1504 1520 1524 4c15 0415 0012  PAR1... .$L.....
00000010: 0000 103c 0000 0000 0000 f0bf 0000 0000  ...<............
00000020: 0000 0440 1500 1512 1516 2c15 0615 1015  ...@......,.....
00000030: 0615 061c 1808 0000 0000 0000 0440 1808  .............@..
00000040: 0000 0000 0000 f0bf 1602 2808 0000 0000  ..........(.....
00000050: 0000 0440 1808 0000 0000 0000 f0bf 0000  ...@............
00000060: 0009 2002 0000 0003 0501 0302 26d8 011c  .. .........&...
00000070: 150a 1935 1000 0619 1803 6f6e 6515 0216  ...5......one...
00000080: 0616 c801 16d0 0126 4826 081c 1808 0000  .......&H&......
00000090: 0000 0000 0440 1808 0000 0000 0000 f0bf  .....@..........
000000a0: 1602 2808 0000 0000 0000 0440 1808 0000  ..(........@....
000000b0: 0000 0000 f0bf 0019 2c15 0415 0015 0200  ........,.......
000000c0: 1500 1510 1502 0000 0015 0415 2a15 2e4c  ............*..L
000000d0: 1506 1500 1200 0015 5003 0000 0066 6f6f  ........P....foo
000000e0: 0300 0000 6261 7203 0000 0062 617a 1500  ....bar....baz..
000000f0: 1514 1518 2c15 0615 1015 0615 061c 3600  ....,.........6.
00000100: 2803 666f 6f18 0362 6172 0000 000a 2402  (.foo..bar....$.
00000110: 0000 0006 0102 0324 0026 b204 1c15 0c19  .......$.&......
00000120: 3510 0006 1918 0374 776f 1502 1606 1698  5......two......
00000130: 0116 a001 26dc 0326 9203 1c36 0028 0366  ....&..&...6.(.f
00000140: 6f6f 1803 6261 7200 192c 1504 1500 1502  oo..bar..,......
00000150: 0015 0015 1015 0200 0000 1500 150e 1512  ................
00000160: 2c15 0615 0015 0615 061c 1801 0118 0100  ,...............
00000170: 1600 2801 0118 0100 0000 0007 1802 0000  ..(.............
00000180: 0006 0105 2688 061c 1500 1925 0006 1918  ....&......%....
00000190: 0574 6872 6565 1502 1606 1650 1654 26b4  .three.....P.T&.
000001a0: 053c 1801 0118 0100 1600 2801 0118 0100  .<........(.....
000001b0: 0019 1c15 0015 0015 0200 0000 1504 151e  ................
000001c0: 1522 4c15 0615 0012 0000 0f38 0100 0000  ."L........8....
000001d0: 6101 0000 0062 0100 0000 6315 0015 1415  a....b....c.....
000001e0: 182c 1506 1510 1506 1506 1c36 0028 0163  .,.........6.(.c
000001f0: 1801 6100 0000 0a24 0200 0000 0601 0203  ..a....$........
00000200: 2400 2684 081c 150c 1935 1000 0619 1811  $.&......5......
00000210: 5f5f 696e 6465 785f 6c65 7665 6c5f 305f  __index_level_0_
00000220: 5f15 0216 0616 8401 168c 0126 b607 26f8  _..........&..&.
00000230: 061c 3600 2801 6318 0161 0019 2c15 0415  ..6.(.c..a..,...
00000240: 0015 0200 1500 1510 1502 0000 0015 0419  ................
00000250: 5c35 0018 0673 6368 656d 6115 0800 150a  \5...schema.....
00000260: 2502 1803 6f6e 6500 150c 2502 1803 7477  %...one...%...tw
00000270: 6f25 004c 1c00 0000 1500 2502 1805 7468  o%.L......%...th
00000280: 7265 6500 150c 2502 1811 5f5f 696e 6465  ree...%...__inde
00000290: 785f 6c65 7665 6c5f 305f 5f25 004c 1c00  x_level_0__%.L..
000002a0: 0000 1606 191c 194c 26d8 011c 150a 1935  .......L&......5
000002b0: 1000 0619 1803 6f6e 6515 0216 0616 c801  ......one.......
000002c0: 16d0 0126 4826 081c 1808 0000 0000 0000  ...&H&..........
000002d0: 0440 1808 0000 0000 0000 f0bf 1602 2808  .@............(.
000002e0: 0000 0000 0000 0440 1808 0000 0000 0000  .......@........
000002f0: f0bf 0019 2c15 0415 0015 0200 1500 1510  ....,...........
00000300: 1502 0000 0026 b204 1c15 0c19 3510 0006  .....&......5...
00000310: 1918 0374 776f 1502 1606 1698 0116 a001  ...two..........
00000320: 26dc 0326 9203 1c36 0028 0366 6f6f 1803  &..&...6.(.foo..
00000330: 6261 7200 192c 1504 1500 1502 0015 0015  bar..,..........
00000340: 1015 0200 0000 2688 061c 1500 1925 0006  ......&......%..
00000350: 1918 0574 6872 6565 1502 1606 1650 1654  ...three.....P.T
00000360: 26b4 053c 1801 0118 0100 1600 2801 0118  &..<........(...
00000370: 0100 0019 1c15 0015 0015 0200 0000 2684  ..............&.
00000380: 081c 150c 1935 1000 0619 1811 5f5f 696e  .....5......__in
00000390: 6465 785f 6c65 7665 6c5f 305f 5f15 0216  dex_level_0__...
000003a0: 0616 8401 168c 0126 b607 26f8 061c 3600  .......&..&...6.
000003b0: 2801 6318 0161 0019 2c15 0415 0015 0200  (.c..a..,.......
000003c0: 1500 1510 1502 0000 0016 b404 1606 2608  ..............&.
000003d0: 16d0 0414 0000 192c 1806 7061 6e64 6173  .......,..pandas
000003e0: 18c9 057b 2269 6e64 6578 5f63 6f6c 756d  ...{"index_colum
000003f0: 6e73 223a 205b 225f 5f69 6e64 6578 5f6c  ns": ["__index_l
00000400: 6576 656c 5f30 5f5f 225d 2c20 2263 6f6c  evel_0__"], "col
00000410: 756d 6e5f 696e 6465 7865 7322 3a20 5b7b  umn_indexes": [{
00000420: 226e 616d 6522 3a20 6e75 6c6c 2c20 2266  "name": null, "f
00000430: 6965 6c64 5f6e 616d 6522 3a20 6e75 6c6c  ield_name": null
00000440: 2c20 2270 616e 6461 735f 7479 7065 223a  , "pandas_type":
00000450: 2022 756e 6963 6f64 6522 2c20 226e 756d   "unicode", "num
00000460: 7079 5f74 7970 6522 3a20 226f 626a 6563  py_type": "objec
00000470: 7422 2c20 226d 6574 6164 6174 6122 3a20  t", "metadata": 
00000480: 7b22 656e 636f 6469 6e67 223a 2022 5554  {"encoding": "UT
00000490: 462d 3822 7d7d 5d2c 2022 636f 6c75 6d6e  F-8"}}], "column
000004a0: 7322 3a20 5b7b 226e 616d 6522 3a20 226f  s": [{"name": "o
000004b0: 6e65 222c 2022 6669 656c 645f 6e61 6d65  ne", "field_name
000004c0: 223a 2022 6f6e 6522 2c20 2270 616e 6461  ": "one", "panda
000004d0: 735f 7479 7065 223a 2022 666c 6f61 7436  s_type": "float6
000004e0: 3422 2c20 226e 756d 7079 5f74 7970 6522  4", "numpy_type"
000004f0: 3a20 2266 6c6f 6174 3634 222c 2022 6d65  : "float64", "me
00000500: 7461 6461 7461 223a 206e 756c 6c7d 2c20  tadata": null}, 
00000510: 7b22 6e61 6d65 223a 2022 7477 6f22 2c20  {"name": "two", 
00000520: 2266 6965 6c64 5f6e 616d 6522 3a20 2274  "field_name": "t
00000530: 776f 222c 2022 7061 6e64 6173 5f74 7970  wo", "pandas_typ
00000540: 6522 3a20 2275 6e69 636f 6465 222c 2022  e": "unicode", "
00000550: 6e75 6d70 795f 7479 7065 223a 2022 6f62  numpy_type": "ob
00000560: 6a65 6374 222c 2022 6d65 7461 6461 7461  ject", "metadata
00000570: 223a 206e 756c 6c7d 2c20 7b22 6e61 6d65  ": null}, {"name
00000580: 223a 2022 7468 7265 6522 2c20 2266 6965  ": "three", "fie
00000590: 6c64 5f6e 616d 6522 3a20 2274 6872 6565  ld_name": "three
000005a0: 222c 2022 7061 6e64 6173 5f74 7970 6522  ", "pandas_type"
000005b0: 3a20 2262 6f6f 6c22 2c20 226e 756d 7079  : "bool", "numpy
000005c0: 5f74 7970 6522 3a20 2262 6f6f 6c22 2c20  _type": "bool", 
000005d0: 226d 6574 6164 6174 6122 3a20 6e75 6c6c  "metadata": null
000005e0: 7d2c 207b 226e 616d 6522 3a20 6e75 6c6c  }, {"name": null
000005f0: 2c20 2266 6965 6c64 5f6e 616d 6522 3a20  , "field_name": 
00000600: 225f 5f69 6e64 6578 5f6c 6576 656c 5f30  "__index_level_0
00000610: 5f5f 222c 2022 7061 6e64 6173 5f74 7970  __", "pandas_typ
00000620: 6522 3a20 2275 6e69 636f 6465 222c 2022  e": "unicode", "
00000630: 6e75 6d70 795f 7479 7065 223a 2022 6f62  numpy_type": "ob
00000640: 6a65 6374 222c 2022 6d65 7461 6461 7461  ject", "metadata
00000650: 223a 206e 756c 6c7d 5d2c 2022 6372 6561  ": null}], "crea
00000660: 746f 7222 3a20 7b22 6c69 6272 6172 7922  tor": {"library"
00000670: 3a20 2270 7961 7272 6f77 222c 2022 7665  : "pyarrow", "ve
00000680: 7273 696f 6e22 3a20 2239 2e30 2e30 227d  rsion": "9.0.0"}
00000690: 2c20 2270 616e 6461 735f 7665 7273 696f  , "pandas_versio
000006a0: 6e22 3a20 2231 2e35 2e33 227d 0018 0c41  n": "1.5.3"}...A
000006b0: 5252 4f57 3a73 6368 656d 6118 ec0a 2f2f  RROW:schema...//
000006c0: 2f2f 2f77 6745 4141 4151 4141 4141 4141  ///wgEAAAQAAAAAA
000006d0: 414b 4141 3441 4267 4146 4141 6741 4367  AKAA4ABgAFAAgACg
000006e0: 4141 4141 4142 4241 4151 4141 4141 4141  AAAAABBAAQAAAAAA
000006f0: 414b 4141 7741 4141 4145 4141 6741 4367  AKAAwAAAAEAAgACg
00000700: 4141 4141 4144 4141 4145 4141 4141 4151  AAAAADAAAEAAAAAQ
00000710: 4141 4141 7741 4141 4149 4141 7741 4241  AAAAwAAAAIAAwABA
00000720: 4149 4141 6741 4141 4149 4141 4141 4541  AIAAgAAAAIAAAAEA
00000730: 4141 4141 5941 4141 4277 5957 356b 5958  AAAAYAAABwYW5kYX
00000740: 4d41 414d 6b43 4141 4237 496d 6c75 5a47  MAAMkCAAB7ImluZG
00000750: 5634 5832 4e76 6248 5674 626e 4d69 4f69  V4X2NvbHVtbnMiOi
00000760: 4262 496c 3966 6157 356b 5a58 6866 6247  BbIl9faW5kZXhfbG
00000770: 5632 5a57 7866 4d46 3966 496c 3073 4943  V2ZWxfMF9fIl0sIC
00000780: 4a6a 6232 7831 6257 3566 6157 356b 5a58  Jjb2x1bW5faW5kZX
00000790: 686c 6379 4936 4946 7437 496d 3568 6257  hlcyI6IFt7Im5hbW
000007a0: 5569 4f69 4275 6457 7873 4c43 4169 5a6d  UiOiBudWxsLCAiZm
000007b0: 6c6c 6247 5266 626d 4674 5a53 4936 4947  llbGRfbmFtZSI6IG
000007c0: 3531 6247 7773 4943 4a77 5957 356b 5958  51bGwsICJwYW5kYX
000007d0: 4e66 6448 6c77 5a53 4936 4943 4a31 626d  NfdHlwZSI6ICJ1bm
000007e0: 6c6a 6232 526c 4969 7767 496d 3531 6258  ljb2RlIiwgIm51bX
000007f0: 4235 5833 5235 6347 5569 4f69 4169 6232  B5X3R5cGUiOiAib2
00000800: 4a71 5a57 4e30 4969 7767 496d 316c 6447  JqZWN0IiwgIm1ldG
00000810: 466b 5958 5268 496a 6f67 6579 4a6c 626d  FkYXRhIjogeyJlbm
00000820: 4e76 5a47 6c75 5a79 4936 4943 4a56 5645  NvZGluZyI6ICJVVE
00000830: 5974 4f43 4a39 6656 3073 4943 4a6a 6232  YtOCJ9fV0sICJjb2
00000840: 7831 6257 357a 496a 6f67 5733 7369 626d  x1bW5zIjogW3sibm
00000850: 4674 5a53 4936 4943 4a76 626d 5569 4c43  FtZSI6ICJvbmUiLC
00000860: 4169 5a6d 6c6c 6247 5266 626d 4674 5a53  AiZmllbGRfbmFtZS
00000870: 4936 4943 4a76 626d 5569 4c43 4169 6347  I6ICJvbmUiLCAicG
00000880: 4675 5a47 467a 5833 5235 6347 5569 4f69  FuZGFzX3R5cGUiOi
00000890: 4169 5a6d 7876 5958 5132 4e43 4973 4943  AiZmxvYXQ2NCIsIC
000008a0: 4a75 6457 3177 6556 3930 6558 426c 496a  JudW1weV90eXBlIj
000008b0: 6f67 496d 5a73 6232 4630 4e6a 5169 4c43  ogImZsb2F0NjQiLC
000008c0: 4169 6257 5630 5957 5268 6447 4569 4f69  AibWV0YWRhdGEiOi
000008d0: 4275 6457 7873 6653 7767 6579 4a75 5957  BudWxsfSwgeyJuYW
000008e0: 316c 496a 6f67 496e 5233 6279 4973 4943  1lIjogInR3byIsIC
000008f0: 4a6d 6157 5673 5a46 3975 5957 316c 496a  JmaWVsZF9uYW1lIj
00000900: 6f67 496e 5233 6279 4973 4943 4a77 5957  ogInR3byIsICJwYW
00000910: 356b 5958 4e66 6448 6c77 5a53 4936 4943  5kYXNfdHlwZSI6IC
00000920: 4a31 626d 6c6a 6232 526c 4969 7767 496d  J1bmljb2RlIiwgIm
00000930: 3531 6258 4235 5833 5235 6347 5569 4f69  51bXB5X3R5cGUiOi
00000940: 4169 6232 4a71 5a57 4e30 4969 7767 496d  Aib2JqZWN0IiwgIm
00000950: 316c 6447 466b 5958 5268 496a 6f67 626e  1ldGFkYXRhIjogbn
00000960: 5673 6248 3073 4948 7369 626d 4674 5a53  VsbH0sIHsibmFtZS
00000970: 4936 4943 4a30 6148 4a6c 5a53 4973 4943  I6ICJ0aHJlZSIsIC
00000980: 4a6d 6157 5673 5a46 3975 5957 316c 496a  JmaWVsZF9uYW1lIj
00000990: 6f67 496e 526f 636d 566c 4969 7767 496e  ogInRocmVlIiwgIn
000009a0: 4268 626d 5268 6331 3930 6558 426c 496a  BhbmRhc190eXBlIj
000009b0: 6f67 496d 4a76 6232 7769 4c43 4169 626e  ogImJvb2wiLCAibn
000009c0: 5674 6348 6c66 6448 6c77 5a53 4936 4943  VtcHlfdHlwZSI6IC
000009d0: 4a69 6232 3973 4969 7767 496d 316c 6447  Jib29sIiwgIm1ldG
000009e0: 466b 5958 5268 496a 6f67 626e 5673 6248  FkYXRhIjogbnVsbH
000009f0: 3073 4948 7369 626d 4674 5a53 4936 4947  0sIHsibmFtZSI6IG
00000a00: 3531 6247 7773 4943 4a6d 6157 5673 5a46  51bGwsICJmaWVsZF
00000a10: 3975 5957 316c 496a 6f67 496c 3966 6157  9uYW1lIjogIl9faW
00000a20: 356b 5a58 6866 6247 5632 5a57 7866 4d46  5kZXhfbGV2ZWxfMF
00000a30: 3966 4969 7767 496e 4268 626d 5268 6331  9fIiwgInBhbmRhc1
00000a40: 3930 6558 426c 496a 6f67 496e 5675 6157  90eXBlIjogInVuaW
00000a50: 4e76 5a47 5569 4c43 4169 626e 5674 6348  NvZGUiLCAibnVtcH
00000a60: 6c66 6448 6c77 5a53 4936 4943 4a76 596d  lfdHlwZSI6ICJvYm
00000a70: 706c 5933 5169 4c43 4169 6257 5630 5957  plY3QiLCAibWV0YW
00000a80: 5268 6447 4569 4f69 4275 6457 7873 6656  RhdGEiOiBudWxsfV
00000a90: 3073 4943 4a6a 636d 5668 6447 3979 496a  0sICJjcmVhdG9yIj
00000aa0: 6f67 6579 4a73 6157 4a79 5958 4a35 496a  ogeyJsaWJyYXJ5Ij
00000ab0: 6f67 496e 4235 5958 4a79 6233 6369 4c43  ogInB5YXJyb3ciLC
00000ac0: 4169 646d 5679 6332 6c76 6269 4936 4943  AidmVyc2lvbiI6IC
00000ad0: 4935 4c6a 4175 4d43 4a39 4c43 4169 6347  I5LjAuMCJ9LCAicG
00000ae0: 4675 5a47 467a 5833 5a6c 636e 4e70 6232  FuZGFzX3ZlcnNpb2
00000af0: 3469 4f69 4169 4d53 3431 4c6a 4d69 6651  4iOiAiMS41LjMifQ
00000b00: 4141 4141 5141 4141 436b 4141 4141 6141  AAAAQAAACkAAAAaA
00000b10: 4141 4144 7741 4141 4145 4141 4141 6650  AAADwAAAAEAAAAfP
00000b20: 2f2f 2f77 4141 4151 5551 4141 4141 4a41  ///wAAAQUQAAAAJA
00000b30: 4141 4141 5141 4141 4141 4141 4141 4551  AAAAQAAAAAAAAAEQ
00000b40: 4141 4146 3966 6157 356b 5a58 6866 6247  AAAF9faW5kZXhfbG
00000b50: 5632 5a57 7866 4d46 3966 4141 4141 7450  V2ZWxfMF9fAAAAtP
00000b60: 2f2f 2f37 442f 2f2f 3841 4141 4547 4541  ///7D///8AAAEGEA
00000b70: 4141 4142 6741 4141 4145 4141 4141 4141  AAABgAAAAEAAAAAA
00000b80: 4141 4141 5541 4141 4230 6148 4a6c 5a51  AAAAUAAAB0aHJlZQ
00000b90: 4141 414e 7a2f 2f2f 2f59 2f2f 2f2f 4141  AAANz////Y////AA
00000ba0: 4142 4252 4141 4141 4159 4141 4141 4241  ABBRAAAAAYAAAABA
00000bb0: 4141 4141 4141 4141 4144 4141 4141 6448  AAAAAAAAADAAAAdH
00000bc0: 6476 4141 5141 4241 4145 4141 4141 4541  dvAAQABAAEAAAAEA
00000bd0: 4155 4141 6741 4267 4148 4141 7741 4141  AUAAgABgAHAAwAAA
00000be0: 4151 4142 4141 4141 4141 4141 4544 4541  AQABAAAAAAAAEDEA
00000bf0: 4141 4142 7741 4141 4145 4141 4141 4141  AAABwAAAAEAAAAAA
00000c00: 4141 4141 4d41 4141 4276 626d 5541 4141  AAAAMAAABvbmUAAA
00000c10: 4147 4141 6741 4267 4147 4141 4141 4141  AGAAgABgAGAAAAAA
00000c20: 4143 4141 4141 4141 413d 0018 1f70 6172  ACAAAAAAA=...par
00000c30: 7175 6574 2d63 7070 2d61 7272 6f77 2076  quet-cpp-arrow v
00000c40: 6572 7369 6f6e 2039 2e30 2e30 194c 1c00  ersion 9.0.0.L..
00000c50: 001c 0000 1c00 001c 0000 000e 0a00 0050  ...............P
00000c60: 4152 31                                  AR1
