Comments (2)
Thanks for reporting. Can you provide a sample file for reproducing this issue?
from hybridbackend.
(1) Generate a parquet file by running the following code:
import numpy as np
import pandas as pd
import random

data_list = []
# range(1, 10000) yields 9999 rows
for i in range(1, 10000):
    int_feature = random.randint(1, 100)
    # float_feature = random.random()
    # Fixed-length list column with 4 integers per row
    array_feature = [random.randint(1, 10) for x in range(0, 4)]
    data_list.append([int_feature, array_feature])
df = pd.DataFrame(data_list, columns=["int_feature", "array_feature"])
df.to_parquet("parquet_sample_file.parquet")
(2) Loading the generated parquet file with HybridBackend reproduces this issue:
# Import aliases assumed here; they were not shown in the original snippet
import tensorflow as tf
import hybridbackend.tensorflow as hb

filenames_ds = tf.data.Dataset.from_tensor_slices(["parquet_sample_file.parquet"])
hb_fields = []
hb_fields.append(hb.data.DataFrame.Field("int_feature", tf.int64, ragged_rank=0))
# hb_fields.append(hb.data.DataFrame.Field("float_feature", tf.float32, ragged_rank=0))
hb_fields.append(hb.data.DataFrame.Field("array_feature", tf.int64, ragged_rank=1))
# Read in batches of 100 rows, rebatch to 100, and repeat for 30 passes
iterator = filenames_ds.apply(hb.data.read_parquet(100, hb_fields, num_parallel_reads=tf.data.experimental.AUTOTUNE))
iterator = iterator.apply(hb.data.rebatch(100, fields=hb_fields)).repeat(30)
iterator = iterator.make_one_shot_iterator()
item = iterator.get_next()
with tf.Session() as sess:
    print("====== start ======")
    total_batch_size = 0
    # Drain the dataset until it is exhausted
    while True:
        try:
            a = sess.run(item)
        except tf.errors.OutOfRangeError:
            break
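
When checking the output of the loop above, it may help to know what row and batch counts to expect. The sketch below only does the arithmetic for the sample file: `range(1, 10000)` produces 9999 rows, which is not a multiple of the batch size 100. It assumes the final partial batch of each pass is kept and that batches are not merged across repeats; that is an assumption for illustration, not a statement of how HybridBackend actually behaves. The helper `expected_batches` is hypothetical and not part of any library.

```python
def expected_batches(num_rows, batch_size, drop_remainder=False):
    """Return the list of batch sizes for one pass over num_rows rows."""
    full, rest = divmod(num_rows, batch_size)
    sizes = [batch_size] * full
    # Assumption: a trailing partial batch is emitted unless dropped.
    if rest and not drop_remainder:
        sizes.append(rest)
    return sizes


num_rows = len(range(1, 10000))  # rows written by the generator script
epochs = 30                      # matches .repeat(30) in the repro
sizes = expected_batches(num_rows, 100)

print("rows per pass:", num_rows)
print("batches per pass:", len(sizes))
print("last batch size:", sizes[-1])
print("total batches:", len(sizes) * epochs)
print("total rows:", sum(sizes) * epochs)
```

Comparing these numbers with a running total accumulated inside the `while` loop (for example in `total_batch_size`) is one way to see whether rows are being dropped or duplicated.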