Comments (3)
One workaround is to convert the values with your own function, for example:
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | beam.io.ReadFromCsv("input.csv", dtype=None)
        | beam.Map(lambda x: [str(t) for t in x])  # convert every field to str
        | beam.Map(print)
    )
That workaround doesn't work, because ReadFromCsv produces, and WriteToCsv requires, elements with a schema.
Even if you were to use ._asdict(), like so:
(
    pipeline
    | beam.io.ReadFromCsv('/tmp/input.csv', dtype=str)
    | beam.Map(lambda x: x._asdict())
    | beam.Map(print)
)
Converting to str element by element results in 'None' values instead of empty strings, and values interpreted as floating point may lose precision.
{'a': 'text', 'b': 1, 'c': 21, 'd': 5945023, 'e': 376974, 'f': 0, 'g': 0, 'h': 0, 'i': 1, 'j': 2, 'k': 0, 'l': 4, 'm': None, 'n': None, 'o': None, 'p': None, 'q': None, 'r': None}
Elapsed time 0:00:00.878320
{'a': 'text', 'b': '1', 'c': '21', 'd': '5945023', 'e': '376974', 'f': '0', 'g': '0', 'h': '0', 'i': '1', 'j': '2', 'k': '0', 'l': '4', 'm': None, 'n': None, 'o': None, 'p': None, 'q': None, 'r': None}
Elapsed time 0:00:24.243182
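Both pitfalls mentioned above can be reproduced without Beam. The following is a minimal plain-Python sketch, not Beam code: `to_text` is a hypothetical helper introduced only for illustration, and the sample row and the decimal string are made-up values.

```python
def to_text(value):
    """Convert one parsed value to text, mapping None to an empty string.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    return "" if value is None else str(value)


# A made-up row shaped like the dicts printed above.
row = {"a": "text", "b": 1, "m": None}

# Naive conversion: a missing value becomes the literal string 'None'.
naive = {key: str(value) for key, value in row.items()}

# Conversion with the helper: a missing value becomes an empty string.
safe = {key: to_text(value) for key, value in row.items()}

# A decimal string with more digits than a 64-bit float can hold
# does not survive a parse-then-str round trip.
original = "0.12345678901234567890"
round_tripped = str(float(original))

print(naive)
print(safe)
print(original == round_tripped)
```

The helper avoids the 'None' strings, but it cannot recover digits that were already dropped when the value was parsed as a float, so it is only a partial substitute for reading the column as text in the first place.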
My example just shows that you can do the conversion without using dtype=str. If you need to keep the schemas, you could do something like:
with beam.Pipeline() as pipeline:
    (
        pipeline
        | beam.io.ReadFromCsv("input.csv", dtype=None)
        | beam.Map(lambda x: beam.Row(a=x.a, b=str(x.b)))
        | beam.io.WriteToCsv("output1.csv")
    )