Comments (8)
Why doesn't MMLSpark have more examples using Scala? I'm more interested in using Scala with Spark in MMLSpark.
from synapseml.
Dear Myasuka,
- For reading CIFAR-10 in Scala, we have a version of the dataset in a zip file hosted on our CDN. I will send you an example as soon as I have verified that it works.
- We would love to have your Scala examples! Could you please let us know how you have them implemented? Is it as a standalone application or does it use the Jupyter Scala kernel? If you could please send us a pointer, we will be happy to work with you to integrate them.
Thank you so much.
Dear Myasuka,
You can get a zip version of the CIFAR test dataset here: https://mmlspark.azureedge.net/datasets/CIFAR10/test.zip.
Once you copy the zip file to HDFS (or the local file system), you can do something like this in Scala (I just confirmed that it works):
import com.microsoft.ml.spark.Readers.implicits._
val images = spark.readImages("file:///home/mmlspark/test.zip", true, 1.0, true)
images.printSchema()
/* This produces:
root
 |-- image: struct (nullable = true)
 |    |-- path: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- type: integer (nullable = true)
 |    |-- bytes: binary (nullable = true)
*/
images.selectExpr("image.width as w", "image.height as h", "image.bytes as b").show()
/* This produces:
+---+---+--------------------+
|  w|  h|                   b|
+---+---+--------------------+
| 32| 32|[5B 65 6D 62 68 6...|
| 32| 32|[01 01 01 01 01 0...|
| 32| 32|[F8 F8 F8 F6 F6 F...|
| 32| 32|[99 98 94 5E 5C 5...|
| 32| 32|[DC DF D7 C5 CE B...|
| 32| 32|[B0 D5 F3 AE D3 F...|
| 32| 32|[6B 38 22 67 36 2...|
| 32| 32|[4C 69 64 53 65 6...|
| 32| 32|[DC A1 70 DC A1 7...|
| 32| 32|[BB CE DF BA CD D...|
| 32| 32|[22 27 29 1D 21 2...|
| 32| 32|[F5 FE FD AE BC B...|
| 32| 32|[C5 BA AA C9 BC A...|
| 32| 32|[95 77 84 95 78 8...|
| 32| 32|[91 B5 C1 93 AC B...|
| 32| 32|[7C 8C A7 72 8B A...|
| 32| 32|[31 39 4E 44 4A 6...|
| 32| 32|[34 3A 3C 28 2E 2...|
| 32| 32|[DA CF CB D9 CD C...|
| 32| 32|[27 AB 9C 25 B1 9...|
+---+---+--------------------+
only showing top 20 rows
*/
Please let me know if this helps. Thank you!
@drdarshan, thanks for your help.
From your reply, it seems that from Scala, MMLSpark can only read images, not the original CIFAR binary format from the official site. Did I misunderstand? That is to say, if we download the original binary files, do we first need to convert them into images one by one?
Hello @Myasuka, yes, you would need to transform the images. I might be able to write a UDF to do this from Scala, since it looks like they have a binary format in addition to pickle and MAT. Please let me know if that would help and I can write one for you. Thanks!
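For anyone following along, here is a rough sketch of what decoding the binary format involves, written in Python for brevity rather than as the Scala UDF mentioned above. It assumes the record layout described on the official CIFAR-10 page: each record is 1 label byte followed by 3072 pixel bytes (1024 red, then 1024 green, then 1024 blue, each in row-major order). The function name `parse_cifar10_records` is made up for illustration and is not part of MMLSpark.

```python
RECORD_SIZE = 1 + 3 * 32 * 32  # 1 label byte + 3072 pixel bytes
PLANE_SIZE = 32 * 32           # bytes per colour channel


def parse_cifar10_records(raw):
    """Yield (label, interleaved_rgb_bytes) for each record in a CIFAR-10 .bin batch.

    Hypothetical helper: assumes `raw` holds whole records in the
    label + R plane + G plane + B plane layout.
    """
    if len(raw) % RECORD_SIZE != 0:
        raise ValueError("input length is not a multiple of the record size")
    for start in range(0, len(raw), RECORD_SIZE):
        label = raw[start]
        pixels = raw[start + 1:start + RECORD_SIZE]
        red = pixels[:PLANE_SIZE]
        green = pixels[PLANE_SIZE:2 * PLANE_SIZE]
        blue = pixels[2 * PLANE_SIZE:]
        # Interleave the three planes into RGBRGB... order, which is what
        # most image libraries expect for a 32x32 RGB image.
        rgb = bytes(v for triple in zip(red, green, blue) for v in triple)
        yield label, rgb
```

Each `rgb` value could then be turned into an image with something like `PIL.Image.frombytes("RGB", (32, 32), rgb)` and saved, after which the zipped directory can be read as shown earlier. A Scala version would follow the same byte layout.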
Thank you very much for your kind help. I already use my modified cookie-datasets to read the original CIFAR-10 binary format data.
BTW, I think you could also share the conversion script, since someone else may want to try MMLSpark with Scala but find it hard to read the original CIFAR-10 binary format data without the https://mmlspark.azureedge.net/datasets/CIFAR10/test.zip file you provided.
Hello @Myasuka, here is a simple Python 3 script that extracts the images from the original CIFAR Python dataset and writes them out as PNG images. You can then zip the directory and use spark.readImages to read it. Please let me know if this is sufficient.
You might need to adapt it in order to also get the labels - please let me know if you need more help with this.
Thank you!
Sudarshan
import os
import pickle
import tarfile

from PIL import Image  # "import PIL" alone may not load the PIL.Image submodule

# Extract every image from the CIFAR-10 Python archive and save it as a PNG.
with tarfile.open("cifar-10-python.tar.gz", "r:gz") as f:
    for batch in [p for p in f.getnames() if "_batch" in p]:
        print("Extracting: " + batch)
        os.makedirs(batch, exist_ok=True)
        images = pickle.load(f.extractfile(batch), encoding="latin1")
        data, filenames = images["data"], images["filenames"]
        for img_data, filename in zip(data, filenames):
            # Each row holds 3072 values in channel-first order; convert to height x width x channel.
            img = Image.fromarray(img_data.reshape(3, 32, 32).transpose(1, 2, 0))
            img.save(os.path.join(batch, filename))
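On getting the labels as well: a minimal sketch, assuming each unpickled batch dictionary also carries a "labels" list aligned with "filenames" (which is how the CIFAR-10 Python batches are laid out when loaded with encoding="latin1"). The helper names `label_rows` and `write_labels_csv` are made up for illustration.

```python
import csv


def label_rows(batch):
    """Return (filename, label) pairs for one unpickled CIFAR-10 batch dict.

    Assumes the dict has parallel "filenames" and "labels" lists.
    """
    return list(zip(batch["filenames"], batch["labels"]))


def write_labels_csv(batch, csv_path):
    """Write the (filename, label) pairs of a batch to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "label"])
        writer.writerows(label_rows(batch))
```

Calling something like `write_labels_csv(images, os.path.join(batch, "labels.csv"))` inside the batch loop of the script would leave a label file next to the PNGs, which could then be joined with the image DataFrame on the file name.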
Hi @Myasuka, I'm closing this issue for now. Please reopen it if you are still blocked. Thank you!