
Comments (51)

StevenLColeman42 avatar StevenLColeman42 commented on August 26, 2024 4

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024 1

@JanBessai You're absolutely right, the Android app doesn't decode anything. The webservice does. It sends the data to a webservice that sends back the decoded data, which interestingly is saved in a log file on the phone. Meaning that I've managed to get some readings of raw and corresponding decoded data. I'll upload it ASAP

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

Hi onoff0, great that you found my project.
Essentially, I haven't written the python interface to read the data from the SCIO yet. I read some data "manually", which is in the examples folder, and "read_spectrum.py" will read those files and try to extract the spectrum from them. Unfortunately, the data is encoded (as you can see in the "readme" file), and I still haven't managed to decode it. I could use some help with that. Once I know how to decode that data (which is what I need help with), writing a bluetooth interface to make the SCIO work will be easy. I would really appreciate if you have ANY idea how to proceed

Now, in order to get read_spectrum.py to work with file1.txt, I'd need to know a few more details of what exactly your errors are. Also, try with file2.txt or something like that.

from scio-read.

onoff0 avatar onoff0 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

Hi,

No problem, I am happy to explain.

I don't have a developer license. Therefore, I used bluetooth to collect the encoded raw data. Now I need to decode it. For this, I need to understand what the structure of the data is.

If you want to collect your own raw data from the SCIO, here is a step by step instruction:

  1. On Linux, install gatttool and hcitool. I'm using Ubuntu; to install:
    sudo apt-get install bluez
  2. Turn on your SCIO with a long press on the button.
  3. Run hcitool to find out what your SCIO's MAC address is. It will have a name like SCiOmyScio or whatever you named it:
    sudo hcitool lescan
  4. Run gatttool with your SCIO's MAC address to collect your own data. This will store it in "file1.txt". Replace xx:xx:xx:xx:xx:xx with the MAC address you found in step 3. During the scan, the SCIO indicator light will be yellow.
    sudo gatttool -i hci0 -b xx:xx:xx:xx:xx:xx --char-write-req -a 0x0029 -n 01ba020000 --listen > file1.txt
  5. Stop saving data to your file with Ctrl+C after the indicator light of the SCIO goes back to blue.
  6. In a text editor, edit your file1.txt: Remove the first line saying "Characteristic value was written successfully" and at the beginning of each line remove "Notification handle = 0x0025 value: ". Then save the file.
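
The manual clean-up in the last step can also be scripted. The following is a minimal Python sketch, assuming the gatttool output looks exactly as described above (one status line, then one "Notification handle = 0x0025 value: " line per notification, with the bytes printed as space-separated hex). The function name `clean_gatttool_log` is made up for this example.

```python
PREFIX = "Notification handle = 0x0025 value: "

def clean_gatttool_log(text: str) -> list[bytes]:
    """Return one bytes object per notification line of a gatttool log."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            # Skips "Characteristic value was written successfully" and blank lines.
            continue
        # bytes.fromhex ignores the spaces between the hex pairs.
        rows.append(bytes.fromhex(line[len(PREFIX):]))
    return rows
```

Typical use would be `rows = clean_gatttool_log(open("file1.txt").read())`, which leaves the original capture file untouched.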

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

The main problem now is to take this series of hex values, and to understand what they mean, i.e. how to extract the spectrum from them. This is what I need help with. It turns out that the first hex value of each line identifies the line, i.e. 01 is the first line, etc. Values go from 01 to 5f and repeat 3 times, although for the 3rd scan, they go from 01 to 58 only
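
A small Python sketch of how the three parts could be separated using that line identifier. It assumes each row is already a bytes object, that a new part starts whenever the first byte is 01, and that the identifier byte itself is not part of the payload (that last point is a guess). `split_into_parts` is a hypothetical helper name.

```python
def split_into_parts(rows):
    """Group rows into parts; a row whose first byte is 0x01 starts a new part."""
    parts = []
    for row in rows:
        if not row:
            continue
        if row[0] == 0x01 or not parts:
            parts.append([])
        # Assumption: the first byte is only a line counter, so it is dropped.
        parts[-1].append(row[1:])
    return [b"".join(part) for part in parts]
```

With a full capture this should return three byte strings, one per part of the scan.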

from scio-read.

onoff0 avatar onoff0 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

Hi

Well, the developer license basically allows the android app to export a CSV or JSON file, which the server decoded from the raw hexadecimal values. The problem is that we don't know how the server decodes it, and so I thought that if I have a copy of raw hexadecimal data and of a CSV, I could try to figure it out.

What I know from people online is that the CSV or JSON that is exported supposedly contains 400 values of reflectance, one for each band from 700nm to 1100nm. After pre-processing the hexadecimal values, for each scan, there are 3 readings: two with 1800 hexadecimal values each, and one with 1656 hex values.

I do not think that the hexadecimal data is encrypted because the microprocessor inside the SCIO is quite weak. But the data is encoded. So the server does not send a decryption key, it only does some mathematics with the 3 readings of each scan. I think the second or third reading is related to calibration.

Do you have any idea how to proceed?

from scio-read.

onoff0 avatar onoff0 commented on August 26, 2024

from scio-read.

franklin02 avatar franklin02 commented on August 26, 2024

Please provide your email address. I can send you a spectral data (CSV file).

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

> Please provide your email address. I can send you a spectral data (CSV file).

@franklin02 I sent you an email. Will put your files in the example-data folder as soon as I have them. Thanks a lot!!!

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@franklin02 Thanks a lot for the file, I added it to the example-data folder. Unfortunately, this is not what the SCIO with a developer license creates (that is supposed to include the spectrum, raw irradiance of the sensor and calibration data). Rather, the file you sent is a collection of scans of different materials, containing only the reflectance spectra. I have no idea how useful it will be....

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@onoff0 I updated the script to properly read the scio raw data. The decoding is still unclear, but I now know that 5 hex values are one 1nm band. I haven't worked out the conversion to a float yet...

from scio-read.

onoff0 avatar onoff0 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

It's an educated guess: The data comes in 3 parts. Each part has a header that contains some sort of identifier. After that, there are 1800, 1800 and 1656 hex values left. We know that we have 1nm bands, and 331 of them (counting from 740 to 1070nm). So I tried to divide 1800 by 4, 5 or 6 and never had success. Then, I divided 1656 by 331 and I got 5, with 1 hex value left (I assume it's a divider or something). Now, it makes sense that all the 3 parts contain the same data structure. So I did 1800 - 1656 and I got 144, which is a very nice round number, suggesting that it's a header or something. This means that the data of the first 2 parts (with 1800 hex values) is either preceded or followed by a block of 144 values. I don't know which one yet.
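
The arithmetic behind this guess, written out in Python. Only the numbers from the comment are used; the helper name `bytes_per_band` is made up for this example.

```python
BANDS = 1070 - 740 + 1  # 331 bands of 1 nm, both ends included

def bytes_per_band(length, bands=BANDS):
    """Return (whole bytes per band, bytes left over) for a part of this length."""
    return divmod(length, bands)

print(bytes_per_band(1656))  # (5, 1)   -> 5 bytes per band plus 1 spare byte
print(bytes_per_band(1800))  # (5, 145) -> 5 bytes per band plus 145 spare bytes
print(1800 - 1656)           # 144
```

So the 144 extra bytes in the first two parts come on top of the 1 spare byte that all three parts would share if they use the same layout.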

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@onoff0 Did you receive my e-mail address?

from scio-read.

onoff0 avatar onoff0 commented on August 26, 2024

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

If you are (like me) not a fan of Bluetooth you can do pretty much the same via USB:
When I connect the scio to my linux computer via USB I get

$ dmesg
[197481.871683] usb 2-1: new full-speed USB device number 18 using xhci_hcd
[197482.003419] usb 2-1: New USB device found, idVendor=0451, idProduct=16aa, bcdDevice= 0.09
[197482.003425] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[197482.003428] usb 2-1: Product: CP SCIO USB CDC
[197482.003431] usb 2-1: Manufacturer: Texas Instruments
[197482.003433] usb 2-1: SerialNumber: xxxxxxxxxxxxxxx
[197482.006466] cdc_acm 2-1:1.0: ttyACM0: USB ACM device

I can then do:

cat /dev/ttyACM0 | hexdump -C

in one console and

echo -n -e "\x01\xba\x02\x00\x00" > /dev/ttyACM0

in the other to obtain a message in the format described above.
This is a bit more lightweight since it does not require extra tools.
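
The same exchange can be sketched in Python. This assumes the device shows up as /dev/ttyACM0 and accepts the five command bytes from the echo line above; `request_scan` is a made-up name, and a real session would probably need to keep reading in a loop until the device stops sending.

```python
SCAN_COMMAND = bytes([0x01, 0xBA, 0x02, 0x00, 0x00])  # same bytes as the echo command

def request_scan(port, max_bytes=4096):
    """Write the scan command to an open binary stream and return one read."""
    port.write(SCAN_COMMAND)
    port.flush()
    return port.read(max_bytes)

# Usage on a machine with the device attached (unbuffered, because a
# character device is not seekable):
# with open("/dev/ttyACM0", "r+b", buffering=0) as port:
#     print(request_scan(port).hex(" "))
```

Passing the stream in as an argument keeps the function testable without the hardware.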

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

Another hint:
Blackfin processors normally use fixedpoint arithmetic.

It is a wild guess but I'd expect samples from the device to be 16 bit and normalized to [-1; 1).
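
A sketch of what that guess would mean in Python: each sample is a signed 16-bit integer (Q15 fixed point) that is divided by 32768 to land in [-1, 1). Whether the device actually uses this format is unknown, and the byte order is a parameter because it is unknown too.

```python
import struct

def decode_q15(data, byteorder="<"):
    """Interpret data as signed 16-bit fixed-point (Q15) samples in [-1, 1)."""
    count = len(data) // 2  # a trailing odd byte is ignored
    raw = struct.unpack(f"{byteorder}{count}h", data[: count * 2])
    return [value / 32768 for value in raw]

print(decode_q15(b"\x00\x40\x00\x80"))  # [0.5, -1.0]
```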

I've also written a small program to try decoding floats with 4, 8, and 16 bytes, skipping 0-800 bits from the beginning and using big and little endian. My results so far were values which had too much variance (skipping from e-100 to e100 in adjacent samples) or tons of outliers (NaN, -Infinity, Infinity). I think this makes float samples less probable.

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

> If you are (like me) not a fan of Bluetooth you can do pretty much the same via USB:
> When I connect the scio to my linux computer via USB I get
>
> $ dmesg
> [197481.871683] usb 2-1: new full-speed USB device number 18 using xhci_hcd
> [197482.003419] usb 2-1: New USB device found, idVendor=0451, idProduct=16aa, bcdDevice= 0.09
> [197482.003425] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [197482.003428] usb 2-1: Product: CP SCIO USB CDC
> [197482.003431] usb 2-1: Manufacturer: Texas Instruments
> [197482.003433] usb 2-1: SerialNumber: xxxxxxxxxxxxxxx
> [197482.006466] cdc_acm 2-1:1.0: ttyACM0: USB ACM device
>
> I can then do:
>
> cat /dev/ttyACM0 | hexdump -C
>
> in one console and
>
> echo -n -e "\x01\xba\x02\x00\x00" > /dev/ttyACM0
>
> in the other to obtain a message in the format described above.
> This is a bit more lightweight since it does not require extra tools.

I like this very much. Beautiful. I've been working on getting bluetooth measurements to work, with a lot of difficulty, so the USB method is really helpful. Also, if we manage to decode the data, it will be useful to install on a robotic arm, for example, and not be worried about recharging all the time

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

> Another hint:
> Blackfin processors normally use fixedpoint arithmetic.
>
> It is a wild guess but I'd expect samples from the device to be 16 bit and normalized to [-1; 1).
>
> I've also written a small program to try decoding floats with 4, 8, and 16 bytes, skipping 0-800 bits from the beginning and using big and little endian. My results so far were values which had too much variance (skipping from e-100 to e100 in adjacent samples) or tons of outliers (NaN, -Infinity, Infinity). I think this makes float samples less probable.

This is interesting. Well you know, it may simply be that the measurements do vary a lot, and Scio (the company) deals with it by doing 2 things: (1) each measurement is a mean of 2 measurements, and (2) the measurements are always divided by the calibration. This means that if you divide it after your decoding, the spectrum may be right in the end. Please try it and let me know, I would be very interested to hear if it works.

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

> Please try it and let me know, I would be very interested to hear if it works.

It won't be that simple to implement:
for my previous attempt I wrote a quick and dirty Haskell program to brute force through all decoding parameter combinations and find "best" options by looking at average differences of subsequent samples or minimizing outliers in the first package. This method does not really scale for multiple readings, because I have no idea how many bytes of header information to skip in the third package. Additionally, even if the divided numbers were to cancel out producing numbers in a reasonable range, I'm not sure if that would mean anything.
Also, the numbers I got were so far out there (in the range of +-1e100), that rounding errors would introduce too much noise for any reasonable measurement.
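
For readers who want to repeat this kind of search, here is a rough Python sketch of the idea. It is not the Haskell program mentioned above, only an illustration under assumptions: it tries 4- and 8-byte floats in both byte orders over a range of byte offsets and ranks them by the mean absolute difference between neighbouring samples. The names `roughness` and `rank_float_decodings` are invented for this example.

```python
import math
import struct

def roughness(values):
    """Mean absolute difference of neighbours; inf if any value is NaN or infinite."""
    if len(values) < 2 or not all(math.isfinite(v) for v in values):
        return math.inf
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)

def rank_float_decodings(data, max_skip=16):
    """Try each (skip, byte order, float size) and sort by roughness, smoothest first."""
    results = []
    for skip in range(max_skip + 1):
        for order in "<>":
            for code, size in (("f", 4), ("d", 8)):
                count = (len(data) - skip) // size
                if count < 2:
                    continue
                values = struct.unpack_from(f"{order}{count}{code}", data, skip)
                results.append((roughness(values), skip, order, code))
    return sorted(results)
```

A low score alone proves little: a wrong decoding that yields only tiny numbers also looks smooth, so the top candidates still need to be inspected by eye.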

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

Did you look at the readme in my project? I documented all the headers. If it's unclear I can try to explain it more. Let me know.

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

I was referring to the header/footer you described in

> This means that the data of the first 2 parts (with 1800 hex values) is either preceded or followed by a block of 144 values. I don't know which one yet.

which we have not decoded yet. It means that one has to guess an offset after which the real samples start or a position when they stop. This offset can be assumed to be consistent across the first two packages, but there is no information about it in the third package (yet).

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

I see. It means that for the first 2 parts, there is a header of 144 bytes, while for the third part there is no header. I have no idea why. Knowing that the Scio flashes its light 2x, I assume that it measures 2x and makes an average. Then, that means that the last (3rd) set of values is the calibration (5 x 331 = 1655, plus 1 extra byte = 1656). This might be the easiest to decode, because we can guess that there is no header in it.

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@JanBessai Can you try to decode it using 5-byte or 10-byte chunks?

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

Can you elaborate on what you mean by chunks? 5 or 10 byte floats don't exist and the blackfin fixed-point types also use 2, 4, or 8 bytes.

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

I know that they don't exist, and that the Blackfin doesn't use them. However, we know the following, and then I made some guesses:

  • There are 331 bands
  • The third part of the scan contains 1656 bytes, that's 5*331 + 1 padding byte
  • The first two parts contain 1800 bytes, that's 5*331 + a header of 145 bytes. Alternatively, it could be that both first parts contain the measurements together as 10-byte values.

Therefore, I guess that there is some custom conversion to a decimal value using 5 or 10 bytes for each value

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@JanBessai It appears that the values are indeed 40bit (5-byte) integers that get divided by some large value (currently, I think it's 1000000000). That then results in a float.
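
A sketch of that reading in Python. The divisor of 1000000000 is the tentative value from the comment; the byte order and the assumption that the values are unsigned are guesses, and `decode_uint40` is a made-up name.

```python
def decode_uint40(data, byteorder="little", divisor=1_000_000_000):
    """Read consecutive 5-byte unsigned integers and scale them to floats."""
    count = len(data) // 5  # leftover bytes at the end are ignored
    return [
        int.from_bytes(data[i * 5:(i + 1) * 5], byteorder) / divisor
        for i in range(count)
    ]

# 0x3B9ACA00 is 1000000000, so these five bytes decode to 1.0.
print(decode_uint40(bytes([0x00, 0xCA, 0x9A, 0x3B, 0x00])))  # [1.0]
```

For a 1656-byte part this yields 331 values and ignores the final byte.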

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@JanBessai None of the values I decoded with a 40-bit reading make sense. Basically, the readings divided by the calibration should always be between 0-1 (1 being 100% reflectance, and 0 none). By testing numerous decodings, I tried to find something that would yield such a vector. Unfortunately, I have not been successful yet.
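
The plausibility test described here can be written down as a small Python sketch. It assumes the sample and the calibration have already been decoded into two equally long lists of numbers; both function names are invented for this example.

```python
def reflectance(sample, calibration):
    """Divide each sample value by its calibration value (NaN where that is zero)."""
    return [s / c if c else float("nan") for s, c in zip(sample, calibration)]

def fraction_in_unit_range(values):
    """Share of values between 0 and 1 inclusive; NaN counts as outside."""
    if not values:
        return 0.0
    return sum(0.0 <= v <= 1.0 for v in values) / len(values)
```

A candidate decoding would be worth a closer look only if `fraction_in_unit_range(reflectance(sample, calibration))` comes out close to 1.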

from scio-read.

earwickerh avatar earwickerh commented on August 26, 2024

Kudos on the effort everyone! Thanks for starting this project @kebasaa. This may be (is probably) a stupid question but: wouldn't it be helpful to do a scio scan of a sample for which we already have real NIR spectral reading data (within the same range)?

from scio-read.

StevenLColeman42 avatar StevenLColeman42 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

I'm really glad someone's still interested in this. I've tried a lot of ways to crack this device's data, without success, but I'd love to get it to work with anyone's help. And yes, SCIO (the company) is not helping, unfortunately.

from scio-read.

JanBessai avatar JanBessai commented on August 26, 2024

@StevenLColeman42
Reverse engineering the right commands to send to the device from the App might be possible, but note that the app just sends the data that has been collected to some Webservice and doesn't actually decode it. At least that is what I assume from looking at the API you get from them for writing your own code.

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@StevenLColeman42 Totally with you there. I haven't used the device in ~6 months, so no clue if they broke the new android apps. However, older versions can be found on some APK backup sites. As I mentioned in my previous answer, I've just uploaded logs of the app that contain both raw and corresponding decoded data

Check out https://apkpure.com/scio-pocket-molecular-sensor/com.consumerphysics.consumer/versions for older versions of the app

from scio-read.

nachtschatt3n avatar nachtschatt3n commented on August 26, 2024

Well, if it helps, I could write a man-in-the-middle proxy to capture the requests and answers. I also have the old iPhone app that still works, but no dev account :(

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@nachtschatt3n Thanks for the suggestion. I think all the necessary information is available in the log files (check the folder "01_rawdata/log_files"), with both raw data, encoded and resulting spectrum. But I'd appreciate it if you could help crack it.

from scio-read.

celso-vitor avatar celso-vitor commented on August 26, 2024

Hello, I've been following the discussions on Scio and I'm very interested in seeing what it could add to my work. Today I'm just working on creating models and developing apps, but if you're still interested in trying something, I'm open to contributing.

from scio-read.

haakonstorm avatar haakonstorm commented on August 26, 2024

Consumer Physics appear to me to be the absolute worst kind of company you can possibly imagine. Their crowdfunding was a complete cash grab, theft, whatever you want to call it. I have a SCIO lying around here; if anyone wants it, I'll ship it for the price of shipping and a beer.

from scio-read.

hbsagen avatar hbsagen commented on August 26, 2024

I agree.

I would use it more if this, or another, open source project could provide me with the complete spectrums.

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

So an older version of the SCiO app (I think 1.2 or so) actually saves log files that contain both the raw data (encoded, sent to consumer physics server) and the spectra (decoded, back from the server). I managed at some point to find an APK online and after installing that to get these log files. It's a very inconvenient method, of course

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

The issue with this open source project is that it seems like the raw data is random, which points at encryption. And I don't have the skills to extract the encryption keys from the hardware...
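
One way to put a number on "seems random" is the Shannon entropy of the byte values. The Python sketch below is only an illustration: a result near 8 bits per byte is what encrypted or compressed data tends to give, but it is not proof of encryption, and a buffer of only 1800 bytes will usually score somewhat below 8 even if it is perfectly random.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Entropy of the byte distribution in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )
```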

from scio-read.

junziii avatar junziii commented on August 26, 2024

@StevenLColeman42 I have two SCiO devices, one with hardware version 1.0 and one with 1.2. I am sorry to say that I am pretty sure the bandits from ConsumerPhysics pulled a switch to take the old devices out of service first...
After an update of the SCiO iOS app, my old SCiO was stuck in a "try to calibrate" loop for good. Some months later, they also tried to take down my second device. It was stuck in the same loop until I got angry with them; now, without any change to the software or hardware, my newer one is functional again. It's a big scam all in all...

@kebasaa First, thanks for your project! I am one of the thousand customers not happy with what they were told and what was sold after that... So I would really like to see this project succeed for all the technically ambitious people... I have limited skills in reading out data from chips, but I have a Bus Pirate board and I would like to support your project in this process.

Does anybody have tips or a good clue as to which chip holds the encryption information, or some marking on it, so that I can try to read out the related data?

from scio-read.

StevenLColeman42 avatar StevenLColeman42 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@StevenLColeman42 @junziii
Thanks for your interest, offer to help and great project ideas. I haven't been inactive myself, just unsuccessful...

First off, let me go through a few things:

Temperature, and even version information and device serial number are all available by sending specific commands to the Scio through USB. These fields are not in the scan data, so when the scio performs a scan, it sends multiple commands and reads the responses.

I have made a script that uses the Python struct package to attempt to convert the bytes to some sort of reasonable data structure, like int, float or even strings, using all kinds of structures and headers. No success (but maybe I'm doing something wrong...)

Then, I scanned the same material multiple times. As @StevenLColeman42 said, we should see similar spikes in the bytes. However, the readings are not correlated with each other in any way, which implies encryption.
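
A sketch of how such a comparison between two scans of the same material could be done in Python, using the Pearson correlation of the raw byte values. This is an assumption about a reasonable way to check, not a description of the script mentioned above; `pearson` is written out by hand so that it runs on any Python 3 version.

```python
import math

def pearson(a, b):
    """Pearson correlation of two byte strings (or number lists) of equal length."""
    n = min(len(a), len(b))
    if n < 2:
        raise ValueError("need at least two values")
    xs, ys = list(a[:n]), list(b[:n])
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_x * var_y)
```

Values near 0 for repeated scans of the same material would match the observation above; values near 1 would suggest the bytes track the measurement directly.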

Now I'm not an encryption expert in any way. I assume that a company like Consumer Physics would have done something fairly simple, like using the serial number to encrypt/decrypt the data. Alternatively, they might have hard-coded the key in the device firmware.

Now there are 2 possible approaches: (1) Try to decrypt the data in all kinds of ways using the serial number as a key. I don't know how to do that. (2) create a firmware dump of the device. This is what @junziii is generously offering, but note that this is destructive. The Scio is glued and can't easily be reassembled... I suggest to do this after trying to decrypt using the serial number.

The chip layout and disassembly instructions are available from Sparkfun: https://learn.sparkfun.com/tutorials/scio-pocket-molecular-scanner-teardown-/all
The encryption chip is most likely the Texas Instruments CC2540F256, or the Blackfin processor.

@junziii I suggest that you search for older versions of the app apk online, maybe you'll get more lucky with using the device

from scio-read.

StevenLColeman42 avatar StevenLColeman42 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@StevenLColeman42 I have dramatically re-worked the code base, to make it easy to scan and collect data. But I'm still stuck with decrypting the data. Whether it is encrypted or not is unclear to me.
All the data so far seems to be little-endian (e.g. temperature), except that one user in another issue suggests big-endian for the scan data.

Along with the scan data, the SCiO app sends the following information to the server:
{'device_id': '8032AB45611198F1', 'sampled_at': '2020-06-04T14:37:21.253+03:00', 'sampled_white_at': '2020-06-04T14:32:46.187+03:00', 'scio_edition': 'scio_edition', 'mobile_mac_address': '38:78:62:02:7B:33',

So there is a timestamp, but also a device ID. That could be some sort of decryption key, but I'm not sure.

At this point, I'm failing at the following:
sample has 1800 bytes, sample_dark has 1800 bytes, and sample_gradient has 1656 bytes. We know that 331 floats are returned by the server. So the question is how that could add up. 331*4 (for integers) is 1324 bytes, so too short. Any ideas?

from scio-read.

celso-vitor avatar celso-vitor commented on August 26, 2024

It's difficult to say for sure without more information about the data format and encryption used, but here are some possibilities to consider:

1-Data compression: The data may be compressed before being sent to the server, which could explain why the number of bytes returned by the server is less than expected. In this case, you would need to first decompress the data before attempting to decrypt it.

2-Variable-length data: The data returned by the server may not be a fixed-length array of floats. It's possible that there are additional bytes in the data that encode variable-length metadata or other information. In this case, you would need to carefully analyze the data format to determine how to properly parse and decrypt it.

3-Different data types: It's possible that not all of the data is stored as floats, which would affect the total number of bytes needed to represent the data. For example, there could be integers or other data types mixed in with the floats. You may need to examine the data format in more detail to determine what types of data are present.

4-Regarding the encryption of the data, the device ID may indeed be a decryption key, or it could be used in combination with other factors (such as the timestamp) to generate a decryption key. Without more information about the encryption scheme used, it's difficult to say for sure.

I would recommend trying to gather more information about the data format and encryption scheme, if possible. This could involve examining the code that generates and processes the data, as well as any documentation or other resources related to the data format.

from scio-read.

StevenLColeman42 avatar StevenLColeman42 commented on August 26, 2024

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@StevenLColeman42 @celso-vitor I will need help with this, it's not my area of expertise at all...

I have updated the Readme; it has all the information I could find (including the processor type etc.). I have also included some manuals for these chips in the documentation folder.

With regards to your other comments:

  1. Compression: The data sent by the SCiO (and the app, which merely converts bytes to a Base64 string) is always the same number of bytes. I doubt that it is compressed, unless compression will always produce the same length
  2. Length of data reported by the server: As far as I can tell the server always returns a JSON containing 331 floats.
  3. Different data types: I don't know how to analyse the byte data more. True, it may not be floats (probably isn't floats), but I've tried ints and doubles and come up empty. It is most likely not signed as negative values do not make sense
  4. Potential encryption: No clue what encryption could have been used. All the data I have available was extracted from the logs. Please use it and try to play around with it.
  5. Bit-packing: How would you detect bit-packing, and decode the data then?
  6. Firmware access: The USB interface and commands I have collected do allow to update the firmware through USB and/or BLE, by sending commands and firmware files to the device as bytes. I don't know how to download the firmware though in order to analyse it.
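
On the bit-packing question, one simple experiment is to cut the buffer into fields of a fixed bit width and see whether any width gives sensible numbers. The Python sketch below assumes the fields are unsigned and packed most significant bit first, which is only one of several possible layouts; `unpack_bits` is a made-up name.

```python
def unpack_bits(data, width):
    """Split data into consecutive unsigned fields of `width` bits, MSB first."""
    total_bits = len(data) * 8
    as_int = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    values = []
    for i in range(total_bits // width):
        shift = total_bits - (i + 1) * width
        values.append((as_int >> shift) & mask)
    return values  # leftover bits at the end are ignored

print(unpack_bits(bytes([0xAB, 0xCD, 0xEF]), 12))  # [2748, 3567], i.e. 0xABC and 0xDEF
```

For the 1656-byte part, 1656 x 8 = 13248 bits, which is 331 x 40 + 8, so a width of 40 bits is the same as the 5-byte reading discussed earlier in this thread.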

Data from the device: I may be repeating myself, but I invariably get the same data structures from the device and from the logs:

  • sample has 1800 bytes
  • sample_dark has 1800
  • sample_gradient has 1656 bytes
  • The server returns a JSON containing 331 floats
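
A Python sketch for checking these lengths on a logged record. It assumes the record has already been parsed into a dict whose keys sample, sample_dark and sample_gradient hold Base64 strings; the exact layout of the log files may differ, and `decode_scan_fields` is a hypothetical helper.

```python
import base64

EXPECTED_LENGTHS = {"sample": 1800, "sample_dark": 1800, "sample_gradient": 1656}

def decode_scan_fields(record):
    """Base64-decode the three scan fields and verify their byte lengths."""
    decoded = {}
    for name, expected in EXPECTED_LENGTHS.items():
        raw = base64.b64decode(record[name])
        if len(raw) != expected:
            raise ValueError(f"{name}: expected {expected} bytes, got {len(raw)}")
        decoded[name] = raw
    return decoded
```

A record that fails this check would point to a different firmware or app version rather than to a decoding mistake.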

from scio-read.

kebasaa avatar kebasaa commented on August 26, 2024

@onoff0 I want to address the question asked when you opened this issue in 2019: I believe my new code should make connecting and scanning with the SCiO through USB quite easy and straightforward. This should resolve the current issue

However, I'm leaving the issue open because I think the discussion here is quite important for future steps in decoding the raw data. Note that the functioning of the device is being discussed in another parallel issue, which contains more detailed information on the hardware measurement principles etc. I hope that based on this, we can further discuss how to use the available data structures here.

from scio-read.
