This example shows a simple way of leveraging some of the most widely used Machine Learning libraries available in Python.
The DApp generates a linear regression model using scikit-learn, NumPy and pandas, and then uses m2cgen (Model to Code Generator) to transpile that model into native Python code with no dependencies. This approach is inspired by Davis David's Machine Learning tutorial, and is useful for a Cartesi DApp because it removes the need to port all those Machine Learning libraries to the Cartesi Machine's RISC-V architecture, which makes development easier and the final back-end code simpler to execute.
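To make the idea concrete, here is a minimal sketch of the kind of dependency-free code a transpiled linear regression reduces to: a single function that adds an intercept to a weighted sum of its inputs. The function name `score`, the three-feature input and all coefficients are hypothetical placeholders, not values from the trained model, and the exact layout of the code m2cgen emits may differ.

```python
# Sketch of a transpiled linear regression: plain arithmetic, no imports.
# All numbers below are hypothetical; real values come from training.

def score(input):
    # intercept + sum(feature_i * weight_i), written out with literals
    return 2.0 + input[0] * 0.5 + input[1] * -1.5 + input[2] * 0.25


# Example call with a hypothetical three-feature input.
prediction = score([4.0, 2.0, 8.0])
print(prediction)
```

Because the function uses nothing beyond built-in arithmetic, it can run inside the Cartesi Machine without scikit-learn, NumPy or pandas being installed there.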
The practical goal of this application is to predict an air quality index value from the Air Quality Dataset, which contains the responses of a gas multisensor device deployed in the field in an Italian city. Hourly response averages are recorded along with gas concentration references from a certified analyzer.
The model currently uses the following variables to predict the AQI (Air Quality Index):
- PT08.S1(CO): This represents the sensor response to Carbon Monoxide (CO) levels in the air.
- PT08.S2(NMHC): This represents the sensor response to Non-Methane Hydrocarbons (NMHC) in the air. For instance, the value 1046 is the sensor reading, which can be correlated to the actual concentration of NMHC in micrograms per cubic meter (µg/m³).
- PT08.S3(NOx): This represents the sensor response to Nitrogen Oxides (NOx) in the air. For instance, the value 1056 is the sensor reading, which can be correlated to the actual concentration of NOx in parts per billion (ppb).
- PT08.S4(NO2): This represents the sensor response to Nitrogen Dioxide (NO2) in the air. For instance, the value 1692 is the sensor reading, which can be correlated to the actual concentration of NO2 in micrograms per cubic meter (µg/m³).
- PT08.S5(O3): This represents the sensor response to Ozone (O3) in the air. For instance, the value 1268 is the sensor reading, which can be correlated to the actual concentration of O3 in micrograms per cubic meter (µg/m³).
- T: This represents the temperature in degrees Celsius. For instance, the value 21.6 is the measured temperature.
- RH: This represents the Relative Humidity in percentage. For instance, the value 13.6 is the measured relative humidity.
- AH: This represents Absolute Humidity, which is the total water content in the air. For instance, the value 0.76 could be the absolute humidity in grams per cubic meter (g/m³).
As such, inputs to the DApp should be given as a JSON string such as the following:
{"PT08.S1(CO)": 1360, "PT08.S2(NMHC)": 1046, "PT08.S3(NOx)": 1056, "PT08.S4(NO2)": 1692, "PT08.S5(O3)": 1268, "T": 21.6, "RH": 13.6, "AH": 0.76}
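As a rough illustration of how a back-end might turn such an input into the ordered list of numbers a transpiled model expects, consider the sketch below. It assumes the input reaches the back-end as a `0x`-prefixed hex-encoded payload and that the feature order matches the list above; the helper names `hex_to_str` and `to_feature_vector` are hypothetical and not taken from this DApp's code.

```python
import json

# Assumed feature order; the real order is fixed by the training script
# and must match what the transpiled model expects.
FEATURES = [
    "PT08.S1(CO)", "PT08.S2(NMHC)", "PT08.S3(NOx)", "PT08.S4(NO2)",
    "PT08.S5(O3)", "T", "RH", "AH",
]


def hex_to_str(payload):
    """Decode a 0x-prefixed hex payload into a UTF-8 string."""
    return bytes.fromhex(payload[2:]).decode("utf-8")


def to_feature_vector(json_text):
    """Parse the JSON input and return the values in FEATURES order."""
    data = json.loads(json_text)
    missing = [name for name in FEATURES if name not in data]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return [float(data[name]) for name in FEATURES]


# Round trip using the example input shown above.
example = (
    '{"PT08.S1(CO)": 1360, "PT08.S2(NMHC)": 1046, "PT08.S3(NOx)": 1056, '
    '"PT08.S4(NO2)": 1692, "PT08.S5(O3)": 1268, "T": 21.6, "RH": 13.6, "AH": 0.76}'
)
payload = "0x" + example.encode("utf-8").hex()
vector = to_feature_vector(hex_to_str(payload))
```

Rejecting inputs with missing keys up front keeps a malformed request from silently producing a prediction based on the wrong columns.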
There are two main ways to interact with the DApp: using the frontend-web-cartesi application, or using the sunodo send command.
Clone the repository linked above.
After that, open a separate terminal window and switch to the frontend-web-cartesi directory:
cd frontend-web-cartesi
Run the following commands to start the web front end on your localhost:
yarn
yarn codegen
yarn start
This runs the app in development mode. Open http://localhost:3000 to view it in the browser.
Please keep in mind that you should import one of the local wallets into MetaMask. With that in place, also add the Sunodo token to your wallet.
**Note: you must deposit some Sunodo tokens to the wallet inside the DApp to be able to use the AQI prediction function.**
This DApp was implemented in a rather generic way and, as such, it is possible to easily change the target dataset as well as the predictor algorithm.
To change those, open the file airquality/model/build_model.py and change the following variables defined at the beginning of the script:
- model: defines the scikit-learn predictor algorithm to use. While it currently uses sklearn.linear_model.LinearRegression, many other possibilities are available, from several types of linear regressions to solutions such as support vector machines (SVMs).
- train_csv: a URL or file path to a CSV file containing the dataset. It should contain a first row with the feature names, followed by the data.
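Before pointing train_csv at a new dataset, it can help to check that the file has the expected shape. The sketch below uses only the standard library to read a header row followed by numeric data rows; the function name `load_dataset` and the sample values are illustrative assumptions and are not part of build_model.py.

```python
import csv
import io


def load_dataset(csv_text):
    """Return (header, rows) from CSV text whose first row names the columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} values")
        rows.append([float(value) for value in row])
    return header, rows


# Illustrative sample only, not real rows from the Air Quality Dataset.
sample = "T,RH,AH\n21.6,13.6,0.76\n20.1,15.2,0.71\n"
header, rows = load_dataset(sample)
```

A check like this catches a missing header or ragged rows early, before the training script runs.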