Comments (8)

iho commented on August 30, 2024

Is anybody working on this? Can I take it?

from nearcore.

frol commented on August 30, 2024

@iho Go for it. Just make sure you measure the performance gains, as this was only a theory I had in mind after learning about that issue.

iho commented on August 30, 2024

I just ran some experiments. On my machine, removing the call to near_config_utils::strip_comments_from_json_reader and passing the mmap directly to let genesis = serde_json::from_slice::<Genesis>(&mmap) improved the loading time from 26-27 seconds to 7 seconds. It looks like stripping comments from the JSON file is what slows down loading Genesis.

What is the motivation for having comments in the JSON? Without this feature, we could have a lower startup time.
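To make concrete what a comment-stripping pass has to do, here is a minimal std-only sketch. It assumes the three comment styles that the config utils document as supported (//, /* */ and #). It is not the json_comments implementation, and strip_json_comments is a hypothetical name used only for illustration. The point is that such a pass has to look at every byte and track whether it is inside a string literal, which may be part of why it is costly on a very large genesis file.

```rust
/// Hypothetical helper (not the `json_comments` crate): replaces `//`,
/// `/* */` and `#` comments with spaces so that byte offsets are preserved.
/// String literals are copied through untouched.
pub fn strip_json_comments(input: &str) -> String {
    #[derive(Clone, Copy)]
    enum State {
        Normal,
        InString,
        Escape,
        LineComment,
        BlockComment,
    }

    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut state = State::Normal;
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        match state {
            State::Normal => match b {
                b'"' => {
                    state = State::InString;
                    out.push(b);
                }
                b'#' => {
                    state = State::LineComment;
                    out.push(b' ');
                }
                b'/' if bytes.get(i + 1) == Some(&b'/') => {
                    state = State::LineComment;
                    out.extend_from_slice(b"  ");
                    i += 1; // also consume the second '/'
                }
                b'/' if bytes.get(i + 1) == Some(&b'*') => {
                    state = State::BlockComment;
                    out.extend_from_slice(b"  ");
                    i += 1; // also consume the '*'
                }
                _ => out.push(b),
            },
            State::InString => {
                out.push(b);
                match b {
                    b'\\' => state = State::Escape,
                    b'"' => state = State::Normal,
                    _ => {}
                }
            }
            State::Escape => {
                // The byte after a backslash never ends the string.
                out.push(b);
                state = State::InString;
            }
            State::LineComment => {
                if b == b'\n' {
                    state = State::Normal;
                    out.push(b);
                } else {
                    out.push(b' ');
                }
            }
            State::BlockComment => {
                if b == b'*' && bytes.get(i + 1) == Some(&b'/') {
                    state = State::Normal;
                    out.extend_from_slice(b"  ");
                    i += 1; // also consume the closing '/'
                } else if b == b'\n' {
                    out.push(b); // keep line numbers stable
                } else {
                    out.push(b' ');
                }
            }
        }
        i += 1;
    }
    // Only comment bytes were replaced (with ASCII spaces), and state changes
    // happen on ASCII bytes only, so the output is still valid UTF-8.
    String::from_utf8(out).expect("output stays valid UTF-8")
}
```

Calling strip_json_comments on a commented JSON string returns a string of the same length that a strict JSON parser can then consume. A file known to have no comments can skip this pass entirely, which is what the experiment above does.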

nikurt commented on August 30, 2024

@iho Does your genesis file contain all the records?
The easiest way to make startup of testnet nodes instant is to move the records to a separate records.json file.
Use these files: https://s3.us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/fast/config.json.xz

Something like this should give you a node that starts instantly (haven't tested this): neard init --chain-id testnet --download-genesis --download-config --download-genesis-url 'https://s3.us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/fast/genesis.json.xz' --download-config-url 'https://s3.us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/fast/config.json.xz' --download-records-url 'https://s3.us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/fast/records.json.xz'

@marcelo-gonzalez have the instructions been published anywhere yet?
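The long command above is easier to read as an argument list. The following is a sketch only: the flags and URLs are copied from the command in this comment, while fast_init_args, run_fast_init and FAST_TESTNET_BASE are hypothetical names, and it assumes a neard binary is on the PATH.

```rust
use std::process::{Command, ExitStatus};

/// Base URL of the "fast" testnet files, taken from the links in the comment.
pub const FAST_TESTNET_BASE: &str =
    "https://s3.us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/fast";

/// Hypothetical helper: builds the `neard init` arguments quoted in the comment.
pub fn fast_init_args(base_url: &str) -> Vec<String> {
    vec![
        "init".to_string(),
        "--chain-id".to_string(),
        "testnet".to_string(),
        "--download-genesis".to_string(),
        "--download-config".to_string(),
        "--download-genesis-url".to_string(),
        format!("{}/genesis.json.xz", base_url),
        "--download-config-url".to_string(),
        format!("{}/config.json.xz", base_url),
        "--download-records-url".to_string(),
        format!("{}/records.json.xz", base_url),
    ]
}

/// Hypothetical wrapper: runs `neard init` with those arguments.
/// Assumes `neard` is installed and reachable through PATH.
pub fn run_fast_init() -> std::io::Result<ExitStatus> {
    Command::new("neard")
        .args(fast_init_args(FAST_TESTNET_BASE))
        .status()
}
```

Keeping the base URL in one place makes it harder to mix files from different deployments, since genesis, config and records are meant to be used together.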

marcelo-gonzalez commented on August 30, 2024

Ah, no, there are no instructions publicly posted anywhere; that would be a good idea. The command you posted is a valid one though, I can confirm. Note that the node starts instantly only when you pass --unsafe-fast-startup, so that it actually skips loading the genesis records (e.g. neard --unsafe-fast-startup run). Also, if you want to make an existing testnet home dir faster this way, you'd have to manually replace the genesis file with the two separate files, genesis.json and records.json, and then set the genesis_records_file param in config.json. I can try coming up with a script for that migration and testing it.

iho commented on August 30, 2024

@marcelo-gonzalez Can we require records.json to be JSON without comments? I can change the logic to store the records in records.json as a cache. We could even use a binary file format for storing the data.

iho commented on August 30, 2024

For some reason I can't upload the patch, so here it is inline:

diff --git a/Cargo.lock b/Cargo.lock
index 28e7bd264..1834ab4aa 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -3505,6 +3505,7 @@ dependencies = [
  "anyhow",
  "chrono",
  "derive_more",
+ "memmap2",
  "near-config-utils",
  "near-crypto",
  "near-o11y",
@@ -3645,6 +3646,7 @@ version = "0.0.0"
 dependencies = [
  "anyhow",
  "json_comments",
+ "memmap2",
  "thiserror",
  "tracing",
 ]
diff --git a/core/chain-configs/Cargo.toml b/core/chain-configs/Cargo.toml
index 1c59b650c..d0c99ff7a 100644
--- a/core/chain-configs/Cargo.toml
+++ b/core/chain-configs/Cargo.toml
@@ -14,6 +14,7 @@ anyhow.workspace = true
 chrono.workspace = true
 derive_more.workspace = true
 num-rational.workspace = true
+memmap2.workspace = true
 once_cell.workspace = true
 serde.workspace = true
 serde_json.workspace = true
@@ -31,10 +32,6 @@ nightly_protocol = [
   "near-o11y/nightly_protocol",
   "near-primitives/nightly_protocol",
 ]
-nightly = [
-  "nightly_protocol",
-  "near-o11y/nightly",
-  "near-primitives/nightly",
-]
+nightly = ["nightly_protocol", "near-o11y/nightly", "near-primitives/nightly"]
 default = []
 metrics = ["near-o11y"]
diff --git a/core/chain-configs/src/genesis_config.rs b/core/chain-configs/src/genesis_config.rs
index 1df019aa6..d56fa53a2 100644
--- a/core/chain-configs/src/genesis_config.rs
+++ b/core/chain-configs/src/genesis_config.rs
@@ -531,33 +531,45 @@ impl Genesis {
         path: P,
         genesis_validation: GenesisValidationMode,
     ) -> Result<Self, ValidationError> {
-        let mut file = File::open(&path).map_err(|_| ValidationError::GenesisFileError {
+        use std::time::Instant;
+        let now = Instant::now();
+        let file = File::open(&path).map_err(|_| ValidationError::GenesisFileError {
             error_message: format!(
                 "Could not open genesis config file at path {}.",
                 &path.as_ref().display()
             ),
         })?;
+        use memmap2::MmapOptions;
+        let mmap = unsafe {
+            MmapOptions::new().map(&file).map_err(|_| ValidationError::GenesisFileError {
+                error_message: format!(
+                    "Could not mmap genesis config file at path {}.",
+                    &path.as_ref().display()
+                ),
+            })?
+        };

-        let mut json_str = String::new();
-        file.read_to_string(&mut json_str).map_err(|_| ValidationError::GenesisFileError {
-            error_message: format!("Failed to read genesis config file to string. "),
-        })?;
+        // let striper = near_config_utils::strip_comments_from_mmap(&mmap);

-        let json_str_without_comments = near_config_utils::strip_comments_from_json_str(&json_str)
-            .map_err(|_| ValidationError::GenesisFileError {
-                error_message: format!("Failed to strip comments from genesis config file"),
-            })?;
+        // let genesis =
+        //     serde_json::from_reader::<near_config_utils::StripComments<&[u8]>, Genesis>(striper)
+        //         .map_err(|_| ValidationError::GenesisFileError {
+        //             error_message: format!("Failed to deserialize the genesis records."),
+        //         })?;
+        let genesis = serde_json::from_slice::<Genesis>(&mmap).map_err(|_| {
+            ValidationError::GenesisFileError {
+                error_message: format!("Failed to deserialize the genesis records."),
+            }
+        })?;

-        let genesis =
-            serde_json::from_str::<Genesis>(&json_str_without_comments).map_err(|_| {
-                ValidationError::GenesisFileError {
-                    error_message: format!("Failed to deserialize the genesis records."),
-                }
-            })?;
+        let elapsed = now.elapsed();
+        println!("Elapsed: {:.2?}", elapsed);

         genesis.validate(genesis_validation)?;
         Ok(genesis)
     }
+    /// Reads Genesis from a single JSON file, the file can't be JSON with comments
+    /// This function will collect all errors regarding genesis.json and push them to validation_errors

     /// Reads Genesis from config and records files.
     pub fn from_files<P1, P2>(
diff --git a/utils/config/Cargo.toml b/utils/config/Cargo.toml
index 6fd62fc75..f987fbe22 100644
--- a/utils/config/Cargo.toml
+++ b/utils/config/Cargo.toml
@@ -14,4 +14,4 @@ anyhow.workspace = true
 json_comments.workspace = true
 thiserror.workspace = true
 tracing.workspace = true
-
+memmap2.workspace = true
diff --git a/utils/config/src/lib.rs b/utils/config/src/lib.rs
index a8f15c61c..990d59edd 100644
--- a/utils/config/src/lib.rs
+++ b/utils/config/src/lib.rs
@@ -1,6 +1,7 @@
 use std::io::Read;

-use json_comments::StripComments;
+pub use json_comments::StripComments;
+use memmap2::Mmap;

 // strip comments from a JSON string with comments.
 // the comment formats that are supported: //, /* */ and #.
@@ -16,6 +17,9 @@ pub fn strip_comments_from_json_str(json_str: &String) -> std::io::Result<String
 pub fn strip_comments_from_json_reader(reader: impl Read) -> impl Read {
     StripComments::new(reader)
 }
+pub fn strip_comments_from_mmap<'a>(reader: &'a Mmap) -> StripComments<&'a [u8]> {
+    StripComments::new(reader.as_ref())
+}

 /// errors that arise when loading config files or config semantic checks
 /// config files here include: genesis.json, config.json, node_key.json, validator_key.json

The version with mmap and without stripping comments loads in 7-8 seconds and consumes 16 GB of RAM.
The version that deserializes with serde from a StripComments wrapper around the mmap (you need to uncomment that part in the patch) loads in 25 seconds and consumes 28 GB of RAM.

The baseline is 28 seconds and 28 GB of RAM.
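Putting the three reported measurements side by side: relative to the baseline, the mmap-only version is roughly 3.7x faster and uses about 43% less RAM, while wrapping the mmap in StripComments gains only about 1.12x in time and nothing in RAM. A small sketch of that arithmetic follows. The figures are the ones reported here; using 7.5 s as the midpoint of "7-8 seconds" is an assumption, and the type and function names are hypothetical.

```rust
/// One measurement as reported in this comment.
pub struct Run {
    pub label: &'static str,
    pub seconds: f64,
    pub ram_gb: f64,
}

/// The three reported runs: baseline, mmap without comment stripping,
/// and mmap wrapped in StripComments.
pub fn reported_runs() -> [Run; 3] {
    [
        Run { label: "baseline", seconds: 28.0, ram_gb: 28.0 },
        // Assumption: 7.5 s stands in for the reported "7-8 seconds".
        Run { label: "mmap, no comment stripping", seconds: 7.5, ram_gb: 16.0 },
        Run { label: "mmap + StripComments", seconds: 25.0, ram_gb: 28.0 },
    ]
}

/// How many times faster `run` is than `baseline`.
pub fn speedup(baseline: &Run, run: &Run) -> f64 {
    baseline.seconds / run.seconds
}

/// Percentage of RAM saved by `run` relative to `baseline`.
pub fn ram_saved_percent(baseline: &Run, run: &Run) -> f64 {
    (baseline.ram_gb - run.ram_gb) / baseline.ram_gb * 100.0
}

/// One-line summary of a run compared with the baseline.
pub fn summarize(baseline: &Run, run: &Run) -> String {
    format!(
        "{}: {:.2}x faster, {:.1}% less RAM",
        run.label,
        speedup(baseline, run),
        ram_saved_percent(baseline, run)
    )
}
```

These are single-machine numbers from one experiment, so they are best read as an order-of-magnitude indication rather than a benchmark.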

frol commented on August 30, 2024

@iho Thank you for the research and the patch!

It seems that unless we do something about stripping the comments (e.g. dropping that support from the genesis file), there is nothing we can do here, unfortunately. The version that does not support comments in the genesis file is a major improvement from a RAM and load-time perspective, but I won't die on this hill and would rather leave it alone, so I am closing this issue.
