
tezos-baking's Introduction

Tezos baking tools to ease a baker's life
Focus on uptime and ease of use / monitoring

If you own Tezos tokens (XTZ), you want them to work for the Tezos network and ecosystem. It is a fun challenge and you will be rewarded for doing it. The more individual bakers we have in the Tezos ecosystem, the more decentralized and resilient Tezos will become.

Granted, it takes a little bit of work, but good guides exist and I think you will find it worthwhile. This repository focuses on getting you maximal uptime and ease of use once you have installed your node(s) and gotten your Ledger to work.

To spin up a node use this excellent guide: http://doc.tzalpha.net/introduction/howtoget.html#build-from-sources

To get your Ledger Nano S to work with Tezos follow this excellent guide: https://github.com/obsidiansystems/ledger-app-tezos/blob/master/README.md

Using systemd to control and monitor your node, baker, endorser and accuser

If you are not familiar with "services" or systemd there is a good intro here: https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units

Briefly, what we want is a stable system with maximal uptime in return for minimal intervention. As such we need a tezos node that starts itself once the system boots and a baker/endorser/accuser that is always on and ready to bake/endorse/accuse. systemd can easily help us achieve this.

Below you'll find the config files for such a system together with some explanation of how it works.

Basically, for a full bakery we want to configure and (auto-)run four services:

  • Tezos Node
  • Tezos Baker
  • Tezos Endorser
  • Tezos Accuser

For a node only, the section "Using front-end nodes and a private baker" further down explains how to use systemd to run a non-baking node. It is super easy: if that is all you need, read the tezos-node.service part and then skip to that section.

The individual files for a full bakery are outlined below. You can copy and paste them into the paths/files mentioned in each section.

For monitoring the daemons you can use journalctl, which is a powerful tool for monitoring services both in real time and after the fact. There is a separate section on this towards the end of this document.

tezos-node.service

First we want to set up the service to run our Tezos node. The example below includes running over a VPN. You don't have to; just remove openvpn-client@<vpnprovider>.service from both the Wants and After lines in the [Unit] section.

On the ExecStart line you can place whatever command you normally use to start your node; just don't use nohup etc. You can (and should) change User and Group to whatever you use on your system. I control my nodes with a simple config.json, but the example below should work with default settings throughout.

# The Tezos Node service (part of systemd)
# file: /etc/systemd/system/tezos-node.service 

[Unit]
Description     = Tezos Node Service
Documentation   = http://tezos.gitlab.io/betanet/
Wants           = network-online.target openvpn-client@<vpnprovider>.service
After           = network-online.target openvpn-client@<vpnprovider>.service 

[Service]
User            = baker
Group		= baker
WorkingDirectory= /home/baker/
ExecStart	= /home/baker/tezos/tezos-node run --bootstrap-threshold=1
Restart         = on-failure

[Install]
WantedBy	= multi-user.target
RequiredBy	= tezos-baker.service tezos-endorser.service tezos-accuser.service
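If you manage several machines, it can help to generate this unit file instead of editing it by hand. The Python sketch below is one way to do that, assuming your binaries live in /home/<user>/tezos as in the example above; the function name render_node_unit and its parameters are hypothetical. Passing vpn_provider=None drops the openvpn-client dependency from Wants and After, and bakery=False omits the RequiredBy line for a non-baking node.

```python
from typing import Optional


def render_node_unit(user: str = "baker",
                     vpn_provider: Optional[str] = None,
                     bakery: bool = True) -> str:
    """Render a tezos-node.service modelled on the example above (hypothetical helper).

    user:         account that owns ~/tezos (used for User, Group and paths)
    vpn_provider: name for openvpn-client@<name>.service, or None for no VPN dependency
    bakery:       False for a front-end node, which omits the RequiredBy= line
    """
    deps = ["network-online.target"]
    if vpn_provider is not None:
        deps.append(f"openvpn-client@{vpn_provider}.service")
    dep_line = " ".join(deps)
    home = f"/home/{user}"
    lines = [
        "[Unit]",
        "Description     = Tezos Node Service",
        "Documentation   = http://tezos.gitlab.io/betanet/",
        f"Wants           = {dep_line}",  # soft dependency: node still starts without it
        f"After           = {dep_line}",  # ordering only
        "",
        "[Service]",
        f"User            = {user}",
        f"Group           = {user}",
        f"WorkingDirectory= {home}/",
        f"ExecStart       = {home}/tezos/tezos-node run --bootstrap-threshold=1",
        "Restart         = on-failure",
        "",
        "[Install]",
        "WantedBy        = multi-user.target",
    ]
    if bakery:
        lines.append("RequiredBy      = tezos-baker.service tezos-endorser.service tezos-accuser.service")
    return "\n".join(lines) + "\n"
```

You could then write the result to /etc/systemd/system/tezos-node.service (as root) and run sudo systemctl daemon-reload before enabling it.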

To install your tezos node as a service that loads at boot time you use systemctl:
sudo systemctl enable tezos-node.service

Notice that we use After to tell systemd that this service should be started after networking is established. We still allow the tezos-node service to start without network by using Wants, as the network might come up later. You can make sure the tezos node only starts once the network is established by changing 'Wants' to 'Requires', but I would recommend against doing so, as it limits your flexibility. The same applies to the VPN service.

To start your tezos node service you would use:
sudo systemctl start tezos-node.service

Obviously, you will rarely use this command unless you 1) haven't rebooted your system after installing the service or 2) have actively shut down the service.

Instead of start, I strongly recommend using reload-or-restart: it starts the service if it is not already running, and otherwise reloads it, or restarts it if the unit (like the ones in this guide) defines no ExecReload command:
sudo systemctl reload-or-restart tezos-node.service
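The decision reload-or-restart makes for a single unit can be sketched in Python as below. This is a simplified model, not systemd's code: it only looks for an ExecReload= line and ignores newer mechanisms such as Type=notify-reload. The function names are illustrative. Since none of the units in this guide define ExecReload=, an already running service is restarted rather than reloaded.

```python
def unit_supports_reload(unit_text: str) -> bool:
    """True if the unit file text contains an ExecReload= setting (comments ignored)."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")) or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key == "ExecReload":
            return True
    return False


def reload_or_restart_action(is_active: bool, has_exec_reload: bool) -> str:
    """Approximate what `systemctl reload-or-restart` does to one unit."""
    if not is_active:
        return "start"    # not running yet: simply start it
    if has_exec_reload:
        return "reload"   # running and reloadable: reload in place
    return "restart"      # running but no reload command: stop, then start
```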

To see the status of your tezos node service you would use:
sudo systemctl status tezos-node.service

To stop your tezos node service you would use:
sudo systemctl stop tezos-node.service

If for some reason you don't want the node to start at boot anymore, simply do:
sudo systemctl disable tezos-node.service

tezos-baker.service

Now that the node is up and running we want to run the baker, endorser and accuser the same way.

# The Tezos Baker service (part of systemd)
# file: /etc/systemd/system/tezos-baker.service 

[Unit]
Description     = Tezos Baker Service
Wants           = network-online.target openvpn-client@<vpnprovider>.service 
BindsTo		= tezos-node.service
After           = tezos-node.service

[Service]
User            = baker
Group		= baker
WorkingDirectory= /home/baker/
ExecStartPre	= /bin/sleep 1
ExecStart       = /home/baker/tezos/tezos-baker-004-Pt24m4xi run with local node /home/baker/.tezos-node ledger_bakerone_ed_0_0
Restart         = on-failure

[Install]
WantedBy	= multi-user.target

These services need a tezos node, so we require the node to be running first. The strongest form of requirement is binding: with BindsTo (together with After), this service will only start if the service it binds to has started successfully and is running. Also, if the service this service (Tezos Baker) binds to (Tezos Node) stops or crashes, this service will be stopped too.

Replace the ExecStart command with whatever command you want to run your baker with, and replace ledger_bakerone_ed_0_0 with the alias of your baking key. Notice the ExecStartPre: it is a little hackish, but I found that introducing a one-second delay between starting the node and starting the baker, endorser and accuser made the services run smoothly; otherwise systemd starts them too close together. There are ways to adjust this using systemd, but to keep things simple we just sleep for a second before executing the command that fires up the baker.
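If the fixed one-second sleep ever proves too short, one alternative is to wait until the node actually accepts connections. The sketch below polls a TCP port until it opens or a timeout expires. It assumes your node's RPC server listens on 127.0.0.1:8732 (the port the client tools use by default); wait_for_port is a hypothetical helper, and an open port only shows that the RPC server is up, not that the node has finished bootstrapping.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True as soon as something accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is there; close it straight away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # connection refused, timed out, network unreachable, ...
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

You could call wait_for_port("127.0.0.1", 8732, timeout=60) from a small script of your own, exit non-zero when it returns False, and point ExecStartPre at that script; a failing ExecStartPre makes systemd treat the start as failed.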

We also want the baker, endorser and accuser to be restarted whenever the node is restarted, which is another reason to bind these services to the node with BindsTo. In effect, all four services restart when you restart the node, while you can still restart each of the baker, endorser and accuser services separately, should you need to.
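How a stop or restart of the node fans out to the bound services can be modelled as a walk over reverse dependencies. The sketch below is a simplification that assumes only BindsTo= matters (it ignores Requires=, PartOf= and ordering); units_affected_by and BAKERY are illustrative names, not part of systemd.

```python
from collections import deque
from typing import Dict, List, Set

# Which units each unit BindsTo, mirroring the three unit files in this guide.
BAKERY: Dict[str, List[str]] = {
    "tezos-baker.service": ["tezos-node.service"],
    "tezos-endorser.service": ["tezos-node.service"],
    "tezos-accuser.service": ["tezos-node.service"],
}


def units_affected_by(target: str, binds_to: Dict[str, List[str]]) -> Set[str]:
    """Return `target` plus every unit that (transitively) BindsTo it."""
    # Invert the map: for each unit, which units are bound to it?
    bound_by: Dict[str, List[str]] = {}
    for unit, deps in binds_to.items():
        for dep in deps:
            bound_by.setdefault(dep, []).append(unit)
    affected = {target}
    queue = deque([target])
    while queue:  # breadth-first walk over reverse BindsTo edges
        current = queue.popleft()
        for unit in bound_by.get(current, []):
            if unit not in affected:
                affected.add(unit)
                queue.append(unit)
    return affected
```

Restarting the node therefore touches all four units, while restarting the baker touches only the baker.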

Same commands as for node to enable, reload/start, stop and get status on the baker:

  • sudo systemctl enable tezos-baker.service
  • sudo systemctl reload-or-restart tezos-baker.service
  • sudo systemctl stop tezos-baker.service
  • sudo systemctl status tezos-baker.service

Note: You must have the Tezos baking app open on your Ledger Nano S when you (re)start your baker and endorser.

tezos-endorser.service

Now we simply do the same with the endorser and accuser daemons.

# The Tezos Endorser service (part of systemd)
# file: /etc/systemd/system/tezos-endorser.service 

[Unit]
Description     = Tezos Endorser Service
Wants           = network-online.target openvpn-client@<vpnprovider>.service 
BindsTo		= tezos-node.service
After           = tezos-node.service

[Service]
User            = baker
Group		= baker
WorkingDirectory= /home/baker/
ExecStartPre	= /bin/sleep 1
ExecStart       = /home/baker/tezos/tezos-endorser-004-Pt24m4xi run ledger_bakerone_ed_0_0
Restart         = on-failure

[Install]
WantedBy	= multi-user.target

Same commands as for node to enable, reload/start, stop and get status on the endorser:

  • sudo systemctl enable tezos-endorser.service
  • sudo systemctl reload-or-restart tezos-endorser.service
  • sudo systemctl stop tezos-endorser.service
  • sudo systemctl status tezos-endorser.service

Note: You must have the Tezos baking app open on your Ledger Nano S when you (re)start your baker and endorser.

tezos-accuser.service

# The Tezos Accuser service (part of systemd)
# file: /etc/systemd/system/tezos-accuser.service 

[Unit]
Description     = Tezos Accuser Service
Wants           = network-online.target openvpn-client@<vpnprovider>.service 
BindsTo		= tezos-node.service
After           = tezos-node.service

[Service]
User            = baker
Group		= baker
WorkingDirectory= /home/baker/
ExecStartPre	= /bin/sleep 1
ExecStart       = /home/baker/tezos/tezos-accuser-004-Pt24m4xi run
Restart         = on-failure

[Install]
WantedBy	= multi-user.target

Same commands as for node to enable, reload/start, stop and get status on the accuser:

  • sudo systemctl enable tezos-accuser.service
  • sudo systemctl reload-or-restart tezos-accuser.service
  • sudo systemctl stop tezos-accuser.service
  • sudo systemctl status tezos-accuser.service

Combining all the services to get a nice status page for your Tezos operations

sudo systemctl status tezos-node.service tezos-baker.service tezos-endorser.service tezos-accuser.service

Or alternatively, shorter but less ordered: sudo systemctl status 'tezos-*.service'
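For a scripted version of the combined status check, systemctl is-active prints one state (active, inactive, failed, ...) per unit, in the order the units were given. The sketch below runs it through subprocess and flags anything that is not active. UNITS, bakery_status, parse_is_active and not_running are illustrative names.

```python
import subprocess
from typing import Dict, List

UNITS = ["tezos-node.service", "tezos-baker.service",
         "tezos-endorser.service", "tezos-accuser.service"]


def parse_is_active(output: str, units: List[str]) -> Dict[str, str]:
    """Pair each unit with the state line systemctl printed for it."""
    states = output.split()
    if len(states) != len(units):
        raise ValueError(f"expected {len(units)} states, got {len(states)}")
    return dict(zip(units, states))


def bakery_status(units: List[str] = UNITS) -> Dict[str, str]:
    # check=False: is-active exits non-zero when a unit is inactive, which is not an error here.
    result = subprocess.run(["systemctl", "is-active", *units],
                            capture_output=True, text=True, check=False)
    return parse_is_active(result.stdout, units)


def not_running(states: Dict[str, str]) -> List[str]:
    """Units whose state is anything other than 'active'."""
    return [unit for unit, state in states.items() if state != "active"]
```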

You can restart all four services by restarting the node (because we bound the baker/endorser/accuser to the node):
sudo systemctl reload-or-restart tezos-node.service

Similarly all services will stop upon:
sudo systemctl stop tezos-node.service

And you can stop the baker/endorser/accuser individually if you want.

  • sudo systemctl stop tezos-baker.service
  • sudo systemctl stop tezos-endorser.service
  • sudo systemctl stop tezos-accuser.service

Using front-end nodes and a private baker

The above configurations work both for front-end nodes and for baking/endorsing/accusing nodes. If you want to use this setup for your front-end nodes too (I recommend you do), simply remove the line RequiredBy = tezos-baker.service tezos-endorser.service tezos-accuser.service from your tezos-node.service and do not install the baker, endorser and accuser services.

Restarting your Tezos operations automatically

Above, we use Restart=on-failure. You could use Restart=always; I just haven't found it necessary, and there is a slight risk that, combined with loose restart settings, it could exhaust your system. But feel free to try it out if it will make you sleep better at night. Now that you have all your Tezos operations 'servicified', you can indeed start sleeping at night again, without losing out on your baking and endorsing slots.
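To get a feel for why restart settings matter: by default systemd refuses to start a unit more than DefaultStartLimitBurst=5 times within DefaultStartLimitIntervalSec=10s, and restarts triggered by Restart= count toward that limit. The sketch below is a simplified model of that rate limit (a fixed window that resets once the interval has passed), not systemd's actual code; start_attempts_allowed is an illustrative name. With the default RestartSec=100ms, the sixth rapid attempt is refused, which is when systemd logs "Start request repeated too quickly".

```python
from typing import List, Optional


def start_attempts_allowed(times: List[float], interval: float = 10.0,
                           burst: int = 5) -> List[bool]:
    """For each start attempt time (seconds), report whether the rate limit allows it."""
    window_start: Optional[float] = None
    starts_in_window = 0
    allowed: List[bool] = []
    for t in times:
        # Open a new window on the first attempt or once the interval has elapsed.
        if window_start is None or t - window_start > interval:
            window_start, starts_in_window = t, 0
        if starts_in_window < burst:
            starts_in_window += 1
            allowed.append(True)
        else:
            allowed.append(False)  # over the burst within this window: refused
    return allowed
```

Spacing restarts out with a larger RestartSec= in [Service] (or tuning StartLimitIntervalSec= and StartLimitBurst= in [Unit]) keeps every attempt within the limit.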

Setting environment variables in e.g. the baker and endorser to closely monitor your ledger

Just include the following in your [Service] section: Environment = "TEZOS_LOG=client.signer.ledger -> debug" (quote the whole assignment, since the value contains spaces).

If you need to pass a lot of environment variables, use EnvironmentFile instead and place one VARIABLE=value assignment per line in that file. EnvironmentFile should point to your file, e.g. EnvironmentFile = /home/baker/tezosenvironmentvariables.
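An environment file simply holds one assignment per line. The parser below is a deliberately simplified Python reading of that format, assuming plain VAR=value lines, optional surrounding quotes, and # or ; comments; systemd's own parser additionally handles escapes and line continuations, so treat this as a way to sanity-check your file rather than a reimplementation. parse_environment_file is an illustrative name.

```python
from typing import Dict


def parse_environment_file(text: str) -> Dict[str, str]:
    """Simplified reading of an EnvironmentFile: one VAR=value assignment per line."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments are ignored
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; this sketch just skips it
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "a -> b".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key] = value
    return env
```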

Using journalctl to monitor the node, baker, endorser and accuser

Sometimes it is necessary to go through log files to identify the root causes of different events, and sometimes it is just fun to follow your Tezos services (daemons) live. systemd has a very powerful tool for this, called journalctl; a few examples of how to use it are given below.

To simply follow your node's output real-time:
journalctl --follow --unit=tezos-node.service

You don't really need to add the .service suffix, but I'll keep it here for clarity.

Similarly with the baker, endorser and accuser:

  • journalctl --follow --unit=tezos-baker.service
  • journalctl --follow --unit=tezos-endorser.service
  • journalctl --follow --unit=tezos-accuser.service

You can also get the output formatted to suit your needs. Try for example:
journalctl --follow --unit=tezos-endorser.service --output=json-pretty

Tezos keeps its time in UTC. To get journalctl to output your log in UTC, simply add --utc:
journalctl --follow --unit=tezos-endorser.service --utc
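If you want to post-process the logs, journalctl --output=json writes one JSON object per line (json-pretty spreads each entry over several lines, which is nicer to read but harder to parse). The sketch below, assuming that one-object-per-line input, turns each entry into a UTC timestamp and message using the standard journal fields __REALTIME_TIMESTAMP (microseconds since the epoch, serialized as a string) and MESSAGE. journal_entries is an illustrative name.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def journal_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (UTC timestamp, message) pairs from `journalctl --output=json` lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Integer microseconds avoid float rounding in the timestamp.
        when = EPOCH + timedelta(microseconds=int(entry["__REALTIME_TIMESTAMP"]))
        message = entry.get("MESSAGE", "")
        if isinstance(message, list):  # non-UTF-8 messages are serialized as byte arrays
            message = bytes(message).decode("utf-8", errors="replace")
        yield when.strftime("%Y-%m-%d %H:%M:%S UTC"), message
```

For example, pipe journalctl --unit=tezos-endorser.service --output=json into a script that passes sys.stdin to journal_entries.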

By now you've probably understood that the possibilities are almost endless and the flexibility is second to none. Try for example to get your log for the endorser after a given timestamp or between two timestamps by doing these:

  • journalctl --unit=tezos-endorser.service --since=yesterday
  • journalctl --unit=tezos-endorser.service --since=today
  • journalctl --unit=tezos-endorser.service --since='2018-08-01 00:00:00' --until='2018-08-10 12:00:00'

Or find your bakes since last boot:
journalctl --unit=tezos-baker.service --boot=-0 | grep candidate

If your journalctl was built with pattern-matching support, you can do:
journalctl --unit=tezos-baker.service --boot=-0 --grep=candidate

And on and on....

See more using man journalctl

Forget about cron jobs etc - systemd has you covered for some happy hands-off baking...

I've now been asked repeatedly for a donation address. Donations are not expected. If you feel you want to anyway you can use: tz1a2oGa6yTXGuS9d9DTckQm5vrh12qYqCqL

Enjoy!

Contributors: etomknudsen, tingham


tezos-baking's Issues

Tezos-node-cpr error msg: line 122 expecting integer

Node OK | Block "BLj9v...VUzWW" | Level 127704 | Priority 1 | 12 secs ago | Traffic: 29.35 kB/s
./cpr.sh: line 122: [: null: integer expression expected.
Node OK | Block "BLj9v...VUzWW" | Level 127704 | Priority 1 | 48 secs ago | Traffic: 15.48 kB/s
./cpr.sh: line 122: [: null: integer expression expected.
./cpr.sh: line 122: [: null: integer expression expected.
./cpr.sh: line 122: [: null: integer expression expected.
./cpr.sh: line 122: [: null: integer expression expected.
./cpr.sh: line 122: [: null: integer expression expected.
./cpr.sh: line 122: [: null: integer expression expected.
Node OK | Block "BMU29...w1U2h" | Level 127705 | Priority 5 | 10 secs ago | Traffic: 30.46 kB/s

Line 35 causing traceback error in Node CPR script

Hello,

This part of line #35 in particular throws an error on my local and cloud nodes when I run the node cpr script:
curl -s $RPC_HOST:$RPC_PORT/network/stat | python3 -c "import sys, json; array = json.load(sys.stdin); print(int(array['total_recv'])+int(array['total_sent']))"

The same piece of code I enter at command line:
curl -s 127.0.0.1:8732/network/stat | python3 -c "import sys, json; array = json.load(sys.stdin); print(int(array['total_recv'])+int(array['total_sent']))"

Both running the script and the code at command line give the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This line at the command line works fine:
curl -s 127.0.0.1:8732/network/stat

and returns:
{"total_sent":"1287292","total_recv":"1562390","current_inflow":66,"current_outflow":63}

I found that changing this part of line 35 to the below fixes the issue and properly reports back the values, specifically json.loads(sys.stdin.read()):
curl -s $RPC_HOST:$RPC_PORT/network/stat | python3 -c "import sys, json; array = json.loads(sys.stdin.read()); print(int(array['total_recv'])+int(array['total_sent']))"

The script runs fine with the error but I suspect that those using it are not actually getting the correct p2p data and don't know it!
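Worth noting when reading this report: json.load(f) is just json.loads(f.read()), so both forms parse exactly the same input, and "Expecting value: line 1 column 1 (char 0)" means the input was empty or not JSON at all, for instance because curl got no answer from the RPC port. The sketch below is a more defensive parser, assuming the response format shown above (counters as strings); it returns None in that case instead of a traceback. total_p2p_bytes is a hypothetical name.

```python
import json
from typing import Optional


def total_p2p_bytes(raw: str) -> Optional[int]:
    """Sum total_recv and total_sent from a /network/stat response, or None if unusable."""
    if not raw.strip():
        return None  # empty reply: node down or RPC not listening
    try:
        stat = json.loads(raw)
        # The node reports these counters as strings, e.g. "1287292".
        return int(stat["total_recv"]) + int(stat["total_sent"])
    except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
        return None
```

A caller can then tell "no data" (None) apart from "zero traffic" (0) instead of silently carrying on.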

Potential bugs in script functions

I believe there are bugs in both the getTimeSinceLastBlock and the getTotalTxp2p functions. The script is configured to restart the node when no block has arrived within 180 seconds. However, it fails to do so, as indicated, for instance, by the following line:

2019-03-23T06:55:56Z Waiting. Last block was 515 secs ago

I observed that the moment I manually restart the node, the script strangely also fires. The log then shows an error in line 33, which is in the getTimeSinceLastBlock function:

Looking for p2p avtivity - will wait for max 90 secs
Waiting. Last block was 510 secs ago)
Waiting. Last block was 515 secs ago)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
date: invalid date '+%s'
/usr/local/bin/tezos-node-cpr.sh: 33: /usr/local/bin/tezos-node-cpr.sh: arithmetic expression: expecting primary: "1553324161-"
Network OK. No p2p activity and/or too long ( secs since last block. Restarting node!

As reported by @nickman602, there has also been an error in the getTotalTxp2p function, which I had solved by following his suggestions. However, since the most recent node update (commit hash: 366f64f3..), this function no longer seems to work either, as the node appears to send and receive some data even when it is not connected to any other peer. This is indicated by the following:

2019-03-23T06:55:35Z Waiting. Last block was 494 secs ago)
2019-03-23T06:55:40Z Found p2p activity
2019-03-23T06:55:40Z Looking for p2p avtivity - will wait for max 90 secs
2019-03-23T06:55:40Z Waiting. Last block was 499 secs ago)
2019-03-23T06:55:45Z Found p2p activity
2019-03-23T06:55:45Z Looking for p2p avtivity - will wait for max 90 secs
2019-03-23T06:55:45Z Waiting. Last block was 504 secs ago)
2019-03-23T06:55:51Z Found p2p activity

Both issues result in the script not automatically restarting the node any longer.
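The log above suggests both failures come from arithmetic on an empty value: date received no timestamp, leaving the expression "1553324161-" with nothing to subtract. The sketch below does the same check in Python while handling a missing timestamp explicitly, assuming the head block's timestamp is an RFC 3339 string such as "2019-03-23T06:47:21Z", as found in the timestamp field of block headers; seconds_since and should_restart are hypothetical names, and the 180-second threshold is taken from the report.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_since(timestamp: Optional[str], now: datetime) -> Optional[int]:
    """Seconds between a block timestamp like "2019-03-23T06:47:21Z" and `now`.

    Returns None when the timestamp is missing or malformed, instead of
    doing arithmetic on an empty string as the shell version did.
    """
    if not timestamp:
        return None
    try:
        block_time = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        return None
    return int((now - block_time).total_seconds())


def should_restart(age: Optional[int], threshold: int = 180) -> bool:
    """Restart when the last block is too old, or when its age cannot be determined."""
    return age is None or age > threshold
```

Treating an unknown age as a reason to restart mirrors what the script did here, but with a misconfigured RPC it would restart the node in a loop; returning False for None is the more conservative choice.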

running the node service results in Permission denied

I've created a node service with the following config:

# The Tezos Node service (part of systemd)
# file: /etc/systemd/system/tezos-node.service 

[Unit]
Description     = Tezos Node Service
Documentation   = http://tezos.gitlab.io/mainnet/
Wants           = network-online.target
After           = network-online.target 

[Service]
User            = my-tzbaker
Group		= my-tzbaker
WorkingDirectory= /home/my-tzbaker/
ExecStart	= /home/my-tzbaker/tezos/tezos-node run --bootstrap-threshold=1
Restart         = on-failure

[Install]
WantedBy	= multi-user.target

I only changed the user/group and removed the bakery/endorser lines, because currently I only have a node.
The service fails and running journalctl --follow --unit=tezos-node.service shows:

-- Logs begin at Fri 2018-06-22 14:11:49 IDT. --
Nov 07 16:03:17 hostname systemd[1290]: tezos-node.service: Failed to execute command: Permission denied
Nov 07 16:03:17 hostname systemd[1290]: tezos-node.service: Failed at step EXEC spawning /home/my-tzbaker/tezos/tezos-node: Permission denied
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Main process exited, code=exited, status=203/EXEC
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Failed with result 'exit-code'.
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Service RestartSec=100ms expired, scheduling restart.
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Scheduled restart job, restart counter is at 5.
Nov 07 16:03:17 hostname systemd[1]: Stopped Tezos Node Service.
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Start request repeated too quickly.
Nov 07 16:03:17 hostname systemd[1]: tezos-node.service: Failed with result 'exit-code'.
Nov 07 16:03:17 hostname systemd[1]: Failed to start Tezos Node Service.

I've tried running chmod +x ~/tezos/tezos-node but it doesn't help. Any thoughts?
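status=203/EXEC means systemd could not execute the ExecStart binary at all. Besides the file's own execute bit, the service user needs search (x) permission on every directory along the path, including the home directory. The sketch below checks both, assuming you run it as the service user, for example via sudo -u my-tzbaker, because os.access uses the caller's real uid and gid. exec_path_problems is a hypothetical helper, and it is not meant to diagnose other causes such as SELinux policy or mount options.

```python
import os
from typing import List


def exec_path_problems(path: str) -> List[str]:
    """Return the first permission problem that stops the current user executing `path`."""
    path = os.path.abspath(path)
    # Collect the parent directories from the file's own directory up to /.
    parents: List[str] = []
    directory = os.path.dirname(path)
    while True:
        parents.append(directory)
        up = os.path.dirname(directory)
        if up == directory:  # reached the filesystem root
            break
        directory = up
    for directory in reversed(parents):  # check from / downwards
        if not os.path.isdir(directory):
            return [f"{directory} does not exist or is not a directory"]
        # Search (x) permission is needed on every directory to reach the file.
        if not os.access(directory, os.X_OK):
            return [f"no search (x) permission on directory {directory}"]
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    if not os.access(path, os.X_OK):
        return [f"{path} is not executable for this user"]
    return []
```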

tezos-cpr: networkon() runs twice causing false positives and unnecessary network resets, with my setup

The default script will regularly give the output

"LAN required but not functional --> Starting LAN ($NETWORK_INTERFACE)"

without there actually being an issue.
I think the reason is that networkon() runs twice.

networkon(){ [ $(cat /sys/class/net/$NETWORK_INTERFACE/operstate) == "up" ] && return 0 || return 1; }
pingconnected(){ networkon && ping -q -c 1 -W 2 8.8.8.8 >/dev/null && return 0 || return 1; }
if $FORCE_LAN && ! networkon || $FORCE_LAN && ! pingconnected; then
	logred "LAN required but not functional --> Starting LAN ($NETWORK_INTERFACE)"
	nmcli networking on ; $ALLOW_WIFI && nmcli radio all on ; ! $ALLOW_WIFI && nmcli radio all off ;  # Toggling wifi 

Removing the networkon() call from pingconnected() resolves the issue.

tezos-node run --rpc-addr 127.0.0.1:8732 Fails

If I try to run the node with RPC enabled, I get the following; I'm not sure whether this is expected behavior.

ExecStart=/home/don/tezos/tezos-node run --rpc addr 127.0.0.1:8732 (code=exited, status=1/FAILURE)

But thanks for the files.

Baker, Endorser and Accuser fail to restart

May 05 21:53:40 leo-NUC8i7BEH systemd[1]: Stopping Tezos Baker Service...
May 05 21:53:40 leo-NUC8i7BEH systemd[1]: tezos-baker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 05 21:53:40 leo-NUC8i7BEH systemd[1]: tezos-baker.service: Failed with result 'exit-code'.
May 05 21:53:40 leo-NUC8i7BEH systemd[1]: Stopped Tezos Baker Service.

I tried the 'always' option for Restart as well.
