Giter VIP home page Giter VIP logo

clone_pseudo_fs's Introduction

                            clone_pseudo_fs
                            ===============

Introduction
------------
As the long package name and executable therein suggest, this utility is
aimed at cloning (or copying) a pseudo file system (e.g. sysfs under /sys
in Linux) to a normal storage file system (e.g. ext4 or btrfs in Linux).
This utility defaults to copying /sys to /tmp/sys .

For brevity "file system" will be abbreviated to FS and "pseudo file system"
to PFS. The plural of both will have "_s" appended.

The problem
-----------
But rsync already does clone/copy, doesn't it? Various "features" of PFS_s
may trip up find, tar or rsync . Here are a few:

    - while most PFS_s are generally speaking "read-only", possibly apart
      from a small number of writable attribute files. Files including
      directories cannot be created or deleted. But that "read only"
      limitation only applies to the user space! Rather than storing
      information for later retrieval, PFS_s tend to be an interface between
      some aspect of the OS (e.g. some of its drivers) and the user space.
      An intermittent cable to a USB hub can cause USB devices to disappear
      and re-appear in quick succession. And due to the device enumeration
      policy in Linux, those devices may re-appear with a slightly different
      name. The software to do a recursive directory scan of such a dynamic
      PFS needs to be robust. This includes only failing when the iteration
      cannot complete (e.g. when 'umount /sys' succeeds while /sys is being
      scanned). Counting and categorizing errors, which the iteration can
      continue from, is a better policy for PFS cloning than sending each
      error report to stderr.

    - reads of regular files in PFS_s can be quite irregular.
      struct stat::st_size cannot be relied on (e.g. in sysfs (under /sys)).
      Its value is 0 or 4096 irrespective of the amount of data that can
      actually be read. Some regular files in PFS_s have "waiting" reads
      (e.g. /proc/kmsg) that will hang a recursive directory scan unless
      handled properly. Other regular files (e.g. /proc/cpuinfo) need
      multiple reads, a bit like reading from a socket.

    - some special options available in tar and find, specifically -xdev ,
      are best to be defaults when cloning a PFS. The tracefs PFS
      (typically mounted under /sys/kernel/trace) needs the --wait=MS_R
      option to handle its waiting read(2)s while sysfs should clone
      "out of the box" with no options. Cloning is often faster and
      creates a smaller destination tree when performed by a non-root
      user. And the cloned data may be sufficient to be processed by the
      "ls*" family of utilities.

    - tar(1) doesn't like spaces in filenames and sysfs contains lots of
      such filenames. There are solutions using find(1) piped into xargs(1)
      which invokes tar. If this is attempted on /sys it will still yield
      a fraction of the files due to command line length truncation by
      xargs.

Why
---
The author wrote the lsscsi utility which has a --sysfsroot=PATH option that
has been useful in testing. Now the author is working on lsucpd to collect
USB-C PD (power delivery) information. That depends on the ucsi_acpi driver
which is currently very "fragile" (sometimes rmmod followed by modprobe
unlocks it). An alternative is to setup various USB-C PD pieces of equipment
and when everything is stable, clone /sys . Then further testing can be done
against the stable clone/copy. Also the author foresees users reporting that
lsucpd gives incorrect output when my xxxxx is connected to my yyyyy. Getting
a clone/copy of that user's /sys will be helpful in debugging that issue
(security be damned).

lsblk and lscpu have a --sysroot=PATH option that is used in the util-linux
test suite. That code can be found at
https://github.com/util-linux/util-linux .

Implementation
--------------
The clone_pseudo_fs utility is written in C++ after various attempts with
bash scripts. C++20 support was chosen because:
   - C++17 introduced the std::filesystem library which both g++ and clang++
     seem to have well supported
   - std::filesystem has a recursive_directory_iterator class which is exactly
     what is needed for scanning and copying FS trees
   - recursive_directory_iterator conveniently ignores obscure objects such as
     the so-called "magic" symlinks in /proc
   - C++20 adds std::ranges and a second set of standard (STL) algorithms that
     are ranges aware and that are (slightly) better than the originals (due
     to 20 years experience with the originals). So, for example, the process
     of sorting all the exclude paths, making sure they are unique (i.e.
     removing duplicates) and later doing a binary search on it, is shorter
     to code than this sentence.
   - std::filesystem supports both throwing exceptions and yielding errors via
     an error_code object. The implementer can choose either and this one chose
     the error_code approach. The error_code approach seems to lead to finer
     grain error reporting but does lead to some non-obvious code, for example
     ++itr is replaced by itr.increment(ec) .


Building package
----------------
Installation instructions are in the INSTALL file.
Note: g++ version 13 or clang++ version 16 or later versions are required.
Both g++ version 12 or clang++ version 15 can be supported if 'libfmt' is
available. On Debian/Ubuntu use 'apt install libfmt-dev' to install the
libfmt library and its associated header file.

Various options can be given to the ./configure script. Those
specific to this package are:

  --enable-debug          Turn on debugging


The build sequence is now:
  ./autogen.sh ; ./configure ; make ; make install
or
  ./bootstrap ; ./configure ; make ; make install

Note that the final 'make install' will usually require root permissions
and will place binaries and scripts in the /usr/local/bin directory.

Instruction for using cmake are in the INSTALL file.

Douglas Gilbert
13th December 2023 [rev: 25]

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.