doug-gilbert / clone_pseudo_fs Goto Github PK
View Code? Open in Web Editor NEWclone pseudo file systems in Linux, primarily sysfs but also devfs and procfs
License: Other
clone pseudo file systems in Linux, primarily sysfs but also devfs and procfs
License: Other
clone_pseudo_fs =============== Introduction ------------ As the long package name and executable therein suggest, this utility is aimed at cloning (or copying) a pseudo file system (e.g. sysfs under /sys in Linux) to a normal storage file system (e.g. ext4 or btrfs in Linux). This utility defaults to copying /sys to /tmp/sys . For brevity "file system" will be abbreviated to FS and "pseudo file system" to PFS. The plural of both will have "_s" appended. The problem ----------- But rsync already does clone/copy, doesn't it? Various "features" of PFS_s may trip up find, tar or rsync . Here are a few: - while most PFS_s are generally speaking "read-only", possibly apart from a small number of writable attribute files. Files including directories cannot be created or deleted. But that "read only" limitation only applies to the user space! Rather than storing information for later retrieval, PFS_s tend to be an interface between some aspect of the OS (e.g. some of its drivers) and the user space. An intermittent cable to a USB hub can cause USB devices to disappear and re-appear in quick succession. And due to the device enumeration policy in Linux, those devices may re-appear with a slightly different name. The software to do a recursive directory scan of such a dynamic PFS needs to be robust. This includes only failing when the iteration cannot complete (e.g. when 'umount /sys' succeeds while /sys is being scanned). Counting and categorizing errors, which the iteration can continue from, is a better policy for PFS cloning than sending each error report to stderr. - reads of regular files in PFS_s can be quite irregular. struct stat::st_size cannot be relied on (e.g. in sysfs (under /sys)). Its value is 0 or 4096 irrespective of the amount of data that can actually be read. Some regular files in PFS_s have "waiting" reads (e.g. /proc/kmsg) that will hang a recursive directory scan unless handled properly. Other regular files (e.g. /proc/cpuinfo) need multiple reads, a bit like reading from a socket. - some special options available in tar and find, specifically -xdev , are best to be defaults when cloning a PFS. The tracefs PFS (typically mounted under /sys/kernel/trace) needs the --wait=MS_R option to handle its waiting read(2)s while sysfs should clone "out of the box" with no options. Cloning is often faster and creates a smaller destination tree when performed by a non-root user. And the cloned data may be sufficient to be processed by the "ls*" family of utilities. - tar(1) doesn't like spaces in filenames and sysfs contains lots of such filenames. There are solutions using find(1) piped into xargs(1) which invokes tar. If this is attempted on /sys it will still yield a fraction of the files due to command line length truncation by xargs. Why --- The author wrote the lsscsi utility which has a --sysfsroot=PATH option that has been useful in testing. Now the author is working on lsucpd to collect USB-C PD (power delivery) information. That depends on the ucsi_acpi driver which is currently very "fragile" (sometimes rmmod followed by modprobe unlocks it). An alternative is to setup various USB-C PD pieces of equipment and when everything is stable, clone /sys . Then further testing can be done against the stable clone/copy. Also the author foresees users reporting that lsucpd gives incorrect output when my xxxxx is connected to my yyyyy. Getting a clone/copy of that user's /sys will be helpful in debugging that issue (security be damned). lsblk and lscpu have a --sysroot=PATH option that is used in the util-linux test suite. That code can be found at https://github.com/util-linux/util-linux . Implementation -------------- The clone_pseudo_fs utility is written in C++ after various attempts with bash scripts. C++20 support was chosen because: - C++17 introduced the std::filesystem library which both g++ and clang++ seem to have well supported - std::filesystem has a recursive_directory_iterator class which is exactly what is needed for scanning and copying FS trees - recursive_directory_iterator conveniently ignores obscure objects such as the so-called "magic" symlinks in /proc - C++20 adds std::ranges and a second set of standard (STL) algorithms that are ranges aware and that are (slightly) better than the originals (due to 20 years experience with the originals). So, for example, the process of sorting all the exclude paths, making sure they are unique (i.e. removing duplicates) and later doing a binary search on it, is shorter to code than this sentence. - std::filesystem supports both throwing exceptions and yielding errors via an error_code object. The implementer can choose either and this one chose the error_code approach. The error_code approach seems to lead to finer grain error reporting but does lead to some non-obvious code, for example ++itr is replaced by itr.increment(ec) . Building package ---------------- Installation instructions are in the INSTALL file. Note: g++ version 13 or clang++ version 16 or later versions are required. Both g++ version 12 or clang++ version 15 can be supported if 'libfmt' is available. On Debian/Ubuntu use 'apt install libfmt-dev' to install the libfmt library and its associated header file. Various options can be given to the ./configure script. Those specific to this package are: --enable-debug Turn on debugging The build sequence is now: ./autogen.sh ; ./configure ; make ; make install or ./bootstrap ; ./configure ; make ; make install Note that the final 'make install' will usually require root permissions and will place binaries and scripts in the /usr/local/bin directory. Instruction for using cmake are in the INSTALL file. Douglas Gilbert 13th December 2023 [rev: 25]
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.