Unique

A unique(1) pipeline filter

Installation

Because I don't want to change the name to something lame, this package is not published on crates.io. To install, run the following command.

cargo install --git https://github.com/archer884/unique

In all likelihood, your computer already has a command that does this, and odds are you can get by with it. Read on to discover whether unique can be useful to you.

Windows

PowerShell (pwsh) offers the Select-Object cmdlet which, called with the -Unique flag, does almost exactly what unique does. I imagine that Select-Object is using the CLR's GetHashCode() mechanism, but I am far too lazy to dig into this to find out. This command operates on object streams rather than text streams, which makes it more versatile than unique. For working with text streams, however, you can expect unique to offer slightly better performance characteristics due to the innate inefficiency of string handling in pwsh and in CLR languages generally.

macOS

Of course, pwsh is also available on macOS, but it isn't installed by default. Considerations are identical.

If instead you use a shell like bash, your built-in option is uniq(1), which is more efficient than both Select-Object and unique, but which fails to filter non-consecutive repetitions. As a demonstration, try the following command using the test file in this repository: cat ./resource/test-file.txt | uniq. The string "One" will appear twice in the output. This is by design: uniq is written for maximum efficiency and compares each line only with the one before it. It does not keep earlier lines in memory, which is what detecting non-consecutive repetitions would require.
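To make the adjacent-only behavior concrete, here is a minimal Rust sketch of a filter that compares each line only with the one before it. The function name adjacent_filter and the sample input are hypothetical. This is not the source of uniq(1), and the sample is not the contents of the repository's test file.

```rust
/// Drops a line only when it equals the line kept immediately before it.
/// This mirrors the adjacent-only comparison described for uniq(1).
fn adjacent_filter<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    let mut out: Vec<&'a str> = Vec::new();
    for &line in lines {
        // Only the previously kept line is consulted; no history of
        // earlier lines is needed to make the decision.
        if out.last() != Some(&line) {
            out.push(line);
        }
    }
    out
}

fn main() {
    // Hypothetical input with one back-to-back repeat and one later repeat.
    let input = ["One", "One", "Two", "One"];
    for line in adjacent_filter(&input) {
        println!("{line}");
    }
}
```

With this input the back-to-back "One" is dropped, but the final "One" survives because "Two" separates it from the earlier occurrence.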

The unique advantage

unique differs slightly from both of these commands.

Efficient operation on text streams

In comparison to the CLR-based Select-Object, unique offers more efficient operation on text streams.

This is because unique reads the entirety of stdin at once before printing only unique lines. Only a single input buffer is ever allocated, with the resulting individual lines being represented in memory as slices of that buffer rather than as discrete strings, as would be necessary in the CLR.
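The single-buffer approach described above might look roughly like the following sketch. It is an illustration under assumptions, not the actual source of unique: the function name unique_lines and the use of a HashSet to track seen lines are guesses at one reasonable way to do it. Note that the set and the output vector still allocate, but they hold only slices that point into the one input buffer, not copies of the text.

```rust
use std::collections::HashSet;
use std::io::{self, Read, Write};

/// Returns the first occurrence of every line in `buffer`, in input order.
/// Each returned &str is a slice of `buffer`; no per-line String is created.
fn unique_lines(buffer: &str) -> Vec<&str> {
    // Assumption: a hash set of slices is used to remember seen lines.
    let mut seen: HashSet<&str> = HashSet::new();
    buffer
        .lines()
        // HashSet::insert returns false when the value was already present.
        .filter(|line| seen.insert(*line))
        .collect()
}

fn main() -> io::Result<()> {
    // Read all of stdin into one buffer before filtering.
    let mut buffer = String::new();
    io::stdin().read_to_string(&mut buffer)?;

    let mut out = io::stdout().lock();
    for line in unique_lines(&buffer) {
        writeln!(out, "{line}")?;
    }
    Ok(())
}
```

Because every line is a borrowed slice, the borrow checker guarantees the buffer outlives the set and the results, which is what makes the zero-copy design safe.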

Note: Yes, I know that the CLR now has Span<T>. Whether or not this technology has been incorporated into Select-Object is not known to me. Feel free to look it up and let me know. :)

Successful detection of non-consecutive repetitions

Allowing for the necessary reduction in sheer efficiency, unique offers an advantage over the built-in command uniq in that it will faithfully exclude repeated lines even when those repetitions are not back to back. The primary difference you'll see as a user is that your pipelines can be shorter (uniq must often be combined with sort to address this shortcoming) and you can expect your input and output strings to appear in the same order (because, obviously, you didn't have to sort them). Compare the following pipelines:

cat foo.txt | sort | uniq
cat foo.txt | unique

You will save five entire characters. Your children will thank you.
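The ordering difference between the two pipelines can also be sketched in Rust. Both function names and the sample input here are hypothetical, and the sort is a plain byte-wise comparison, whereas sort(1) may order lines differently depending on locale.

```rust
use std::collections::HashSet;

/// Roughly what `sort | uniq` yields: sorted order, duplicates removed.
fn sort_then_dedup<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    let mut out = lines.to_vec();
    out.sort_unstable(); // byte-wise ordering, not locale-aware
    out.dedup(); // removes adjacent duplicates, which sorting has grouped
    out
}

/// Keeps the first occurrence of each line in its original position.
fn first_occurrences<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    lines
        .iter()
        .copied()
        .filter(|line| seen.insert(*line))
        .collect()
}

fn main() {
    let input = ["pear", "apple", "pear", "fig", "apple"];
    println!("{:?}", sort_then_dedup(&input));
    println!("{:?}", first_occurrences(&input));
}
```

The first call returns the lines in sorted order, while the second keeps "pear" ahead of "apple" and "fig", matching the order in which they first appeared.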
