pff-tools

I have some Microsoft Outlook OST files lying around that I need to look at from time to time. It felt like too much of a hassle to boot into Windows, set up Outlook, and load the OST files into it just to search for one mail. It turns out there's a pretty good OSS library called libpff that knows how to parse PST/OST files. I, of course, want everything in Rust, so I generated a Rust binding for libpff, wrote a safe wrapper library, and then wrote a CLI tool for dealing with the files.

  • The pff-sys crate has the Rust bindings for libpff.
  • The pff crate is the safe and hopefully idiomatic Rust wrapper for pff-sys.
  • The pff-cli crate is the CLI tool.

CLI Commands

The pff-cli tool supports the following commands.

Index mails

You can give it a PST/OST file and have it index all the mails (optionally including the message body) with a Meilisearch server. Here are the usage instructions.

pff-cli-index
Index all emails

USAGE:
    pff-cli --pff-file <PFF_FILE> index [OPTIONS] --server <SERVER> --index-name <INDEX_NAME>

OPTIONS:
    -a, --api-key <API_KEY>
            Search server API key (if any)

    -b, --include-body
            Should the message body be included in the index?

    -f, --progress-file <PROGRESS_FILE>
            File to save progress to so we can resume later [default: progress.csv]

    -h, --help
            Print help information

    -i, --index-name <INDEX_NAME>
            Index name

    -s, --server <SERVER>
            Search server URL in form "ip:port" or "hostname:port"

Note that including the message body in the index, depending on the size of your PST/OST file, can result in a large index size in Meilisearch. If you have the disk space, go for it.
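
The --server option expects an address in "ip:port" or "hostname:port" form. As a rough illustration of what that format implies, here is a minimal sketch of splitting such a string into host and port. The function name parse_server is hypothetical and this is not how pff-cli itself handles the value; the sketch also does not attempt to handle bracketed IPv6 literals.

```rust
/// Split a "host:port" string into its host and port parts.
/// Hypothetical helper for illustration; not part of pff-cli.
/// Returns None if there is no ':' separator, the host is empty,
/// or the port is not a valid 16-bit number.
fn parse_server(addr: &str) -> Option<(&str, u16)> {
    // Split on the last ':' so the port is always the final component.
    let (host, port) = addr.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port = port.parse::<u16>().ok()?;
    Some((host, port))
}
```

For instance, parse_server("127.0.0.1:7700") would yield the host "127.0.0.1" and the port 7700, while a bare "localhost" with no port is rejected.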

Export a mail as JSON

Once you have found the message you're looking for on the search server, you'll have a message ID of the form 8354_8514_32866_32930_2667556; the search results identify each message with a string like this. It is a sequence of folder and message IDs that uniquely identifies an item in the PST/OST file. With this ID, you can export the message as JSON using the export-message command. Here are the usage instructions.

pff-cli-export-message
Export a single message as JSON

USAGE:
    pff-cli --pff-file <PFF_FILE> export-message --id <ID>

OPTIONS:
    -h, --help       Print help information
    -i, --id <ID>    The ID of the message to export. The ID must be given as a sequence of '_'
                     delimited numbers. For example, 8354_8514_8546_7029316. This ID can be fetched
                     from the Meilisearch server search results. Note that this message ID path must
                     not include the root folder's ID which is what you get by default if you
                     indexed your emails using the `pff-cli index` command
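
To make the ID format concrete, here is a minimal sketch of turning an ID path into its numeric parts, where the last number is the message ID and the ones before it are folder IDs. The function names and the choice of u32 for the identifiers are assumptions made for illustration; the actual parsing inside pff-cli may differ.

```rust
use std::num::ParseIntError;

/// Parse an ID path such as "8354_8514_32866_32930_2667556" into its
/// numeric components. Hypothetical helper; u32 is an assumed ID width.
fn parse_id_path(path: &str) -> Result<Vec<u32>, ParseIntError> {
    // Every '_'-separated part must be a number; the first bad part
    // aborts the whole parse with an error.
    path.split('_').map(|part| part.parse::<u32>()).collect()
}

/// Split a parsed path into (folder IDs, message ID).
/// Returns None for an empty path.
fn split_message_id(ids: &[u32]) -> Option<(&[u32], u32)> {
    let (message, folders) = ids.split_last()?;
    Some((folders, *message))
}
```

With the example ID above, the folder IDs would be 8354, 8514, 32866 and 32930, and the message ID would be 2667556, which matches the "id" field in the exported JSON.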

Here's an example of how you might run this command.

pff-cli --pff-file /path/to/file.ost export-message --id 8354_8514_32866_32930_2667556

You can route the output through the jq tool to have the JSON nicely formatted.

pff-cli --pff-file /path/to/file.ost export-message --id 8354_8514_32866_32930_2667556 | jq

{
  "id": "2667556",
  "subject": "Subject here",
  "sender": {
    "name": "Alice",
    "email": "[email protected]"
  },
  "recipients": [
    {
      "name": "Bob",
      "email": "[email protected]"
    },
    {
      "name": "Pam",
      "email": "[email protected]"
    }
  ],
  "body": {
    "type": "html",
    "value": "... lots of HTML here ..."
  },
  "send_time": "2020-11-05T20:00:30",
  "delivery_time": "2020-11-05T20:00:39"
}

You can export the body into a file that you can then view in a browser like so.

pff-cli --pff-file /path/to/file.ost export-message --id 8354_8514_32866_32930_2667556 | jq -r '.body.value' > /tmp/mail.html

Building the code

Linux

In order to build you'll need Rust (duh!) and a working installation of libpff. See the libpff documentation to learn how to build it; it's fairly straightforward. On my Ubuntu box, the following worked great.

sudo apt install git autoconf automake autopoint libtool pkg-config libclang-dev
git clone https://github.com/libyal/libpff.git
cd libpff/
./synclibs.sh
./autogen.sh
./configure
make -j `nproc`
sudo make install

The binaries get installed in /usr/local by default. To have the libpff.so file appear in the Linux library cache, you may need to run the following after installing.

sudo ldconfig

macOS

I have been able to get this to work on macOS as well. You just have to follow the build instructions on the libpff wiki.

Contributors

avranju

pff-tools's Issues

`ProgressTracker` should periodically flush state

The ProgressTracker is meant to make it easy to resume a long-running indexing task if it gets interrupted, by saving the progress to a file. But right now it saves the state only at the end of the run, which defeats the purpose. We should instead periodically flush the current state to the file.
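
One possible shape for this, as a minimal sketch: buffer the IDs of processed items in memory and append them to the progress file every N items. Everything here is an assumption made for illustration, including the struct layout, the method names (open, mark_done, flush), the flush_every threshold, and the one-ID-per-line file format; it is not the existing ProgressTracker from pff-cli.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical stand-in for the real ProgressTracker.
/// Appends processed item IDs to a file every `flush_every` items.
struct ProgressTracker {
    file: File,
    pending: Vec<String>,
    flush_every: usize,
}

impl ProgressTracker {
    /// Open (or create) the progress file in append mode so that an
    /// interrupted run keeps whatever was flushed before.
    fn open(path: impl AsRef<Path>, flush_every: usize) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self {
            file,
            pending: Vec::new(),
            // Treat 0 as "flush after every item".
            flush_every: flush_every.max(1),
        })
    }

    /// Record one processed item and flush once the threshold is reached.
    fn mark_done(&mut self, id: &str) -> io::Result<()> {
        self.pending.push(id.to_owned());
        if self.pending.len() >= self.flush_every {
            self.flush()?;
        }
        Ok(())
    }

    /// Write all pending IDs to the file, one per line, and sync to disk.
    /// Pending entries are cleared only after the write succeeds.
    fn flush(&mut self) -> io::Result<()> {
        if self.pending.is_empty() {
            return Ok(());
        }
        let mut buf = self.pending.join("\n");
        buf.push('\n');
        self.file.write_all(buf.as_bytes())?;
        self.file.sync_data()?;
        self.pending.clear();
        Ok(())
    }
}
```

The caller would still call flush once at the end of the run to write any remainder. Appending rather than rewriting the whole file keeps each flush cheap, at the cost of losing at most flush_every - 1 items of progress if the process is killed between flushes.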
