rust-syndication / atom


Library for serializing the Atom web content syndication format https://crates.io/crates/atom_syndication

License: Apache License 2.0

Rust 100.00%

rust atom parser feed

atom's Introduction

atom

Library for serializing the Atom web content syndication format.

Documentation

This crate requires Rustc version 1.57.0 or greater.

Usage

Add the dependency to your Cargo.toml.

[dependencies]
atom_syndication = "0.12"

Or, if you want Serde support, enable the feature like this:

[dependencies]
atom_syndication = { version = "0.12", features = ["with-serde"] }

The package includes a single crate named atom_syndication.

extern crate atom_syndication;

Reading

A feed can be read from any object that implements the BufRead trait or using the FromStr trait.

use std::fs::File;
use std::io::BufReader;
use atom_syndication::Feed;

let file = File::open("example.xml").unwrap();
let feed = Feed::read_from(BufReader::new(file)).unwrap();

let string = "<feed></feed>";
let feed = string.parse::<Feed>().unwrap();

Writing

A feed can be written to any object that implements the Write trait or converted to an XML string using the ToString trait.

Note: Writing a feed does not perform any escaping of XML entities.

Example

use std::fs::File;
use std::io::{BufReader, sink};
use atom_syndication::Feed;

let file = File::open("example.xml").unwrap();
let feed = Feed::read_from(BufReader::new(file)).unwrap();

// write the feed to a writer
feed.write_to(sink()).unwrap();

// convert the feed to a string
let string = feed.to_string();

Invalid Feeds

As a best effort to parse invalid feeds, atom_syndication defaults elements declared as "required" by the Atom specification to an empty string when they are missing.

License

Licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

atom's Issues

secDNS

--- 4 Jan 2023

Errors should be convertible to failure::Error

The atom_syndication::Error type can't be automatically converted to failure::Error by the failure crate. I'm not entirely sure why; the compile error is below if it helps. I don't have this issue with the RSS crate, which is already using failure for its errors. Currently I have to work around this by explicitly converting the error to a string before using the ? operator.

    |
183 |     let atom_feed : AtomFeed = text.parse()?;
    |                                ^^^^^^^^^^^^^ `std::error::Error + std::marker::Send + 'static` cannot be shared between threads safely
    |
    = help: the trait `std::marker::Sync` is not implemented for `std::error::Error + std::marker::Send + 'static`
    = note: required because of the requirements on the impl of `std::marker::Sync` for `std::ptr::Unique<std::error::Error + std::marker::Send + 'static>`
    = note: required because it appears within the type `std::boxed::Box<std::error::Error + std::marker::Send + 'static>`
    = note: required because it appears within the type `std::option::Option<std::boxed::Box<std::error::Error + std::marker::Send + 'static>>`
    = note: required because it appears within the type `error_chain::State`
    = note: required because it appears within the type `quick_xml::errors::Error`
    = note: required because it appears within the type `atom_syndication::Error`
    = note: required because of the requirements on the impl of `failure::Fail` for `atom_syndication::Error`
    = note: required because of the requirements on the impl of `std::convert::From<atom_syndication::Error>` for `failure::Error`
    = note: required by `std::convert::From::from`

I'd be happy to submit a pull request if you'd like, but I think that might require a breaking change.
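The compile error above comes down to a boxed error that is Send but not Sync. As a minimal sketch of the constraint (not the crate's actual code), the following uses a hypothetical FeedError type and a hypothetical assert_send_sync helper to show that adding Sync to the boxed cause is enough to make the outer error thread-safe:

```rust
use std::error::Error;
use std::fmt;

/// Compile-time check: this only type-checks for types that are `Send + Sync`.
/// Hypothetical helper for illustration.
fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical stand-in for an error type that wraps a boxed cause.
#[derive(Debug)]
struct FeedError {
    // If this were `Box<dyn Error + Send>` (no `Sync`), `FeedError` would not
    // be `Sync`, and `assert_send_sync::<FeedError>()` would fail to compile.
    cause: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for FeedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to parse feed")
    }
}

impl Error for FeedError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Dropping the auto traits from the trait object is a plain coercion.
        Some(&*self.cause)
    }
}
```

Calling assert_send_sync::<FeedError>() compiles only because the boxed cause carries both auto traits, which is the property the failure trace above reports as missing.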

Replace String with ::chrono::DateTime<::chrono::FixedOffset>

It seems there are no typed date or time fields in this crate: Feed#updated, Entry#updated and Entry#published, for example, are all plain strings.

Should we replace String with ::chrono::DateTime<::chrono::FixedOffset>? With typed fields, there would be less confusion.

I'm going to work on this. I think this crate is excellent, and it is my honor to participate in this project.

I'm not a native speaker, so there are probably many errors in my writing; please forgive me.

entry title might need `type="html"`

I want to use html entities in entry titles, but this requires type="html" as an attribute of the title tag, e.g.:

<title type="html"><![CDATA[ theses and objectives about decentralized systems &mdash; a collection ]]></title>

https://www.facebook.com/profile.php?id=100015363047007

Because my Facebook account information was changed and I cannot access the new email address, but I have the messages from the old email address associated with the account
https://www.facebook.com/profile.php?id=100015363047007
| References

#56 #59 Because my Facebook account information was changed and I cannot access the new email address, but I have the messages from the old email address associated with the account https://www.facebook.com/profile.php?id=100015363047007 | References

Should `content` be escaped?

Using HTML within a Content seems to include it verbatim in the output. This causes validation warnings at best, and broken XML at worst (e.g. .value("</content>".to_owned())).

I would expect the library to escape the content as text, unless I'm misunderstanding the Atom spec.
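Until the library escapes content itself, a caller could escape text before handing it over. The helper below is a hypothetical sketch (escape_xml_text is not part of atom_syndication) that covers only the three characters that matter in element text:

```rust
/// Escapes the characters that are significant in XML element text.
/// Hypothetical helper for illustration; not an atom_syndication API.
fn escape_xml_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}
```

With this, the problematic value from the issue, "</content>", becomes "&lt;/content&gt;" and can no longer close the surrounding element early.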

Feeds starting with <?xml version="1.0" encoding="UTF-8"?>

For example https://headcrab.rs/feed.xml

Implement Serde's Serialize/Deserialize traits (per C-SERDE guideline)

I think this would make sense for stuff like Feed, Entry, and others. Would a PR for this be accepted?

Feed element

No stop no

100015363047007

My Facebook account was hacked

Should `xmlns="http://www.w3.org/2005/Atom"` be emitted on the `<feed>` element?

Currently it's possible to set namespaces, but not the default namespace, which also isn't emitted on the <feed> element.

It seems like a lack of the default namespace attribute gives Firefox trouble when detecting Atom feeds.

Expose API for building Atom feeds without excessive allocation

Building an Atom feed currently incurs a large number of allocations. Nearly every field of the type contains a String, which mandates a separate heap allocation per field. What's more, since all the .build() methods take &self, they require cloning every single field in the builder. I think it would be good if atom_syndication exposed a separate *Writer API for building feeds without allocations. A sketch of what the API could look like is below:

pub struct FeedWriter<W> { /* ... */ }

impl<W: Write> FeedWriter<W> {
    pub fn from_writer(writer: W) -> Self { /* ... */ }

    pub fn title(&mut self) -> Result<TextWriter<'_, W>, Error> { /* set up the opening of a text tag */ }
    pub fn id<Id: Display>(&mut self, id: Id) -> Result<&mut Self, Error> { /* ... */ }
    // etc

    // Automatically runs in the destructor of `FeedWriter`, but can be called here to handle errors.
    pub fn finish(self) -> Result<W, Error> { /* ... */ }
}

pub struct TextWriter<'a, W: Write> { /* ... */ }

impl<W: Write> TextWriter<'_, W> {
    pub fn set<Value: Into<Text>>(self, text: Value) -> Result<(), Error> { /* ... */ }
    pub fn value<Value: Display>(&mut self, value: Value) -> Result<&mut Self, Error> { /* ... */ }
    pub fn base<Base: Display>(&mut self, base: Base) -> Result<&mut Self, Error> { /* ... */ }
    // etc
}

impl<W: Write> Drop for TextWriter<'_, W> {
    fn drop(&mut self) {
        // finish writing out the text tag

        // In lieu of fallible destructors, errors from here can be pushed
        // up into the FeedWriter and returned on the next call to any one
        // of its methods.
    }
}

Such an API would be usable like so:

let mut feed = FeedWriter::from_writer(Vec::new());
feed.title()?.value("My example feed")?;
feed.id("http://example.com/")?;
feed.generator()?.value("My generator")?.version("1.0.0")?;
for entry in ["entry 1", "entry 2"] {
    feed.entry()?.content()?.value(entry)?;
}
let bytes = feed.finish()?;

As an alternative design, it's possible to collect all errors within the FeedWriter so that individual methods don't return a Result; instead, the first error is returned at .finish(). However, that could force the user to do far more work than necessary, since returning early on an error would not be possible.

Would such an API be possible to implement?
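To make the idea concrete, here is a much-reduced, std-only sketch of a streaming writer. It is an assumption-laden illustration rather than the proposed design: it drops TextWriter and the destructor logic, supports only title and id, uses io::Result in place of a crate error type, and the Escaping adapter is a hypothetical helper. Values are streamed straight into the underlying writer, so no intermediate String is built.

```rust
use std::fmt::Display;
use std::io::{self, Write};

/// Adapter that escapes `&`, `<` and `>` while forwarding bytes.
/// Hypothetical helper; writes byte by byte, so wrap slow sinks in a BufWriter.
struct Escaping<'a, W: Write>(&'a mut W);

impl<W: Write> Write for Escaping<'_, W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        for &byte in buf {
            match byte {
                b'&' => self.0.write_all(b"&amp;")?,
                b'<' => self.0.write_all(b"&lt;")?,
                b'>' => self.0.write_all(b"&gt;")?,
                _ => self.0.write_all(&[byte])?,
            }
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

/// Simplified streaming feed writer (illustrative only).
pub struct FeedWriter<W: Write> {
    writer: W,
}

impl<W: Write> FeedWriter<W> {
    /// Writes the opening tag immediately, so this constructor is fallible.
    pub fn from_writer(mut writer: W) -> io::Result<Self> {
        writer.write_all(b"<feed xmlns=\"http://www.w3.org/2005/Atom\">")?;
        Ok(FeedWriter { writer })
    }

    pub fn title<T: Display>(&mut self, title: T) -> io::Result<&mut Self> {
        self.element("title", title)
    }

    pub fn id<T: Display>(&mut self, id: T) -> io::Result<&mut Self> {
        self.element("id", id)
    }

    /// Streams `<name>value</name>`, escaping the value on the fly.
    fn element<T: Display>(&mut self, name: &str, value: T) -> io::Result<&mut Self> {
        write!(self.writer, "<{}>", name)?;
        write!(Escaping(&mut self.writer), "{}", value)?;
        write!(self.writer, "</{}>", name)?;
        Ok(self)
    }

    /// Writes the closing tag and hands the underlying writer back.
    pub fn finish(mut self) -> io::Result<W> {
        self.writer.write_all(b"</feed>")?;
        Ok(self.writer)
    }
}
```

Because each method returns a Result, callers can chain calls with ? and stop at the first I/O error, which is the early-return behaviour the issue argues for.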

Recently released version 0.7.0 was not tagged

https://www.facebook.com/profile.php?id=100015363047007

They hacked my Facebook account and the information was changed https://www.facebook.com/profile.php?id=100015363047007

please disable default features of chrono

(At least disable the clock feature, because of chronotope/chrono#499. Leaving it enabled makes auditing more cumbersome: checking that a feature is disabled is easier than grepping all the code.)

e.g.

[dependencies.chrono]
version = "0.4"
default-features = false
features = ["alloc"]

Introduce and maintain a changelog

Allow to set xml:base on any element

RFC 4287 §2 says:

Any element defined by this specification MAY have an xml:base attribute [W3C.REC-xmlbase-20010627]. When xml:base is used in an Atom Document, it serves the function described in section 5.1.1 of [RFC3986], establishing the base URI (or IRI) for resolving any relative references found within the effective scope of the xml:base attribute.

However, this crate doesn't let me set xml:base on <content>. It'd be nice to have such capability.

Is this crate abandoned?

Issues and pull requests have gone unaddressed for months. The same is true of the rss crate.

`Error` exposes `quick_xml::Error` in the public API

atom_syndication::Error currently has a variant Xml containing a quick_xml::Error. This has three negative impacts on the versioning of this library:

atom_syndication can't upgrade the version of quick_xml used internally without making a public API breaking change.
atom_syndication can't change which library it uses for XML parsing without making a public API breaking change.
As per C-STABLE, atom_syndication can never release 1.0.0 until quick_xml has also released 1.0.0.

The From<quick_xml::Error> for Error impl needs to go as well to avoid this problem.

To avoid this issue, atom_syndication should provide its own XmlError type that newtypes quick_xml::Error:

pub struct XmlError(quick_xml::Error);

Alternatively, XmlError can be an enum that has all the same variants as quick_xml::Error which would allow for easier introspection, but is quite verbose to define.
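A slightly fuller sketch of the newtype option follows. To keep it free of dependencies, a private backend::Error type stands in for quick_xml::Error; that module, the constructor name, and the Display text are all assumptions made for illustration.

```rust
use std::error::Error as StdError;
use std::fmt;

// Stand-in for `quick_xml::Error` so the sketch compiles on its own.
mod backend {
    use std::fmt;

    #[derive(Debug)]
    pub struct Error(pub String);

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.write_str(&self.0)
        }
    }

    impl std::error::Error for Error {}
}

/// Opaque wrapper. The field is private, so the backend's error type never
/// appears in a public signature and the backend can be swapped freely.
#[derive(Debug)]
pub struct XmlError(backend::Error);

impl XmlError {
    /// Crate-internal constructor; deliberately not a public `From` impl.
    pub(crate) fn new(inner: backend::Error) -> Self {
        XmlError(inner)
    }
}

impl fmt::Display for XmlError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid XML")
    }
}

impl StdError for XmlError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        // The cause is exposed only as a trait object.
        Some(&self.0)
    }
}
```

Callers still reach the underlying message through source(), but nothing in the public API names the XML library.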

Support of language attribute through xml:lang

Hi, nice project. I was wondering if it would be possible to either support user-defined attributes on the <feed> and <entry> tags, such as xml:lang="en", or maybe it would be best to have a "lang" ("language"?) field somewhere (in Feed and Entry).

The xml:lang attribute is mentioned in Introduction to Atom - About this document which links to 2.12 Language Identification. Static Site Generators such as Zola also make use of this attribute (but I'd like to move away from it), so it would be nice if this crate somehow offered the possibility of setting the document's language. In the future, we could also decide what to do with the hreflang attribute.

I took a look at both the documentation and the code, but as far as I can tell, I'm only able to define namespaces and not actually arbitrary attributes for the <feed> tag. Thank you.

Add `*_mut` getters

Fields of types like Feed have:

Immutable getters (like fn title(&self) -> &str), and
Setters (like fn set_title(&mut self, title: ...))

They work fine most of the time, but when you want to get an owned String from a Feed, you have to clone the &str returned by the getter, rather than just taking ownership out of the Feed.

If Feed had mutable getters like fn title_mut(&mut self) -> &mut String, you could take the field without cloning:

use std::mem;
use atom_syndication::Feed;

let mut feed = Feed::default();
feed.set_title("Feed Title");
let title = mem::take(feed.title_mut());
assert_eq!(title, "Feed Title");
assert_eq!(feed.title(), "");

If the idea seems fine to you, I'd love to make a pull request (to this repository and rust-syndication/rss).
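The same pattern can be tried without the crate. In the sketch below, MiniFeed is a hypothetical one-field stand-in for Feed, and take_title is a hypothetical helper showing that mem::take moves the String out without cloning:

```rust
use std::mem;

/// Hypothetical stand-in for `atom_syndication::Feed`, reduced to one field.
#[derive(Debug, Default)]
pub struct MiniFeed {
    title: String,
}

impl MiniFeed {
    /// Immutable getter, as in the current API style.
    pub fn title(&self) -> &str {
        &self.title
    }

    /// Setter, as in the current API style.
    pub fn set_title<T: Into<String>>(&mut self, title: T) {
        self.title = title.into();
    }

    /// Proposed mutable getter.
    pub fn title_mut(&mut self) -> &mut String {
        &mut self.title
    }
}

/// Moves the title out of the feed, leaving an empty string behind.
pub fn take_title(feed: &mut MiniFeed) -> String {
    mem::take(feed.title_mut())
}
```

mem::take swaps in String::default(), so the original heap buffer is handed to the caller as-is.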

Regression in last version

The crate stopped parsing feeds with non-conformant date fields.
Example feed

Previously it was possible to parse date fields loosely with custom code (or just skip them). Now the whole feed is rejected with the error WrongDatetime.

XHTML elements are not serialized correctly

After the merge of #37, XHTML in Text elements is deserialized correctly (or at least seems to be), but serialization does not handle XHTML specifically. Unlike HTML, XHTML does not need to be escaped and shouldn't be, so the serialization of elements with type TextType::Xhtml is incorrect.
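The distinction can be sketched as follows. This is not the crate's serializer: the TextType enum here is a local stand-in mirroring the variant names mentioned above, and write_text_element and escape are hypothetical helpers that build a string instead of writing XML events.

```rust
/// Local stand-in mirroring the text types discussed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextType {
    Text,
    Html,
    Xhtml,
}

/// Escapes characters that are significant in XML element text.
fn escape(input: &str) -> String {
    input
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Serializes a text construct: plain text and HTML are escaped,
/// XHTML is emitted as markup and therefore passed through unchanged.
pub fn write_text_element(name: &str, kind: TextType, value: &str) -> String {
    match kind {
        TextType::Text => format!("<{0}>{1}</{0}>", name, escape(value)),
        TextType::Html => format!("<{0} type=\"html\">{1}</{0}>", name, escape(value)),
        TextType::Xhtml => format!("<{0} type=\"xhtml\">{1}</{0}>", name, value),
    }
}
```

The sketch assumes the XHTML value is already well-formed; a real implementation would need to validate or re-serialize it rather than trust the caller.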

Implementation of `Display` for `Error` displays the source of the error, not the error message

The current implementation of Display for Error forwards to the inner type's implementation of Display if there is one. However, it is preferable not to do this, for two reasons:

If the user wants the string representation of the inner error, they can always call .source()?.to_string() instead.
When formatted with tools such as Anyhow, the error message ends up repeating the same information, e.g.:

Error: failed to refresh syndication feeds

Caused by:
0: incomplete utf-8 byte sequence from index 5
1: incomplete utf-8 byte sequence from index 5

It would be better if the error displayed what went wrong instead of why it went wrong, producing more helpful traces like this:

Error: failed to refresh syndication feeds

Caused by:
0: failed to parse Atom feed
1: incomplete utf-8 byte sequence from index 5

Now the error message tells a very clear and unambiguous story of exactly what happened. It makes it much easier to debug and is more friendly for users.

Note that this applies even to variants other than Error::Utf8 and Error::Xml. For example, the trace of an Error::Eof would better look like this:

Error: failed to refresh syndication feeds

Caused by:
0: failed to parse Atom feed
1: unexpected end of input

Than what it currently looks like:

Error: failed to refresh syndication feeds

Caused by:
0: unexpected end of input

What had an unexpected end of input? The error doesn't tell you.

The improved trace can be implemented by having source() return a new (potentially private) type EofError which Displays as "unexpected end of input".
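A std-only sketch of that arrangement is below. ParseError is a hypothetical stand-in for atom_syndication::Error with a single variant, and chain is a hypothetical helper that walks source() the way error reporters such as Anyhow do:

```rust
use std::error::Error as StdError;
use std::fmt;

/// Private cause type: says *why* parsing failed.
#[derive(Debug)]
struct EofError;

impl fmt::Display for EofError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("unexpected end of input")
    }
}

impl StdError for EofError {}

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub enum ParseError {
    Eof,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Says *what* went wrong; the cause is reported via `source()`.
        f.write_str("failed to parse Atom feed")
    }
}

impl StdError for ParseError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            ParseError::Eof => Some(&EofError),
        }
    }
}

/// Collects the messages of an error and all of its sources, outermost first.
pub fn chain(err: &(dyn StdError + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}
```

Walking the chain for ParseError::Eof yields the outer message followed by the cause, matching the two-line trace shown above.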