tomprogrammer / rust-ascii
ASCII-only equivalents to `char`, `str` and `String`.
License: Apache License 2.0
These implementations should also exist for AsciiString.
I'm unsure if even the non-mut AsciiStr <-> [AsciiChar] slice transmutes like this one here are sound, since you generally can't assume that the outer type is compatible with the inner type without using #[repr(transparent)]:
Lines 350 to 353 in 296c3a8
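To illustrate the concern, here is a minimal std-only sketch. `Wrapper`, `from_bytes` and `as_bytes` are hypothetical names used for illustration, not the crate's own types. With `#[repr(transparent)]` the wrapper is guaranteed to have the same layout as its single field, which is what makes the pointer cast defensible; without the attribute the layout is not guaranteed.

```rust
/// Hypothetical newtype over a byte slice; not part of the `ascii` crate.
/// `#[repr(transparent)]` guarantees the same layout as the inner `[u8]`.
#[repr(transparent)]
pub struct Wrapper([u8]);

impl Wrapper {
    /// Reinterprets `&[u8]` as `&Wrapper` without copying.
    pub fn from_bytes(bytes: &[u8]) -> &Wrapper {
        // SAFETY: `Wrapper` is `repr(transparent)` over `[u8]`, so both
        // pointer types share layout and metadata (the slice length).
        unsafe { &*(bytes as *const [u8] as *const Wrapper) }
    }

    /// Returns the wrapped bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

The same reasoning would apply to a string type wrapping a slice of character values, assuming the wrapper carries the attribute.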
We could implement std::str::pattern::Pattern to be able to use many methods of the standard library.
This issue is waiting for stabilization of rust-lang/rust#27721.
Just like &str implements Into<String>.
This would involve copying the string, but would allow writing APIs such as:
fn foo<S: IntoAsciiString>(s: S) {
    let s = s.into_ascii_string();
    ....
}
etc, and have those work seamlessly with string and bytestring literals.
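For comparison, this is how the same pattern looks with the standard library's `Into<String>`. It is only an analogue: `shout` is a hypothetical function, and the proposed `IntoAsciiString` bound would take the place of `Into<String>`.

```rust
/// Accepts anything convertible into an owned `String`.
/// Hypothetical example function, shown only to illustrate the API shape.
pub fn shout<S: Into<String>>(s: S) -> String {
    // Borrowed inputs are copied here; an owned `String` is moved as-is.
    let mut owned: String = s.into();
    owned.make_ascii_uppercase();
    owned
}
```

Callers can then pass either a string literal or an owned `String` without converting by hand.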
I assume "foo\nbar" producing ["foo"] is a bug, but the lines_iter() test asserts that AsciiStr::from("\n").lines() produces nothing, which is not what I expected, and also differs from str.lines() (which produces a single empty slice).
str.lines() handles "\r\n" and a trailing newline, so could the (currently rather complex) implementation be replaced with a simple forwarding implementation?
EDIT on 2018-09-11: fixed typo "foo/nbar" and corrected .split() to .lines().
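As a reference point for the expected behaviour, the following std-only sketch collects the output of `str::lines`. The helper name `std_lines` is made up for this example.

```rust
/// Collects the lines of `s` using the standard library's `str::lines`.
/// Hypothetical helper, used only to show what `str` does today.
pub fn std_lines(s: &str) -> Vec<&str> {
    s.lines().collect()
}
```

With this helper, `std_lines("\n")` yields a single empty slice, `std_lines("foo\nbar")` yields `["foo", "bar"]`, and `std_lines("foo\r\nbar\n")` also yields `["foo", "bar"]`, since both `"\r\n"` and a trailing newline are handled.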
ascii_char.rs has this typo:
/// `'_'`
Caret = 94,
Obviously, that should be '^'.
Ideally the APIs should be the same, except for the UTF-related bits (like str's utf16_units), and porting programs should be very easy. Currently lots of methods are missing in AsciiStr and AsciiString (like str.lines(), str.bytes(), str.parse(), ...).
Cargo features are supposed to be additive, which means the current no_std feature is backwards. This is an issue if you're writing a library, because there is currently no way to pass a negative switch (e.g. if I have a use_std flag, there is no way to say "enable the no_std flag if it's not enabled").
The current system should be switched, so that there is a use_std feature, which is enabled in the default features.
Hi. I want to use AsciiStr::split(), which is in the documentation. However, I realized split is not in the latest version (0.9.1) of the ascii crate published to crates.io.
Can you please publish the next version to crates.io if possible?
I noticed in https://github.com/tomprogrammer/rust-ascii/blob/master/Cargo.toml#L12 that we're specifying quickcheck 0.4.1, not a general "^0.4" or "^0.4.1". Would it make sense to specify that instead?
(Sorry, didn't realize that while looking at it earlier!)
It seems that this crate's "Start of Text" member (ASCII '\x02') is mislabeled as SOX, whereas everything else on the web (including the linked-to Wikipedia page) references it as STX.
Since this would be a breaking change, I'm not sure how that would fit within the versioning, but I figured I'd report it anyway since I haven't seen any comments for this on the Issues/PRs for this repo.
Add character codes from 128 to 255.
I'm working with a C API which requires me to use NUL-terminated, no-interior-NUL, ASCII strings. Since the standard library contains the CStr and CString types, it might be nice if this library contained the ASCII equivalents of those as well.
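Until such types exist, the three requirements can be checked with the standard library alone. The sketch below is one possible approach, and `ascii_cstr` is a hypothetical helper name: `CStr::from_bytes_with_nul` enforces the terminator and the absence of interior NULs, and `is_ascii` covers the rest.

```rust
use std::ffi::CStr;

/// Returns a `&CStr` only if `bytes` is NUL-terminated, has no interior
/// NUL, and consists solely of ASCII. Hypothetical helper for illustration.
pub fn ascii_cstr(bytes: &[u8]) -> Option<&CStr> {
    // Fails on a missing terminator or an interior NUL byte.
    let c = CStr::from_bytes_with_nul(bytes).ok()?;
    // `to_bytes` excludes the trailing NUL; reject anything above 0x7F.
    if c.to_bytes().is_ascii() {
        Some(c)
    } else {
        None
    }
}
```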
They allow writing non-ASCII values to an AsciiStr which when read out as an AsciiChar will produce values outside the valid niche.
These impls were added by me in 4fbd050, so 0.9, 0.8 and 0.7 are affected.
Here's an example using these impls to create out-of-bounds array indexing in safe code (when compiled in release mode):
let mut buf = [0u8; 1];
let ascii = buf.as_mut_ascii_str().unwrap();
let byte_view = <&mut[u8] as From<&mut AsciiStr>>::from(ascii);
let arr = [0b11011101u8; 128];
byte_view[0] = 180;
assert_ne!(arr[ascii[0] as u8 as usize], 0b11011101);
I don't see any good way to tell users of the crate to stop using these impls:
Deprecation notices on trait impls are ignored (by both Rust 1.38 and Rust 1.9).
Changing the impls to panic or return an empty slice could break working code (that never writes non-ASCII values) at run-time.
The only fix we could make appears to be to remove the impls, telling users of them to do the unsafe pointer casting explicitly. On one hand this will make any accidental users of it aware of the problem when they update Cargo.lock, but it will also break any use that happened to be OK with a minor release, and any reverse dependencies of these uses.
On the other hand doing nothing and hoping nobody accidentally uses these impls feels irresponsible. What do you think @tomprogrammer?
In any case I don't think we need to fix 0.7 and 0.8, as Rust didn't backport the security fix in 1.29.1 to previous affected versions.
Hi --
Looks like rust-ascii is available under the MIT and Apache 2.0 licenses. Could you include the respective license files in the repo?
Thanks!
See http://burntsushi.net/rustdoc/quickcheck/trait.Arbitrary.html for a description.
Quickcheck is a randomized testing tool that lets you check general properties. To make your type part of the Quickcheck ecosystem, you need to implement Arbitrary, which involves two methods:
(1) generate a random input
(2) shrink a failing input
This should be relatively easy. See http://burntsushi.net/rustdoc/src/quickcheck/arbitrary.rs.html#391 for how it is done for String and char -- it would probably be straightforward to port that logic to AsciiString and AsciiChar.
This can be behind a feature gate to make sure that if you aren't using quickcheck otherwise, you don't have to pull it in.
(I might do this or get someone else from Facebook to do it -- filing it to keep track :) )
Dear all,
Is there any evidence that this crate is faster than Rust's std String (for ASCII)?
Has anybody done a benchmark or something similar?
Generally I think there should be a section in the README about why somebody should use this crate for ASCII strings.
The quickcheck crate has been updated to 0.8 since the last release of ascii, which depends on 0.6.
In Debian, we usually package the latest version of crates except if there is a compelling reason to package earlier versions in addition.
See also Debian Bug #927314.
A String is usually preferred to a Box<str>. However, there is one good reason to prefer the boxed variant to the growable String, which is that String requires more memory. See https://users.rust-lang.org/t/use-case-for-box-str-and-string/8295/4
Similarly, Box<AsciiStr> uses less memory than AsciiString, so it would be helpful if there were ways to convert an AsciiString to a Box<AsciiStr>, just as it is possible to do so for String and Box<str>.
On my machine, Box<AsciiStr> uses 16 bytes compared to 24 bytes for AsciiString.
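The std counterparts show the same difference. In the sketch below (helper names `handle_sizes` and `shrink` are hypothetical), a `Box<str>` is a pointer plus a length, while a `String` additionally stores a capacity. The three-word size of `String` is what current implementations use rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Returns (size of `Box<str>`, size of `String`) in bytes.
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<Box<str>>(), size_of::<String>())
}

/// Converts a growable `String` into a `Box<str>`, dropping any spare
/// capacity along with the capacity field.
pub fn shrink(s: String) -> Box<str> {
    s.into_boxed_str()
}
```

On a 64-bit target this corresponds to the 16 and 24 bytes mentioned above; the requested conversion for AsciiString would mirror `String::into_boxed_str`.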
Hello,
While packaging the latest crate for my distribution, I noticed that all files in the published crate have executable bits set. This causes an issue with one of our packaging scripts and seems to be an error. Could you remedy this?
Thank you,
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:126:1: 130:2 error: the impl does not reference any types defined in this crate; only traits defined in the current crate can be implemented for arbitrary types [E0117]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:126 impl fmt::Display for Vec<Ascii> {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:127 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:128 fmt::Display::fmt(&self[..], f)
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:129 }
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:130 }
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:132:1: 136:2 error: the impl does not reference any types defined in this crate; only traits defined in the current crate can be implemented for arbitrary types [E0117]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:132 impl fmt::Display for [Ascii] {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:133 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:134 fmt::Display::fmt(self.as_str(), f)
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:135 }
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:136 }
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:188:1: 224:2 error: the impl does not reference any types defined in this crate; only traits defined in the current crate can be implemented for arbitrary types [E0117]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:188 impl AsciiExt for [Ascii] {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:189 type Owned = Vec<Ascii>;
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:190
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:191 #[inline]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:192 fn is_ascii(&self) -> bool {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:193 true
...
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:226:1: 238:2 error: the impl does not reference any types defined in this crate; only traits defined in the current crate can be implemented for arbitrary types [E0117]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:226 impl OwnedAsciiExt for Vec<Ascii> {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:227 #[inline]
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:228 fn into_ascii_uppercase(mut self) -> Vec<Ascii> {
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:229 self.make_ascii_uppercase();
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:230 self
/Users/coreyf/.cargo/git/checkouts/rust-ascii-d57360998fa8e9eb/master/src/lib.rs:231 }
Currently the documentation only lists items which are available using std. There should also be online documentation for the core-only feature set.
Thanks for the library. It'd be really awesome if we could add:
from_byte(u8) -> Ascii <- this one is really necessary as there's no way to implement this without an exposed Ascii constructor.
trim(&AsciiStr) -> &AsciiStr just like String.trim.
If you like the idea I can add these myself and send a pull request.
Since mem::transmute will be const-stable in 1.56.0, it would be nice if there was a way to construct an AsciiStr (and AsciiString) at compile-time. Since the from_str function is in a trait, and thus cannot be const yet, we would probably need to add a new method for this.
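A rough idea of what such a method could look like, using only the standard library. `AsciiBytes` and its methods are hypothetical stand-ins rather than the crate's API, and this sketch validates byte by byte in a `const fn` instead of using `mem::transmute`.

```rust
/// Hypothetical ASCII-checked view over a string's bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AsciiBytes<'a>(&'a [u8]);

impl<'a> AsciiBytes<'a> {
    /// Inherent `const fn` constructor; returns `None` on non-ASCII input.
    pub const fn new(s: &'a str) -> Option<Self> {
        let bytes = s.as_bytes();
        let mut i = 0;
        // `for` loops are not allowed in `const fn`, so use `while`.
        while i < bytes.len() {
            if bytes[i] > 0x7F {
                return None;
            }
            i += 1;
        }
        Some(AsciiBytes(bytes))
    }

    /// Number of ASCII characters, which equals the number of bytes.
    pub const fn len(&self) -> usize {
        self.0.len()
    }
}

/// Evaluated entirely at compile time.
pub const GREETING: Option<AsciiBytes<'static>> = AsciiBytes::new("hello");
```

Because the constructor is an inherent method rather than a trait method, it can be `const` even though trait methods such as `from_str` cannot.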
Preliminary
While str guarantees statically that data of its type is valid UTF-8, the type Ascii guarantees ASCII-conformance. Therefore the types [Ascii] and str and their owned counterparts Vec<Ascii> and String should behave similarly.
Topic of this Issue
Ascii provides functions like to_uppercase() and to_lowercase() which can be applied to single ASCII characters. Currently such operations are not implemented on owned or borrowed strings of ASCII characters. As the types Vec<Ascii> and [Ascii] should be opaque, manually implementing the iteration isn't recommended because it is an implementation detail of these types.
Example:
error: type `&[Ascii]` does not implement any method in scope named `to_uppercase`
let _ = "abcXYZ".to_ascii().unwrap().to_uppercase();
^~~~~~~~~~~~~~~~~~~~
The types String and str provide functionality for converting to uppercase and lowercase with their implementations of the traits std::ascii::{AsciiExt, OwnedAsciiExt}. These traits are intended for "[…] ASCII-subset only operations on string slices" and owned strings. Of course Vec<Ascii> and [Ascii] are not just subsets of ASCII but equivalent to it, so it's valid to implement these traits for the ASCII-only string types.
Implement the traits:
impl AsciiExt<Vec<Ascii>> for [Ascii]
impl AsciiExt for Ascii
impl OwnedAsciiExt for Vec<Ascii>
The implementations use functionality present in Ascii if possible.
std::ascii::{AsciiExt, OwnedAsciiExt} are marked experimental. This shouldn't be a real issue as I expect this crate to follow the way conversions are done in the standard library.
std::ascii::{AsciiExt, OwnedAsciiExt} carry the infix ascii, which is redundant in the case the traits are implemented on Vec<Ascii>, [Ascii] and Ascii. This redundancy must be tolerated to achieve the goals described above.
Ascii implements the same functionality in the functions to_uppercase() / to_ascii_uppercase() and to_lowercase() / to_ascii_lowercase().
to_uppercase() and to_lowercase() on Ascii in favour of their equivalents in AsciiExt. That removes the duplication mentioned in the drawbacks.
I'm not really interested in maintaining this crate. @tomprogrammer, how do you feel about moving it to https://github.com/tomprogrammer/rust-ascii/ ?
See title. I'm not an expert on what subset of const functions are stabilized yet, but I think a lot could already be const fn (think AsciiChar::as_byte).
I'm using this crate in a no_std environment by enabling the alloc feature as described in the readme. However, this feature is not included in the 1.0.0 version as the readme seems to imply, so maybe it would be good to publish a new version that includes the alloc feature and any other recent additions?
Hi, I am scanning the latest version of the ascii crate with my own static analyzer tool.
Unsafe conversion found at: src/ascii_str.rs:533:27: 533:62.
macro_rules! widen_box {
    ($wider: ty) => {
        #[cfg(feature = "alloc")]
        impl From<Box<AsciiStr>> for Box<$wider> {
            #[inline]
            fn from(owned: Box<AsciiStr>) -> Box<$wider> {
                let ptr = Box::into_raw(owned) as *mut $wider;
                unsafe { Box::from_raw(ptr) }
            }
        }
    };
}
This implementation is unsound and could create misalignment issues if the type passed as $wider is not handled properly, for example if it is some other arbitrary type.
This could cause undefined behavior in Rust. Further manipulating the converted values could lead to other consequences. I am reporting this issue for your attention.
I'm a rust beginner, so maybe I'm just doing it wrong. But I can't seem to find a nice way to convert an array or slice of AsciiChar into a str (via &AsciiStr).
The best I could come up with is this, but it still seems overly verbose:
let mut s = String::new();
s.push_str(AsRef::<AsciiStr>::as_ref(&[AsciiChar::A, AsciiChar::B, AsciiChar::C].as_ref()).as_str());
AsciiExt and Error are not in core, but some of their methods might still be useful.
We can add feature-gated inherent impls for those methods, but should we?
description() for ToAsciiCharError and AsAsciiStrError
eq_ignore_case() for AsciiChar and AsciiStr
to_ascii_{upper,lower}case() for AsciiChar
make_ascii_{upper,lower}case() for AsciiStr
In the AsciiCast impl, unconstrained lifetimes appear, which were forbidden (rust-lang/rust#24461).
Feel free to submit a PR.
Why are versions 0.7.0-0.9.2 deleted/yanked from crates.io? This is causing build errors for downstream users of this project, e.g. tiny-http/tiny-http#162
The lines and split methods currently return impl DoubleEndedIterator<Item = &AsciiStr>. It would be nice if they returned a concrete type, similar to std::str::Lines and std::str::Split. A concrete type allows the iterator to be easily stored in a struct.
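The std version of this use case might look like the sketch below, where `LineReader` is a hypothetical struct. Because `std::str::Lines` is a nameable type, it can be a plain field; an `impl Trait` return value would instead force a generic parameter or a boxed trait object.

```rust
use std::str::Lines;

/// Hypothetical struct that keeps a line iterator between calls.
pub struct LineReader<'a> {
    lines: Lines<'a>,
    line_no: usize,
}

impl<'a> LineReader<'a> {
    pub fn new(text: &'a str) -> Self {
        LineReader { lines: text.lines(), line_no: 0 }
    }

    /// Returns the next line together with its 1-based line number.
    pub fn next_numbered(&mut self) -> Option<(usize, &'a str)> {
        let line = self.lines.next()?;
        self.line_no += 1;
        Some((self.line_no, line))
    }
}
```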
If I'm not missing something, it is required only for tests, so it should be in the [dev-dependencies] section.