messageformat.net's Introduction

MessageFormatter for .NET

- better UI strings.

This is an implementation of the ICU Message Format in .NET. For official information about the format, go to: http://userguide.icu-project.org/formatparse/messages

Quickstart

var mf = new MessageFormatter();

var str = @"You have {notifications, plural,
              zero {no notifications}
               one {one notification}
               =42 {a universal amount of notifications}
             other {# notifications}
            }. Have a nice day, {name}!";
var formatted = mf.FormatMessage(str, new Dictionary<string, object>{
  {"notifications", 4},
  {"name", "Jeff"}
});

//Result: You have 4 notifications. Have a nice day, Jeff!

Or, if you don't like dictionaries and don't mind a bit of reflection:

var formatted = mf.FormatMessage(str, new {
  notifications = 0,
  name = "Jeff"
});

//Result: You have no notifications. Have a nice day, Jeff!

You can use a static method, too:

var formatted = MessageFormatter.Format(str, new {
  notifications = 1,
  name = "Jeff"
});

//Result: You have one notification. Have a nice day, Jeff!

Installation

Either clone this repo and build it, or install it with NuGet:

Install-Package MessageFormat

Features

  • It's fast. Everything is hand-written; no parser-generators, not even regular expressions.
  • It's portable. The library is targeting .NET Standard 2.0.
  • It's (relatively) small. For a .NET library, ~25kb is not a lot.
  • It's very white-space tolerant. You can structure your blocks so they are more readable - look at the example above.
  • Nesting is supported. You can nest your blocks as you please. No special structure is required; just ensure your braces match.
  • Adding your own formatters. I don't know why you would need to, but if you want, you can add your own formatters, and take advantage of the code in my base classes to help you parse patterns. Look at the source, this is how I implemented the built-in formatters.
  • Exceptions make at least a little sense. When exceptions are thrown due to a bad pattern, the exception should include useful information.
  • There are unit tests. Run them yourself if you want, they're using XUnit.
  • Built-in cache. If you are formatting messages in a tight loop, with different data for each iteration, and if you are reusing the same instance of MessageFormatter, the formatter will cache the tokens of each pattern (nested, too), so it won't have to spend CPU time to parse out literals every time. I benchmarked it, and on my monster machine, it didn't make much of a difference (10000 iterations).
  • Built-in pluralization formatters. Generated from the CLDR pluralization rule data.

Performance

If you look at MessageFormatterCachingTests, you will find a "with cache" and "without cache" test.

My machine runs on a Core i7 3960x, and with about 100,000 iterations with random data (generated beforehand), it takes about 2 seconds (1892ms) with the cache, and about 3 seconds (3236ms) without it. These results are from a debug build; in release mode the time taken is reduced by about 40%! :)

Supported formats

MessageFormat.NET supports the most commonly used formats:

  • Select Format: {gender, select, male{He likes} female{She likes} other{They like}} cheeseburgers
  • Plural Format: There {msgCount, plural, zero {are no unread messages} one {is 1 unread message} other{are # unread messages}}. (where # is the actual number, with the offset (if any) subtracted).
  • Simple variable replacement: Your name is {name}
  • Numbers: Your age is {age, number}
  • Dates: You were born {birthday, date}
  • Time: The time is {now, time}

You can also specify a predefined style, for example {birthday, date, short}. The supported predefined styles are:

  • For the number format: integer, currency, percent
  • For the date format: short, full
  • For the time format: short, medium

These are currently mapped to the built-in .NET format specifiers. This package does not ship with any locale data beyond the pluralizers that are generated based on CLDR data, so if you wish to provide your own localized formatting, read the section below.
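
As a brief illustration of the select and plural formats listed above, here is a minimal sketch using the static MessageFormatter.Format method from the Quickstart. It assumes the MessageFormat NuGet package is installed and that the types live in the Jeffijoe.MessageFormat namespace; the argument values are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using Jeffijoe.MessageFormat; // assumption: namespace shipped by the MessageFormat package

// Select: the branch is chosen by the string value of `gender`.
var select = MessageFormatter.Format(
    "{gender, select, male{He likes} female{She likes} other{They like}} cheeseburgers",
    new Dictionary<string, object?> { { "gender", "female" } });
Console.WriteLine(select); // She likes cheeseburgers

// Plural: `#` is replaced by the number inside the matching branch.
var plural = MessageFormatter.Format(
    "There {msgCount, plural, one {is 1 unread message} other {are # unread messages}}.",
    new Dictionary<string, object?> { { "msgCount", 5 } });
Console.WriteLine(plural); // There are 5 unread messages.
```

Any value that matches no explicit branch falls through to other, so it is worth always including that branch.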

Customize formatting

If you wish to control exactly how number, date and time are formatted, you can either:

  • Derive CustomValueFormatter and override the format methods.
  • Instantiate a CustomValueFormatters and assign a lambda to the desired properties.

Then pass it in as the customValueFormatter parameter to new MessageFormatter.

Example: A custom formatter that allows the use of .NET's formatting tokens. This is for illustration purposes only and is not recommended for use in real apps.

// This is using the lambda-based approach.
var custom = new CustomValueFormatters
{
    // The formatter must set the `formatted` out parameter and return `true`
    // If the formatter returns `false`, the built-in formatting is used.
    Number = (CultureInfo _, object? value, string? style, out string? formatted) =>
    {
        formatted = string.Format($"{{0:{style}}}", value);
        return true;
    }
};

// Create a MessageFormatter with the custom value formatter.
var formatter = new MessageFormatter(locale: "en-US", customValueFormatter: custom);

// Format a message.
var message = formatter.FormatMessage("{value, number, $0.0}", new { value = 23 });
// "$23.0"

Adding your own pluralizer functions

Since MessageFormat 5.0, pluralizers based on the official CLDR data ship with the package, so this is no longer needed.

As with MessageFormat.js, you can add your own pluralizer function. The Pluralizers property is an IDictionary<string, Pluralizer>, so you can also remove the built-in ones if you want.

var mf = new MessageFormatter();
mf.Pluralizers.Add("<locale>", n => {
  // `n` is the number being pluralized.
  if(n == 0)
    return "zero";
  if(n == 1)
    return "one";
  return "other";
});

There are no restrictions on what strings you may return, nor on what strings you may use in your pluralization block.

var mf = new MessageFormatter(true, "en"); // true = use cache
mf.Pluralizers["en"] = n =>
{
    // `n` is the number being pluralized.
    if (n == 0)
        return "zero";
    if (n == 1)
        return "one";
    if (n > 1000)
        return "thatsalot";
    return "other";
};

mf.FormatMessage("You have {number, plural, thatsalot {a shitload of notifications} other {# notifications}}", new Dictionary<string, object>{
  {"number", 1001}
});

Escaping literals

Simple - the literals are {, } and # (in a plural block). If literals occur in the text portions, then they need to be quoted by enclosing them in pairs of single quotes ('). A pair of single quotes always represents one single quote ('' -> '), which still applies inside quoted text. (This '{isn''t}' obvious -> This {isn't} obvious)
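
The quoting rule can be tried out with a short sketch like the one below. It assumes the MessageFormat NuGet package is installed and that MessageFormatter lives in the Jeffijoe.MessageFormat namespace; the expected output is the one given by the rule above.

```csharp
using System;
using System.Collections.Generic;
using Jeffijoe.MessageFormat; // assumption: namespace shipped by the MessageFormat package

// The pattern references no variables, so an empty argument dictionary is enough.
var result = MessageFormatter.Format(
    "This '{isn''t}' obvious",
    new Dictionary<string, object?>());

// The quoted braces become literal text and '' collapses to a single quote.
Console.WriteLine(result); // per the rule above: This {isn't} obvious
```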

Anything else?

There's not a lot - Alex Sexton of MessageFormat.js did a great job documenting his library, and like I said, I wrote my implementation so it would be (somewhat) compatible with his.

Bugs / issues

If you have issues with the library, and the exception makes no sense, please open an issue and include your message, as well as the data you used.

Author

I'm Jeff Hansen, a software developer who likes to fiddle with string parsing when it is not too difficult. I also do a lot of ASP.NET Web API back-end development, and quite a bit of web front-end stuff.

You can find me on Twitter: @jeffijoe.

messageformat.net's People

Contributors

apjones6, erictheswift, gitter-badger, jeffijoe, kostya9, meoblast001, ramziyassine

messageformat.net's Issues

Port to .net core

Hello,

Are you thinking of porting this library to .NET Core?

Thanks
Damian

Newline between selectors not trimmed

Description

When taking the sample pattern from https://unicode-org.github.io/icu/userguide/format_parse/messages/ there's a leading \n which shouldn't be there:

var pattern = """
                      {gender_of_host, select, 
                      female {
                      {num_guests, plural, offset:1 
                      =0 {{host} does not give a party.}
                      =1 {{host} invites {guest} to her party.}
                      =2 {{host} invites {guest} and one other person to her party.}
                      other {{host} invites {guest} and # other people to her party.}}}
                      male {
                      {num_guests, plural, offset:1 
                      =0 {{host} does not give a party.}
                      =1 {{host} invites {guest} to his party.}
                      =2 {{host} invites {guest} and one other person to his party.}
                      other {{host} invites {guest} and # other people to his party.}}}
                      other {
                      {num_guests, plural, offset:1 
                      =0 {{host} does not give a party.}
                      =1 {{host} invites {guest} to their party.}
                      =2 {{host} invites {guest} and one other person to their party.}
                      other {{host} invites {guest} and # other people to their party.}}}}
                      """;
        var message = MessageFormatter.Format(
            pattern,
            new Dictionary<string, object?>
            {
                { "gender_of_host", "male" },
                { "host", "Fritz" },
                { "guest", "Frieda" },
                { "num_guests", 5 }
            });

Expected outcome

message.Should().Be("Fritz invites Frieda and 4 other people to his party.");

Actual outcome

There's a leading newline:

"\nFritz invites Frieda and 4 other people to his party."

Line breaks for static text inside 'select' are ignored

Library version: 6.2.
Sample message:

"Single text which will not change.\nSummary:{acceptedData, select, NONE {} other {\nAccepted Data:{acceptedData}}}"

Message arguments:

{
    "acceptedData": "\n-X\n-Y\n-Z"
}

MessageFormatter usage:

var messageFormatProvider = new MessageFormatter(true, "en");
var result = messageFormatProvider.FormatMessage(message, messageArguments);

Current result:

"Single text which will not change.\nSummary:Accepted Data:\n-X\n-Y\n-Z"

Expected result:

"Single text which will not change.\nSummary:\nAccepted Data:\n-X\n-Y\n-Z"

Stylistic whitespace breaks plural format

Unit test:

diff --git a/src/Jeffijoe.MessageFormat.Tests/MessageFormatterFullIntegrationTests.cs b/src/Jeffijoe.MessageFormat.Tests/MessageFormatterFullIntegrationTests.cs
index 5148af6..29af464 100644
--- a/src/Jeffijoe.MessageFormat.Tests/MessageFormatterFullIntegrationTests.cs
+++ b/src/Jeffijoe.MessageFormat.Tests/MessageFormatterFullIntegrationTests.cs
@@ -334,6 +334,13 @@ namespace Jeffijoe.MessageFormat.Tests
                         new Dictionary<string, object?> { { "count", 3 } },
                         "You and 2 others added this to their profiles."
                     };
+                yield return
+                    new object[]
+                    {
+                        "{ count, plural, one {1 thing} other {# things} }",
+                        new Dictionary<string, object?> { { "count", 2 } },
+                        "2 things"
+                    };
             }
         }

Result:

Failed Jeffijoe.MessageFormat.Tests.MessageFormatterFullIntegrationTests.FormatMessage(source: "{ count, plural, one {1 thing} other {# things} }", args: [[count, 2]], expected: "2 things") [1 ms]
  Error Message:
   Assert.Equal() Failure
           ↓ (pos 1)
Expected: 2 things
Actual:   2
           ↑ (pos 1)

Would be nice if whitespace after the opening brace was allowed 🙂 Other MessageFormat implementations that we are using support it, and I would like to not care about/enforce a specific style.

Empty sub-message doesn't work

Great work! I started replacing my brute-force formatting code with your MessageFormatter.Format method. I found one case where I want an empty string in the zero case, so I tried this:

"{nbrAttachments, plural, zero {} one {{nbrAttachmentsFmt} attachment} other {{nbrAttachmentsFmt} attachments}}"

There is a runtime exception for this, but if I replace the {} with { }, it returns a space and doesn't throw.

I want no display if there are no attachments, but the proper plural display for all others. If I simply omit the "zero {}" part, it returns "0 attachments".

Thread safety issues in MessageFormatter.cs

Overview

I use your library for internationalization. It is straightforward and supports the ICU message format. 👍 However, I believe there is a threading issue in MessageFormatter, mainly around the use of the cache (https://github.com/jeffijoe/messageformat.net/blob/master/src/Jeffijoe.MessageFormat/MessageFormatter.cs#L387), that sometimes manifests as the following exception:

Type=System.ArgumentException
Message=An item with the same key has already been added.
...
Stack Trace:
   at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at Jeffijoe.MessageFormat.MessageFormatter.ParseRequests(String pattern, StringBuilder sourceBuilder) in d:\Projects\MessageFormat.NET\src\Jeffijoe.MessageFormat\MessageFormatter.cs:line 390
   at Jeffijoe.MessageFormat.MessageFormatter.FormatMessage(String pattern, IDictionary`2 args) in d:\Projects\MessageFormat.NET\src\Jeffijoe.MessageFormat\MessageFormatter.cs:line 230

I created a console app to soak test your API (multiple threads reusing the same MessageFormatter instance) in the hope of reproducing this issue, but, as with many threading problems, it was a challenge to get a consistent failure.

Proposal

  • I know we can use the static method, as it has an inherent lock, but I was thinking more along the lines of changing MessageFormatter to use a concurrent dictionary, so that instances of this class can be shared across threads. I will be more than happy to help create a patch and submit a PR if you agree with that approach.
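
To make the proposal concrete, here is a minimal sketch of a cache built on ConcurrentDictionary. PatternCache and its parse delegate are hypothetical stand-ins for illustration and are not the library's actual code. The point is that GetOrAdd replaces a check-then-Add sequence, which is the kind of sequence that can throw "An item with the same key has already been added" when two threads race.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the formatter's pattern cache; not the library's code.
public sealed class PatternCache<TParsed>
{
    private readonly ConcurrentDictionary<string, TParsed> cache = new();
    private readonly Func<string, TParsed> parse;

    public PatternCache(Func<string, TParsed> parse) => this.parse = parse;

    // GetOrAdd never throws on a duplicate key. Under contention the parse
    // delegate may run more than once for the same pattern, but only one
    // result is stored, so parsing should be free of side effects.
    public TParsed Get(string pattern) => cache.GetOrAdd(pattern, parse);

    public int Count => cache.Count;
}

public static class Demo
{
    public static void Main()
    {
        var cache = new PatternCache<string[]>(p => p.Split(' '));

        // Many threads request the same four patterns at once.
        Parallel.For(0, 10_000, i => cache.Get("pattern number " + (i % 4)));

        Console.WriteLine(cache.Count); // 4
    }
}
```

The trade-off is that a pattern may occasionally be parsed twice under contention, which is usually cheaper than taking a lock on every lookup.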

IDictionary interface support

Version: 6.0.1
public string FormatMessage(string pattern, IReadOnlyDictionary<string, object?> args) does not accept a dictionary typed as IDictionary.
As a result, the public string FormatMessage(string pattern, object args) overload is executed instead.

This can be fixed with a direct cast, but it is a breaking change.
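
The overload behaviour described here can be reproduced without the library. In the sketch below, Bind is a hypothetical pair of stand-ins that mirror the two FormatMessage signatures quoted above; copying the IDictionary into a concrete Dictionary is one possible caller-side workaround, not something the library documents.

```csharp
using System;
using System.Collections.Generic;

public static class OverloadDemo
{
    // Hypothetical stand-ins that mirror the two FormatMessage overloads named above.
    public static string Bind(IReadOnlyDictionary<string, object?> args) => "dictionary overload";
    public static string Bind(object args) => "object overload";

    public static void Main()
    {
        IDictionary<string, object?> args =
            new Dictionary<string, object?> { { "name", "Jeff" } };

        // IDictionary<,> does not derive from IReadOnlyDictionary<,>,
        // so the compiler falls back to the object overload.
        Console.WriteLine(Bind(args)); // object overload

        // Copying into a concrete Dictionary<,> restores the intended binding,
        // because Dictionary<,> implements IReadOnlyDictionary<,>.
        Console.WriteLine(Bind(new Dictionary<string, object?>(args))); // dictionary overload
    }
}
```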

Number, date, time, duration formatters

Hi!

How about other formatters supported by messageformat.net and defined by the spec? Especially number, because otherwise there's no way for plural to work as it is supposed to, since now it just spits out the number "as is" in the Invariant Culture, without respect to the regional decimal separator and without any thousand separators.

Although I've looked at https://github.com/andyearnshaw/Intl.js/blob/master/src/11.numberformat.js and it looks pretty hopeless to faithfully translate that into C#.

Signed assembly

Hi,
Would it be possible to sign the assembly so it can easily be used in other signed assemblies?

NullRef @ VariableFormatter if any parameter value is null

This one throws NullRef:

MessageFormatter.Format("{maybeCount}", new {maybeCount = (int?)null});

from VariableFormatter.cs#L58

Is VariableFormatter not null-safe by design?
Currently I'm using custom variable formatter as a workaround.

Missing Variable handler

Currently any missing variable throws an exception and stops the formatting.

I have a custom "iif" formatter that I have been using in an Angular project that I'm converting to Blazor. It basically does this:
{SomeVariable, iif, true, false}
In my iif formatter, I check whether the value is a non-empty string, a boolean that is true, or otherwise not null, and so on.

So I request that MessageFormatter expose a "MissingVariableHandler" or a "DontThrowOnMissingVariable" option. A "MissingVariableHandler" function could convert the missing variable to string.Empty, and a "DontThrowOnMissingVariable" flag could treat the variable as null.

Support for IReadOnlyDictionary

IReadOnlyDictionary<string, object?> dictionary
    = new Dictionary<string, object?>() { { "maximumLength", 100 } }.AsReadOnly();

Console.WriteLine(mf.FormatMessage(
    "Article titles must not contain more than {maximumLength} characters.",
    dictionary));

Unhandled exception. Jeffijoe.MessageFormat.Formatting.VariableNotFoundException: The variable 'maximumLength' was not found in the arguments collection.

Edit: It works if you use ReadOnlyDictionary (not the interface). We may decide not to use read-only dictionaries (at least via the interface), but it should perhaps be supported anyway?

Bool True vs true

We encountered an incompatibility with messageformat.js. When you use a boolean for a select, messageformat.js converts it to true/false for the key, but messageformat.net converts it to True/False.

I think lowercase true/false is the better option and it should be fixed here. (Maybe an option should be introduced, so that it can be fixed while maintaining backwards compatibility.)

# not working after variable

Given this message (taken from the ICU website):

var message = @"{gender_of_host, select, 
  female {
    {num_guests, plural, offset:1 
      =0 {{host} does not give a party.}
      =1 {{host} invites {guest} to her party.}
      =2 {{host} invites {guest} and one other person to her party.}
      other {{host} invites {guest} and # other people to her party.}}}
  male {
    {num_guests, plural, offset:1 
      =0 {{host} does not give a party.}
      =1 {{host} invites {guest} to his party.}
      =2 {{host} invites {guest} and one other person to his party.}
      other {{host} invites {guest} and # other people to his party.}}}
  other {
    {num_guests, plural, offset:1 
      =0 {{host} does not give a party.}
      =1 {{host} invites {guest} to their party.}
      =2 {{host} invites {guest} and one other person to their party.}
      other {{host} invites {guest} and # other people to their party.}}}}";


var mf = new MessageFormatter();
mf.FormatMessage(message, new
{
    gender_of_host = "female",
    num_guests = "5",
    host = "Mary",
    guest = "Bob"
}).Dump();

I would expect to get "Mary invites Bob and 4 other people to her party.".
But instead got "Mary invites Bob and # other people to her party."

Support InvariantGlobalization / default to InvariantCulture

When enabling InvariantGlobalization (which I believe is now the default in API/gRPC/Worker project templates), the library fails with:

System.Globalization.CultureNotFoundException : Only the invariant culture is supported in globalization-invariant mode. See https://aka.ms/GlobalizationInvariantMode for more information. (Parameter 'name')

en is an invalid culture identifier.

   at System.Globalization.CultureInfo..ctor(String name, Boolean useUserOverride)
   at Jeffijoe.MessageFormat.Formatting.Formatters.VariableFormatter.GetCultureInfo(String locale)
   at Jeffijoe.MessageFormat.Formatting.Formatters.VariableFormatter.Format(String locale, FormatterRequest request, IReadOnlyDictionary`2 args, Object value, IMessageFormatter messageFormatter)
   at Jeffijoe.MessageFormat.MessageFormatter.FormatMessage(String pattern, IReadOnlyDictionary`2 args)

Would it make sense to default to the invariant culture (empty string "") instead of "en"?

This affects static usage – with a custom instance you could simply specify the locale/culture yourself.

It's a breaking change, but worth it in my opinion.

An alternative would be to allow the locale on the default instance to be set.

Supporting choice format

I am porting some code over to .NET from Java and it uses the JDK's MessageFormat class to format some strings. I am having 2 issues with compatibility here as far as support goes:

  1. messageformat.net doesn't support the choice format
  2. The args parameter that MessageFormat expects is an object array (which is what I have), and MessageFormatter expects a dictionary.

Inputs

Pattern: "{0,choice,0#|1#{1}|2#{1} to {2}}"
Args: object[] args = new object[] { 2, "Any", "Hex Escape" };

Attempt

I think I have worked around the second issue by converting the object array to a dictionary like so:

object[] args = new object[] { 2, "Any", "Hex Escape" };
var dic = new Dictionary<string, object>();
for (int j = 0; j < args.Length; j++)
{
	dic.Add(j.ToString(), args[j]);
}

But that causes MessageFormatter to throw:

format.FormatMessage(pattern, dic)' threw an exception of type 'Jeffijoe.MessageFormat.FormatterNotFoundException'
Message: "Format 'choice' could not be resolved.\r\nLine 1, position 1\r\nSource literal: '0,choice,0#|1#{1}|2#{1} to {2}'"
Source: "Jeffijoe.MessageFormat"
StackTrace: " at Jeffijoe.MessageFormat.Formatting.FormatterLibrary.GetFormatter(FormatterRequest request)\r\n at Jeffijoe.MessageFormat.MessageFormatter.FormatMessage(String pattern, IDictionary`2 args)"

Questions

  1. First of all, is there a workaround for this with messageformat.net?
  2. Is this something that can be supported by messageformat.net, or should I be considering porting MessageFormat from the JDK?
  3. If it can be supported, is this something you would be willing to support?

Throw exception when variables do not exist in given dictionary.

Currently I've got a sexy AssertVariableExists in the Base formatter that does one thing, and that is make the tests look good.

In other words, it's not being used, but it should be. However, I am thinking I shouldn't make each formatter responsible for throwing when I can do it in MessageFormatter.FormatMessage before the actual formatter is called.

Essentially, we want to know when we supplied bad data.

Other/custom locales

I know, that the only built-in locale is currently "en".
Is there a guide on how to implement and use "custom" locales?

I would imagine distributing those as individual NuGet packages for each locale (e.g. "MessageFormat.jp", "MessageFormat.sp", "MessageFormat.pl"), but I would first need to understand how one would create them without altering the core library. Is that possible, or can a new locale only be added within this project (by a PR submission or a fork)? If it's the latter, having an example or guideline would also help.

For now, I have only found how to provide a pluralizer specific to a culture/locale.
Is it because there's no such support? E.g. I have found nothing on support for "number" and "date" with "standard" styles ("long", "short", "medium", "full") and/or skeletons even for "en" (although it should be possible to do it in relatively general way), so how would I go about providing it for other locales?
