Giter VIP home page Giter VIP logo

Comments (5)

flobernd avatar flobernd commented on July 19, 2024

Hi @DR9885,

most of the times you want enums to be stored as string, because otherwise reordering them or adding new members inbetween will mess up the complete mapping.

As we are using System.Text.Json as our JSON serializer, you could simply register a custom converter for this specific enum:

public class JsonOrdinalEnumConverter :
	JsonConverter<Enum>
{
	public override Enum? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		(Enum)Enum.ToObject(typeToConvert, reader.GetUInt64());

	public override void Write(Utf8JsonWriter writer, Enum value, JsonSerializerOptions options) =>
		writer.WriteNumberValue(Convert.ToUInt64(value));
}

It's important to use the converter attribute on the property like this:

public class Movie
{
    [JsonConverter(typeof(JsonOrdinalEnumConverter))]
    public Rating Rating { get; init; }
}

and not on the type (sadly, the default serializer always takes precedence):

[JsonConverter(typeof(JsonOrdinalEnumConverter))]
public enum Rating
{
    ...
}

Does that solve your issue?

from elasticsearch-net.

DR9885 avatar DR9885 commented on July 19, 2024

Good to know, we can leverage the System.Text features.

However, these changes have other impacts:

  • Range Queries are not supported. (as stated above)
  • For Everyone, Upgrading from Nest to 8, All Term Queries will no longer work, and will have to go to ".keyword"
  • Text & Keyword is much larger than Int, so when having a large search db with a bunch of enums. This will now increase the size and slow down the index speed.
  • Responses will also be slower, with larger text to be sent over the network. some enum names can be really long.
  • Also, C# has flags, which are designed to store attributes in the smallest format, which would break the purpose of this.
    https://dev.to/ayodejii/enum-flags-in-c-20a6
  • most nosql stores keep enums as int format.

Example of some Enum Values we use, that would be too wordy to store as strings.

        SuccessorAdditionalOrChangeInTrustee = 52,
        UnscheduledDrawCreditEnhancement = 53,
        SubstitutionCreditLiquidityProvider = 54 ,
        FinancialObligationIncurrence = 55,
        FinObligEventReflectingFinancialDifficulties = 56 , 
        FinancialObligationIncurrenceDebtObligation  = 57 ,
        FinancialObligIncurrenceGuaranteeDebtOblig = 58 ,
        FinanObligIncurrenceGuaranteeDerivativeInstr = 59,
        FinancialObligIncurrenceDerivativeInstrument = 60,
        FinanObligEventReflectFinDifficultiesDefault = 61,
        FinObligEventReflectFinDiffEventOfAccel = 62,
        FinObligEventReflectFinDiffTerminationEvent = 63,
        FinanObligEventReflectFinDiffModifOfTerms = 64 ,
        FinanObligEventReflectFinanDiffOther = 65,
        InitialAssetBackedSecuritiesDisclosure = 66 ,
        QuarterlyAssetBackedSecuritiesDisclosure = 67 ,
        AnnualAssetBackedSecuritiesDisclosure = 68,
        OtherAssetBackedSecuritiesDisclosure = 69 ,
        BankLoanAlternativeFinancingFilings = 70 

I think it would make more sense to have compiled languages like C# and Java store it as an integer. And javascript store it as a string.

from elasticsearch-net.

flobernd avatar flobernd commented on July 19, 2024

The new default behavior was decided before I joined the team and there probably are as well good arguments in favor of this change. Maybe @Mpdreamz @stevejgordon can remember these?

Range Queries are not supported. (as stated above)

Range queries on enum fields is a nieche use-case and probably even indicates a wrong type. Enums are not intended to cover a continuous range of values.

For Everyone, Upgrading from Nest to 8, All Term Queries will no longer work, and will have to go to ".keyword"

Your point is completely valid. This is a breaking change and could mess with existing data, if not carefully handled (like e.g. using the workaround I mentioned above). I have to check, if this point is part of the migration guide..

The other arguments do all boil down to some advances/disadvantages of the underlaying data type. I still think the current default is good as it minimizes surprises (especially for rather inexperienced developers or new users who don't think about the underlaying types). For the more advances users there still is the option to customize this behavior like using a custom converter like mentioned above.

Maybe I could make the default configurable to make it a little bit easier to switch back to ordinal values.

from elasticsearch-net.

Mpdreamz avatar Mpdreamz commented on July 19, 2024

I remember we did this because we got a lot of bug reports where this caught people off guard.

For sure [Flag] enums should be serialised as integers (if they aren't already).

With https://github.com/elastic/elastic-ingest-dotnet we don't default to writing enums as strings because System.Text.Json also doesn't.

The one argument for sending them as integers is to allow people to rename enums values provided they maintain the enum ordering.

We should align https://github.com/elastic/elastic-ingest-dotnet with the client behaviour though.

from elasticsearch-net.

DR9885 avatar DR9885 commented on July 19, 2024

@Mpdreamz there are also performance benefits to using int over string.

  • less network overhead
  • less serialization
  • faster indexing (text & keyword need larger indexes partial/tokenized matching). Integers only need to be indexed for terms.

I have a few objects where we save with over 100 different enums in each object, with batches of 10k models. The change to will impact performance heavily.

Also updating each and every field that uses an enum would take a lot of developer time. Can we just make this configurable by the framework as a whole?

from elasticsearch-net.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.