archived-geh-aggregations's People

Contributors

bjarkemeier, djorgensendk, dstenroejl, johevemi, jonasdmoeller, kpeen, kristianschneider, lasrinnil, lasseklitgaard-energinet, madsbloendandersen, mknic, prtandrup, renetnielsen, sondergaard


archived-geh-aggregations's Issues

Refactor method PrepareMessages

It would be a nice touch if the implementation could be done in a way so the method doesn't state a return type of plain IEnumerable but instead returns IEnumerable&lt;T&gt; where T is IOutboundMessage. That way it would be possible to return IEnumerable&lt;AggregatedExchangeResultMessage&gt;, as AggregatedExchangeResultMessage is of type IOutboundMessage.

[Spike] Find telemetry SDK to take over from application insights python SDK

The currently used telemetry SDK in Python (Application Insights) is no longer maintained or supported by Microsoft, and we have therefore removed it from our repository.

We need to find another SDK for telemetry and come up with a set of tasks to get telemetry implemented in our current aggregation code.

It looks like the following SDK is the suggested one:
https://docs.microsoft.com/en-us/azure/azure-monitor/app/opencensus-python
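
A minimal sketch of what adopting it could look like, assuming the opencensus-ext-azure package from the linked docs; the connection string is a placeholder, not our actual configuration:

```python
import logging

from opencensus.ext.azure.log_exporter import AzureLogHandler

# Route standard python logging to Application Insights via OpenCensus.
# The connection string is a placeholder and would come from configuration.
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(connection_string="InstrumentationKey=<key>"))
logger.warning("telemetry event from aggregation job")
```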

Modify timeseries dataset with numeric values

For all string representations of data types, e.g. settlement method, connection state, quality, etc., we need a numeric representation of those values.

We need numeric values for easier aggregation in pyspark.

Is it only quality that needs to be included, or are there other values that should be converted to numeric values?
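
A sketch of one way to do the conversion in pyspark; the specific codes and numeric values below are assumptions, not the agreed mapping:

```python
from pyspark.sql import functions as F

# Hypothetical mapping from string quality codes to numeric values.
quality_map = {"E01": 1, "56": 2, "D01": 3}

# Build a literal map column and look each row's quality up in it.
mapping = F.create_map([F.lit(x) for kv in quality_map.items() for x in kv])
timeseries_df = timeseries_df.withColumn("quality_numeric", mapping[F.col("quality")])
```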

Calculation based on hourly tariff for flex metering points

Calculate:
Energy quantity per hourly tariff (kWh)
Price per tariff (DKK/kWh/day)
Amounts (DKK) (Energy Quantity * price)

Filter on the following

  • Hourly tariffs+ChargeOwner per flex MP per energy supplier per grid area

Energy quantity and prices are measured in hourly resolution. Each hourly price is multiplied with the respective hourly energy quantity.

IF the MP is PT15M,
THEN each 15-minute value is summed up into one hour, which is then multiplied with the hourly price.
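
A sketch of the calculation in pyspark, assuming illustrative column names (time, quantity, price) and an hourly price dataframe; not the final implementation:

```python
from pyspark.sql import functions as F

# Sum PT15M (and PT1H) quantities up to hourly resolution per metering point.
hourly_quantities = (
    timeseries_df
    .withColumn("hour", F.date_trunc("hour", F.col("time")))
    .groupBy("metering_point_id", "hour")
    .agg(F.sum("quantity").alias("hourly_quantity"))
)

# Multiply each hourly quantity with the matching hourly tariff price.
amounts = (
    hourly_quantities
    .join(hourly_prices, on="hour")
    .withColumn("amount", F.col("hourly_quantity") * F.col("price"))
)
```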

Acceptance criteria

  1. Input and output data is present in the excel validation sheet
  2. Output of the calculation has been verified
  3. "What should the result look like?"
  4. Unit tests on prices should reflect a precision of 8 decimals
  5. Unit tests cover the expected rounding rules (see page 287 in the RSM guide)

Fetch metering point data registered as GridLoss and SystemCorrection in aggregation engine

We need to fetch information about the metering points registered as GridLoss and SystemCorrection.

An SQL view has been created in the MasterData database called MeteringPointsRegisteredAsGridLossOrSystemCorrections.

We need to make an API that can fetch this data from our aggregation engine.

We should not connect directly to the SQL server.

Isn't this part already drawn into the architecture diagram, and hasn't a decision thereby already been made about it?


This task is about the architecture and implementation around fetching the data about GridLoss and SystemCorrection metering points and storing it somewhere, so we are able to consume the data from within the aggregation domain.

This can be solved in numerous ways:

  • Through a data factory, where a job runs every now and then to fetch data and store it in csv format, so we are able to query it from the aggregation engine as a dataframe.
  • Through an event-based solution, where an event is published on update of data, and the aggregation domain subscribes to the event and stores the data as csv or some other format.
  • Some other solution I can't come up with at the moment.
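
As an illustration of the data factory option, a sketch of how the exported data could be consumed from the aggregation engine; the storage path is a hypothetical placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the csv that a data factory job has exported from the
# MeteringPointsRegisteredAsGridLossOrSystemCorrections view.
# The path below is a placeholder, not the actual storage location.
grid_loss_df = spark.read.csv(
    "wasbs://masterdata@<storage-account>.blob.core.windows.net/"
    "grid-loss-system-correction.csv",
    header=True,
)
```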

Doesn't this task resemble something you have been working on in connection with the test data setup you are currently doing? 😊

Discussion points:

Architecture:
Should we make the API in the form of an Azure Function with an HTTP trigger?
Infrastructure:
How should this be deployed?

Open discussion:
Is this a case where we should use protobuf?

[Cancelled] Integration between aggregation and metering points

Description

The aggregation domain needs specific data from the metering points domain.
Therefore, relevant integration points must be established between these two domains.

The metering points domain must publish events - and the aggregation domain must subscribe to these events - containing the following data:

  • MeteringPointMrid
  • MPType
  • SettlementMethod
  • MeteringMethod
  • MeterReadingPeriodicity
  • MeteringGridArea
  • ConnectionState
  • NetSettlementGroup
  • InMeteringGridArea
  • OutMeteringGridArea
  • Parent_MeteringPoint_mRID (Only relevant for child metering points)
  • Occurrence

Acceptance criteria

  1. Contracts between the metering points and aggregation domains have been made and describe the necessary data that the aggregation domain needs
  2. The aggregation domain subscribes to events published by the metering point domain when they concern any of the data described above
  3. When any of these data are updated in the metering point domain, the update must be published, to be picked up by the aggregation domain and stored as the new truth

Update MetaData when an aggregation or wholesale job is created

As a DataHub user
I want to be able to filter on the following parameters in job search:

  • ProcessType
  • JobID
  • GridArea
  • Process period
  • Execution date (Start and end date)
  • Username of the person who triggered the job
  • Job status (Scheduled, Running, Finished)
  • JobType (simulation/the right one)

Fill out all metadata when a databricks job is created:
https://github.com/Energinet-DataHub/geh-aggregations/blob/2682e0b4b74418683f2ffc555bd628cfd3872708/source/coordinator/GreenEnergyHub.Aggregation.Domain/DTOs/MetaData/Job.cs

TODOs with reference to issue #199 should be solved in this issue.

AC

  1. Metadata for databricks jobs corresponds to the list defined in the task description.
  2. Metadata for databricks jobs is saved based on input from the function


Calculation based on fees per flex consumption metering points

Calculate:
Quantity per fee (kWh) (number of the same fee connected to one metering point)
Price per fee (DKK/kWh/day)
Amounts (DKK) (Quantity * Price)

Filter on the following

  • fees+ChargeOwner per flex MP per energy supplier per grid area

Fees are calculated on the days where fees have been linked to the metering point. They are one-time payments.

e.g.
A grid operator has connected the physical meter at your house on date X, at a cost Y.
This Y (fee price) will then be present on date X.
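
A sketch of the fee calculation in pyspark, assuming a charge-links dataframe and illustrative column names:

```python
from pyspark.sql import functions as F

# A fee counts on the date it was linked to the metering point (one-time
# payment), so count fees per grouping per day and multiply with the fee price.
fee_amounts = (
    charge_links_df
    .groupBy("charge_code", "charge_owner", "energy_supplier", "grid_area", "date")
    .agg(F.count("*").alias("fee_quantity"))
    .join(fee_prices_df, on=["charge_code", "date"])
    .withColumn("amount", F.col("fee_quantity") * F.col("price"))
)
```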

Acceptance criteria

  1. Input and output data is present in the excel validation sheet
  2. Output of the calculation has been verified
  3. The following parameters must be part of the results
  • ChargeCode (ChargeID)
  • ChargeOwner
  • EnergySupplier
  • GridArea
  • fee quantity (number of fees)
  • prices per fees per day
  • amounts (fee quantity per day * fee price per day)
  • Metering Point Type
  • Settlement method
  • ChargeType

Store metadata for wholesale settlement job

  • Who ran the job
  • When the job was triggered
  • Job status
  • Start date for the job
  • End date for the job
  • Process type
  • Process variant
  • Etc.

Ask khatozen for a complete list of the metadata that needs to be stored.

Wholesale settlement V2 - Calculate settlement for all metering point types (BRS-027)

Description

This feature is a continuation of Wholesale settlement V1, where we calculate settlement for flex consumption metering points for all charge types.

This feature will take into account hourly consumption metering points, production metering points, and all child metering points.

The purpose of this feature is to finish the calculation engine used for wholesale settlement.

Acceptance criteria

  1. Perform wholesale settlement (tariffs, fees & subscriptions) for all metering point types
  • Aggregate quantities (energy and quantity) per charge per energy supplier per type of metering point
  • Get prices per charge
  • Calculate amounts
  2. It must be possible to trigger a Wholesale fixing (D05) and Correction settlement (D32) process, with different process variants
    a) Wholesale fixing: Process variant = 1st (D01)
    b) Correction settlement: Process variant = 1st (D01), 2nd (D02), 3rd (D03)
  3. Store results from the calculations
  4. Store basis data from the calculations
  5. Register grid loss and system correction metering points
  6. Test data is updated with new metering points

Estimate

0.75 PI

Store basis data for wholesale settlement job

Description
We need to store basis data for a wholesale settlement process (both D05 and D32_D01/D02/D03). DataHub needs this data so market actors can request it, and to handle any disputes with market actors.

We need to rewrite the way we store basis data for the aggregation process, so that we store each dataframe individually before joining data from different sources.
We will use the same approach as for storing results from calculations.
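
A sketch of what the shared helper could look like, assuming delta as the storage format per the path convention in the acceptance criteria; the function name and signature are illustrative:

```python
# Illustrative shared helper; name and signature are assumptions.
def store_basis_data(df, job_id: str, filename: str) -> None:
    """Store a single dataframe, before any joins, under the agreed path format."""
    df.write.format("delta").mode("overwrite").save(
        f"delta/basis-data/{job_id}/{filename}"
    )
```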

Acceptance criteria

  1. Store dataframes individually before joining data
  2. Paths to dataframes should have the following format: delta/basis-data/{jobId}/{filename}
  3. Method to store dataframes should be moved to shared and should be used by aggregation and wholesale
  4. Rename "result-id" to "job-id" from "coordinator" down to "trigger-base-arguments"

Calculation based on subscriptions per flex consumption

Calculate:
Quantity per subscription (number of the same subscription connected to one metering point)

e.g. MP1 has 2 subscriptions:

  • Sub1: Quantity = 5
  • Sub2: Quantity = 1

Price per subscription (DKK/kWh/month)
Amounts (DKK) (Quantity * Price)

Filter on the following

  • subscriptions(+ChargeOwner) per flex MP per energy supplier per grid area

The price of a subscription is monthly and needs to be distributed across all days in the calculated period.

IF the calculation period crosses the 1st of a month,
THEN you will get two different daily values, one for each month (unless the two months included in the period have an equal number of days).

e.g.
Sub price = 100 DKK/month
January = 31 days
February = 28 days

Calculation period = 15 Jan to 15 Feb

Daily values for subscription:
Jan = 100/31
Feb = 100/28
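
A sketch of the daily distribution in pyspark, assuming a subscriptions dataframe with a monthly price and from/to dates; column names are illustrative:

```python
from pyspark.sql import functions as F

# Explode the charge period into one row per day, then divide the monthly
# price by the number of days in that day's month (100/31 in Jan, 100/28 in Feb).
daily_prices = (
    subscriptions_df
    .withColumn("date", F.explode(F.sequence(F.col("from_date"), F.col("to_date"))))
    .withColumn(
        "price_per_day",
        F.col("monthly_price") / F.dayofmonth(F.last_day(F.col("date"))),
    )
)
```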

Acceptance criteria

  1. Input and output data is present in the excel validation sheet
  2. Output of the calculation has been verified
  3. The following parameters must be part of the results
  • ChargeCode (ChargeID)
  • ChargeOwner
  • EnergySupplier
  • GridArea
  • subscription quantity (number of subscriptions)
  • prices per subscription per day
  • daily amounts (subscription quantity * subscription price per day)
  • Metering Point Type
  • Settlement method
  • ChargeType
  4. Unit tests on prices should reflect a precision of 8 decimals
  5. Unit tests cover the expected rounding rules (see rules in the RSM guide, p. 287)

Add files with ref to main repo

The following files need to be added and must include a reference to their "parent" file in the geh main repo:

  • Security.md - Ref to file in geh_repo
  • License.md - Ref to file in geh_repo
  • Code of conduct.md - Ref to file in geh_repo
  • Community.md - Ref to file in geh_repo
  • Contribution.md
    -- Depending on the file content in geh_repo => refer

Register grid loss and system correction as special metering points

Once the grid loss and system correction metering points are created, they need to be registered as special metering points.

E17/D01 (Calculated) = Grid loss
E18 (Calculated) = System correction

Each grid area has its own grid loss and system correction metering point.

AC1: A grid loss metering point has been registered with the following criteria:
MeteringPointType = E17
SettlementMethod = D01
MeteringPointSubTypeCode = Calculated
PhysicalStatusCode = Connected

AC2: A system correction metering point has been registered with the following criteria:
MeteringPointType = E18
MeteringPointSubTypeCode = Calculated
PhysicalStatusCode = Connected

AC3: When registering either a grid loss or a system correction metering point, and the metering point used does not fulfill the criteria described in AC1 or AC2, the registration should be rejected.

AC4: Grid loss and system correction MPs should be assigned to a specific grid area, and can only be assigned to one at a time. (if Closed down)
AC5: If trying to register a grid loss or system correction MP when one is already registered, the request should be rejected.
AC6: All grid areas must have a grid loss and a system correction metering point.

[Cancelled] Wholesale settlement V1 - For flex consumption metering points (hourly and fixed price) - (BRS-027)

Description

The purpose of the wholesale settlement process is to financially settle the electricity market.

The energy supplier (ES) is the end-consumers' single point of contact with the electricity market. Therefore, ESs are also in charge of billing the total cost of electricity to consumers.

This means that ESs invoice the end consumer on behalf of the grid company and government for the related tariffs/taxes, fees, and subscriptions on specific metering points.

The wholesale settlement process calculates how much the ES "owes" to the respective market participants.

This is done by categorizing tariffs, subscriptions, and fees per ES per grid area, and this information is sent to the ES and the charge owner.

The process consists of three parts:
Part 1: Aggregate quantities
Part 2: Collect prices
Part 3: Calculate prices

Domain
Aggregation

SMEs
KWQ
IRS
PHQ

Acceptance criteria

  1. Perform wholesale settlement (Tariffs, Fees and Subscriptions) for flex consumption metering points
    i. Aggregate quantities per charge per energy supplier
    ii. Get prices per charge
    iii. Calculate amounts
  2. Trigger the following processes
    a. D05
    b. D32
    i. D01
    ii. D02
    iii. D03
  3. Store results from the calculation
  4. Store basis data from the calculation
  5. Send results of the mentioned steps to market actors. (Own interpretation) (on hold until the post office is more defined)

Brainstorm databricks questions for Q&A session

As a team, we want to compile a set of questions to give Databricks the opportunity to answer them at the Q&A session at the end of August.

AC1: Provide Martin with a list of questions

Add test.md file

  • Add: Referral to QA statement in geh_repo

Optional additions:

  • section: Any other relevant domain-related tests, specific functional or performance tests
  • section: Test data examples

Aggregated quality on results

Quality on timeseries dataframe needs to be aggregated for every aggregation step.

We must do this because the current implementation of aggregated quality only supports aggregation on grid area, metering point type and resolution.

But in reality we need to aggregate quality for all types of grouping we work with.

Examples of groupings:

  • Step 1: IN grid area, OUT grid area, Resolution
  • Step 2: IN grid area, Resolution and OUT grid area, Resolution
  • Grid area, Resolution
  • Grid area, BRP, Resolution
  • Grid area, Energy Supplier, Resolution
  • Grid area, BRP, Energy Supplier, Resolution

One way to solve this would be to create a function corresponding to each grouping and aggregate quality in those steps, as sketched below.
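
A sketch of such a function, parameterized on the grouping columns; the "take the worst quality in the group" rule is an assumption, the actual rules live in #118045:

```python
from pyspark.sql import functions as F

def aggregate_quality(df, group_cols):
    # Assumed rule: the aggregated quality is the "worst" numeric quality
    # in the group; the real rules are defined in #118045.
    return df.groupBy(*group_cols).agg(
        F.max("quality_numeric").alias("aggregated_quality")
    )

# One call per grouping instead of one hand-written function per grouping.
step1 = aggregate_quality(timeseries_df, ["in_grid_area", "out_grid_area", "resolution"])
es_step = aggregate_quality(timeseries_df, ["grid_area", "energy_supplier", "resolution"])
```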

UPDATE:
We should refactor the table to hold numeric values where applicable, e.g. Quality, ConnectionState, etc.
Maybe we should just add more columns, so we keep the normal column representation beside the numeric column.

Quality is aggregated per grouping as stated in the description.
Rules for aggregated quality must comply with the rules stated here: #118045

Refactor: check precision of test data corresponding to real data

Our unit tests of aggregations use a decimal precision and scale of Decimal(38, 10). Is this intentional or does it differ from precision in real data (energy quantity)?


We need to ensure that input data in our unit tests has the correct precision and datatype, matching real-life data.

Solution:
According to the RSM guide (RSM-014: EnergyQuantity, pages 93 and 96), there is a maximum of 18 digits and a maximum of 3 decimals, so DecimalType(18, 3).
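
A sketch of what that means for the test schemas:

```python
from decimal import Decimal
from pyspark.sql.types import DecimalType, StructField, StructType

# Max 18 digits and 3 decimals per RSM-014, instead of the current Decimal(38, 10).
schema = StructType([StructField("energy_quantity", DecimalType(18, 3), True)])
sample = (Decimal("123456789012345.678"),)  # fits 18 digits / 3 decimals
```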

Note:
Rounding rules are defined in the RSM guide on page 287
Information on rounding in pyspark:
https://www.educba.com/pyspark-round/

WIP: https://github.com/Energinet-DataHub/geh-aggregations/tree/feature/127863-refactor_precision_of_test_data

Store wholesale settlement results in blob

Description
Store results of the wholesale settlement for tariffs, subscriptions, and fees.

Quantity: can either be hourly EnergyQuantity, or the number of subscriptions or fees
Prices on all charges at a given time
and amounts (Quantity * price) at a given time

Path to result blob must be in this format: Results/{JobID}/{FileName}

Acceptance criteria

  1. Results from BRS-027 are stored with the results from BRS-023

Convert DDX CIM message in Cosmos DB to NBS format

This user story is a part of the MessageShipper.

AC1: Convert DDX CIM messages stored in Cosmos DB to NBS format
AC2: The eSett role is able to dequeue messages from the queue
AC3: eSett can only dequeue messages related to their role.

Add content to Aggregations road map

Add content to domain road map (currently planned and perhaps future work if it makes sense).

If nothing is planned, use this text:
"No work planned in current program increment."

Resolution should be configurable

Resolution of time series could potentially change over time (May 22 - 2023), so we need to make the resolution configurable.

A possible solution is to add the resolution as a parameter to the coordinator and send it along to the aggregation job, so we are able to aggregate over different windows instead of hard coding the time window to one hour.
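
A sketch of the idea in pyspark, with the resolution passed in as a parameter instead of a hard-coded "1 hour"; column names are assumptions:

```python
from pyspark.sql import functions as F

# Resolution arrives as a job parameter from the coordinator, e.g. "1 hour"
# or "15 minutes", instead of being hard coded in the aggregation.
resolution = "15 minutes"

aggregated = (
    timeseries_df
    .groupBy(F.window(F.col("time"), resolution), "grid_area")
    .agg(F.sum("quantity").alias("quantity"))
)
```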

What happens if we use different resolutions across a period? For example, 1-22 May in 1H and 22-31 May in 15M.

Trigger for wholesale settlement processes

Should be a separate databricks job attached to the existing cluster.
Create an Azure HTTP trigger function.
Parameter list:

  • From date
  • To date
  • Process variant (Correction settlement only)
  • Process type
  • Etc.
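
A sketch of the HTTP trigger function in Python; the parameter names and response shape are assumptions, not a defined contract:

```python
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Illustrative parameter names; the actual contract is still to be defined.
    from_date = req.params.get("fromDate")
    to_date = req.params.get("toDate")
    process_type = req.params.get("processType")
    process_variant = req.params.get("processVariant")  # correction settlement only

    if not (from_date and to_date and process_type):
        return func.HttpResponse("Missing required parameters", status_code=400)

    # ... submit the wholesale settlement job to the existing databricks cluster ...
    return func.HttpResponse("Wholesale settlement job triggered", status_code=202)
```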

Update readme

Add to intro section:

What does this project do?
• Include a high-level context description
• Add referral(s) to added wiki page(s) containing the business workflows handled or supported by the repo, e.g. Receive time series
• Add domain specific NFRs, e.g. x time series values per y (ambitions + what has been achieved(?))
• Domain road map (What is the plan for the repo now and in the future)
(Perhaps just a referral to the solution road map, that Martin creates)

How do I get started?
• Installation… TBD

Where can I get more help, if I need it?
• Code owners? Gitter versus Slack? Referral to geh_repo?

Add/Update section: Architectural diagram

Add new section: Dataflows diagram between domains (Martin has the diagram)

[Spike] Strategy for upgrading third-party libraries, Python version, Spark, etc.

We need a strategy in the domains using databricks, python, spark etc. to know when and how to upgrade things like:

  • Python
  • Spark
  • Databricks
  • PyPi packages
  • Maven packages (installed on the databricks cluster)
  • Etc.

We need to ensure that upgrades in pipelines are reflected in the local dev environment as well (Docker image), to make sure we are running the same versions locally as we do in our pipelines.

Calculation based on daily tariff for flex metering points

Calculate:
Energy quantity per daily tariff (kWh)
Price per tariff (DKK/kWh/day)
Amounts (DKK)

Filter on the following

  • Daily tariffs+ChargeOwner per flex MP per energy supplier per grid area (Charge Owners)

Energy quantity is measured in hourly resolution, while the price is daily. Therefore, energy quantity must be summed up per day and multiplied with the daily price.
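
A sketch of the daily summing in pyspark, with assumed column names:

```python
from pyspark.sql import functions as F

# Sum hourly quantities per day, then multiply with the daily tariff price.
daily_amounts = (
    timeseries_df
    .withColumn("date", F.to_date(F.col("time")))
    .groupBy("metering_point_id", "date")
    .agg(F.sum("quantity").alias("daily_quantity"))
    .join(daily_prices_df, on="date")
    .withColumn("amount", F.col("daily_quantity") * F.col("price"))
)
```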

Acceptance criteria

  1. Input and output data is present in the excel validation sheet
  2. Output of the calculation has been verified
  3. The following parameters must be part of the results
  • ChargeCode (ChargeID)
  • ChargeOwner
  • EnergySupplier
  • GridArea
  • EnergyQuantity Per daily tariff
  • Daily prices per tariff
  • Daily amounts (Energy Quantity * Daily tariff price)
  • Metering Point Type
  • Settlement method
  • Charge Type
