
fednlp's Introduction

FedNLP

Code and Data for GPT Deciphering Fedspeak: Quantifying Dissent Among Hawks and Doves, a paper published at Findings of EMNLP 2023.

Setup

This section shows you how to set up the codebase and your OpenAI key in order to run the code in this repository.

  1. Run pip install -r requirements.txt
  2. Create a .env file at root level (same level as this README) and put in it OPENAI_KEY=sk-REST_OF_KEY. Replace sk-REST_OF_KEY with your actual OpenAI API key.
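
Once the key is in place, code can read it from the .env file before calling the OpenAI API. The following is a minimal sketch assuming python-dotenv; the repository's own loading code may differ.

    # Minimal sketch (assumes python-dotenv); the repo's own loading code may differ.
    import os

    from dotenv import load_dotenv

    load_dotenv()  # reads the .env file at the project root
    OPENAI_KEY = os.getenv("OPENAI_KEY")
    if not OPENAI_KEY or not OPENAI_KEY.startswith("sk-"):
        raise RuntimeError("OPENAI_KEY is missing or malformed; check your .env file")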

Data

results/statements_scores is generated by 0-shot prompting GPT-4 to label each statement as one of ("dovish", "mostly dovish", "neutral", "mostly hawkish", "hawkish")

results/minutes_scores is generated by 0-shot prompting GPT-4-32K to label each minutes file as one of ("dovish", "mostly dovish", "neutral", "mostly hawkish", "hawkish")

results/statements_scores_by_sentence.json is generated by 0-shot prompting GPT-4 to label each sentence of each statement as one of ("dovish", "mostly dovish", "neutral", "mostly hawkish", "hawkish")

A value of None means that GPT-4 did not return one of the five labels. "TOO SHORT" means that the sentence was fewer than 5 characters long and was therefore not analyzed.

results/transcripts_full_score.json is generated by 0-shot prompting GPT-4-32K to examine all of each speaker's speech and assign each speaker a single score per transcript, one of ("dovish", "mostly dovish", "neutral", "mostly hawkish", "hawkish").

results/statements_scores_by_sentence_few_shot is generated by few-shot prompting GPT-4 to examine the sentences in each statement and score each sentence as one of ("dovish", "mostly dovish", "neutral", "mostly hawkish", "hawkish"). A value of null means the sentence did not receive a valid label.
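
As a quick sanity check, one of the sentence-level result files could be loaded and its label distribution tallied roughly as follows. The schema shown is an assumption (a JSON object mapping each statement to its list of per-sentence labels) and may differ from the actual files.

    import json
    from collections import Counter

    # Assumed schema: {statement_id: [label, label, ...]}; adjust to the actual file layout.
    with open("results/statements_scores_by_sentence.json") as f:
        by_sentence = json.load(f)

    label_counts = Counter(
        label
        for sentence_labels in by_sentence.values()
        for label in sentence_labels
    )
    print(label_counts)  # counts of "dovish" ... "hawkish", plus None and "TOO SHORT"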

Citation

@article{peskoff2023gpt,
  title={Deciphering Fedspeak: Quantifying Dissent Among Hawks and Doves},
  author={Peskoff, Denis and Visokay, Adam and Schulhoff, Sander V and Wachspress, Benjamin and Blinder, Alan and Stewart, Brandon M},
  journal={Findings of EMNLP},
  volume={2023},
  year={2023},
  month={October},
  day={20},
  keywords={FOMC, Fed, GPT, LLM},
  abstract={Markets and policymakers around the world hang on the consequential monetary policy decisions made by the Federal Open Market Committee (FOMC). Publicly available textual documentation of their meetings provide insight into members' attitudes about the economy. We use GPT-4 to quantify dissent among members on the topic of inflation. We find that transcripts and minutes reflect the diversity of member views about the macroeconomic outlook in a way that is lost or omitted from the public statements. In fact, diverging opinions that shed light upon the committee's "true" attitudes are almost entirely omitted from the final statements. Hence, we argue that forecasting FOMC sentiment based solely on statements will not sufficiently reflect dissent among the hawks and doves.},
  type={Regular Short Paper},
  track={NLP Applications},
  track2={Computational Social Science and Cultural Analytics},
}

fednlp's People

Contributors

avisokay, bjw6, denispeskoff, trigaten


fednlp's Issues

Experiments to run

Statements -> each sentence gets a score
           -> ingest entire statement for a score
Minutes -> ingest entire meeting for a score
Transcripts -> identify each speaker and give each one a single score based on ingesting all of their speech at once

Statements: By Sentence

For all available statements, predict EACH sentence as -1, -.5, 0, .5, 1. Keep separated by statement.
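
One way to tie the five GPT labels from the Data section to this numeric scale is a simple lookup table. The mapping below is an assumption; the sign convention (hawkish = +1, dovish = -1) is inferred from Alan's examples in the manual annotation task below.

    # Assumed mapping; sign convention (hawkish = +1, dovish = -1) inferred from the
    # annotation examples below (a 75 bp hike is +1, the Dec. 2008 easing is beyond -1).
    LABEL_TO_SCORE = {
        "dovish": -1.0,
        "mostly dovish": -0.5,
        "neutral": 0.0,
        "mostly hawkish": 0.5,
        "hawkish": 1.0,
    }

    def to_scores(labels):
        """Convert per-sentence labels to numbers, skipping None / "TOO SHORT" entries."""
        return [LABEL_TO_SCORE[label] for label in labels if label in LABEL_TO_SCORE]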

Transcript level analysis

For each speaker in each meeting, take all of their sentences and give each one a score, then report the per-sentence scores and the averages.

Task 8: Manual annotation of ALL statements

We need to go through all the statements we are sending to GPT and rank them as -1, -.5, 0, .5, or 1, the same way Alan did in the email, to create a gold label.

Following the guidance in the attached image (not reproduced here).

Optimizing for accuracy first and speed second, please classify all statements.
If you are unsure, mark it as ? and we can escalate to Alan for grey-area decisions.

Also keep an eye peeled for anything especially notable or worth commenting on (the 1-out-of-100 statement).

Alan's thoughts were:
Nov. 1994 is a definite +1, not so much for the wording (though that fits your scale) as for the huge 75 bp move.
Dec. 2008 is way beyond a -1. The Fed threw the kitchen sink at the failing economy then—big rate cut, forward guidance,…

Minutes: Entire Minutes

For every available meeting minutes, ingest the entire minutes and predict -1, -.5, 0, .5, 1
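
A minimal sketch of what the 0-shot call for a single minutes file might look like, using the pre-1.0 openai Python client; the actual prompt wording and client version used in this repo may differ.

    import os

    import openai
    from dotenv import load_dotenv

    load_dotenv()
    openai.api_key = os.environ["OPENAI_KEY"]

    def score_minutes(minutes_text: str) -> str:
        """0-shot label for one minutes file; the prompt wording here is illustrative only."""
        prompt = (
            "Label the following FOMC meeting minutes as one of: "
            "dovish, mostly dovish, neutral, mostly hawkish, hawkish.\n\n" + minutes_text
        )
        response = openai.ChatCompletion.create(
            model="gpt-4-32k",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response["choices"][0]["message"]["content"].strip()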

Transcripts: By Speaker

For each available transcript, calculate BY SPEAKER a prediction of -1, -.5, 0, .5, 1. Keep this separated across unrelated transcripts.

Example:
Joe: sentence 1. sentence 2. sentence 3
Adam: sentence 1
Joe: sentence 1. sentence 2

Should be predicted as:
Joe: .5
Adam: 0
Joe: 1
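
In other words, each consecutive block of speech gets its own score, even when the same speaker appears more than once. A minimal sketch of that structure, where score_text is a hypothetical helper assumed to return a number in {-1, -.5, 0, .5, 1}:

    # Each transcript is handled separately; within it, every consecutive speaker turn
    # gets its own score. score_text is hypothetical (e.g. a GPT call like the one
    # sketched for the minutes, mapped through LABEL_TO_SCORE).
    def score_transcript_turns(turns, score_text):
        """turns: list of (speaker, text) pairs in order of appearance."""
        return [(speaker, score_text(text)) for speaker, text in turns]

    # score_transcript_turns([("Joe", "..."), ("Adam", "..."), ("Joe", "...")], score_text)
    # -> [("Joe", 0.5), ("Adam", 0.0), ("Joe", 1.0)]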

Run 4 sentences (used for our figure) through GPT

Run each one as a separate sentence prediction:

  1. “The gradual increase in core inflation over the past year is a concern to me.”
  2. “I do not believe that there is a very great risk of an unmanageable outbreak of inflation during the relevant policy horizon.”
  3. “I think we should not slow our pace of easing moves at this meeting.”

  4. “The Committee continues to believe that against the background of its long-run goals of price stability and sustainable economic growth and of the information currently available, the risks are weighted mainly toward conditions that may generate economic weakness in the foreseeable future.” (this one is a statement, but I don't think that changes anything)
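
For convenience, the four sentences as a Python list, ready to be passed one at a time to whatever per-sentence scorer the other issues use (score_sentence in the comment is a placeholder, not a function defined in this repo):

    SENTENCES = [
        "The gradual increase in core inflation over the past year is a concern to me.",
        "I do not believe that there is a very great risk of an unmanageable outbreak of "
        "inflation during the relevant policy horizon.",
        "I think we should not slow our pace of easing moves at this meeting.",
        "The Committee continues to believe that against the background of its long-run "
        "goals of price stability and sustainable economic growth and of the information "
        "currently available, the risks are weighted mainly toward conditions that may "
        "generate economic weakness in the foreseeable future.",
    ]

    # for sentence in SENTENCES:
    #     print(score_sentence(sentence))  # score_sentence: the per-sentence GPT prompt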

Task 7: finalizing figure

We need a beautiful figure that summarizes our task, placed either in the top right of the second column on the first page,

or at the top of the second page (spanning the entire width).

(Or likely both, with one explaining the task and one presenting the results.)

Task 6: Speaker Level Transcript

@sanderschulhoff Next step is to process the transcripts. The main difference from the statements is that we need to produce a hawk/dove score for each speaker, for each meeting. So the first new task is to create, for each speaker, a list of all of their sentences. We will then iterate over and score those speaker sentences.

For each transcript from 1994-2016:
    Get all sentences for each speaker
    For each sentence by speaker:
        Measure the sentence on a scale of -1, -0.5, 0, 0.5, 1 (according to the category definitions above)
    Record the average measurement for each speaker
Return the average measurement for each transcript

For example output: {19940204: {greenspan:0.46, yellen:-0.25, geitner:0.65, ... , avg:0.26},
19940204: {greenspan:0.1, yellen:-0.38, geitner:0.45, ... , avg:-0.15}, etc}
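
A minimal sketch of the aggregation step, assuming the per-sentence numeric scores for one transcript are already grouped by speaker; the function name and the choice to average the speaker averages for the transcript-level avg are assumptions matching the example output's shape:

    from statistics import mean

    def summarize_transcript(speaker_scores):
        """speaker_scores: {speaker: [numeric sentence scores]} for one transcript/date."""
        summary = {
            speaker: round(mean(scores), 2)
            for speaker, scores in speaker_scores.items()
            if scores
        }
        # Transcript-level "avg" taken as the mean of the speaker averages,
        # matching the shape of the example output above.
        summary["avg"] = round(mean(summary.values()), 2)
        return summary

    # summarize_transcript({"greenspan": [0.5, 0.5, 0.0], "yellen": [-0.5, 0.0]})
    # -> {"greenspan": 0.33, "yellen": -0.25, "avg": 0.04}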

Task 1: recreate sentence level classification

For starters, here is the paper I referenced earlier today where researchers at the Fed tested GPT against other traditional NLP methods. Let's start by recreating this using the statements in our dataset from 1994-2016. Specifically, we should prompt GPT-4 to characterize each sentence exactly as they did in their paper, from -1 to 1 (photo attached). Once we have all of the sentences marked, we can take an unweighted average of the scores for all sentences within a meeting to give us a "score" for that date. Let me know if you have questions, but I think starting here makes sense as it is more straightforward than working with the minutes/transcripts.
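
A minimal sketch of the averaging step, assuming the per-sentence scores for each meeting date are already in hand (the GPT prompting itself is sketched in the other issues); names are illustrative:

    from statistics import mean

    def meeting_scores(sentence_scores_by_date):
        """sentence_scores_by_date: {date: [per-sentence scores in -1..1]}.
        Returns the unweighted average of sentence scores for each meeting date."""
        return {
            date: mean(scores)
            for date, scores in sentence_scores_by_date.items()
            if scores  # skip dates where no sentence received a valid score
        }

    # meeting_scores({19940204: [1, 0.5, 0, 0, 0.5]}) -> {19940204: 0.4}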

Task 9: Prepare transcript data by speaker for GPT processing

We want to create data that lumps everything said by a speaker within a transcript together:

John: sentence 1, sentence 2, sentence 6, sentence 7
Sally: sentence 3, sentence 4
Joe: sentence 5

We will want to be able to 1) evaluate the hawk/dove stance of each speaker, 2) compute the aggregate stance of all speakers in a transcript, and 3) track how the same speaker changes across different years of transcripts.
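
A minimal sketch of that grouping, assuming each transcript has already been parsed into (speaker, sentence) pairs in order; the parsing itself is not shown:

    from collections import defaultdict

    def group_by_speaker(speaker_sentence_pairs):
        """speaker_sentence_pairs: iterable of (speaker, sentence) in transcript order.
        Lumps everything a speaker said in one transcript together, keeping sentence order."""
        grouped = defaultdict(list)
        for speaker, sentence in speaker_sentence_pairs:
            grouped[speaker].append(sentence)
        return {speaker: " ".join(sentences) for speaker, sentences in grouped.items()}

    # group_by_speaker([("John", "sentence 1"), ("Sally", "sentence 3"), ("John", "sentence 2")])
    # -> {"John": "sentence 1 sentence 2", "Sally": "sentence 3"}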

Re-Run statements with few-shot

Adam will provide 10 examples in JSON (text: prediction). Re-run all statements (double-check with Adam, but I think it should be the ENTIRE statement, not by sentence).
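
A minimal sketch of folding the ten examples into a few-shot prompt; the file name, JSON layout, and prompt wording are assumptions, not the repo's actual setup:

    import json

    # Assumed layout of Adam's file: {"statement text": "prediction", ...};
    # the file name is hypothetical.
    with open("few_shot_examples.json") as f:
        examples = json.load(f)

    def build_few_shot_prompt(statement: str) -> str:
        """Prepend the ten labeled examples, then ask for a label for the new (entire) statement."""
        shots = "\n\n".join(
            f"Statement: {text}\nLabel: {label}" for text, label in examples.items()
        )
        return shots + f"\n\nStatement: {statement}\nLabel:"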

Task 3: Record Descriptive Stats from Sentence Level Processing

@sanderschulhoff in addition to the unweighted average, it would be helpful to also have access to the raw counts for each sentence-level classification, both for the task you have already completed and for the upcoming sentence-level work on transcripts.

For example output: {19940204: {-1: 10, -0.5: 2, 0:25, 0.5:4, 1:10, avg:0.4},
19940322: {-1: 4, -0.5: 0, 0:18, 0.5:3, 1:6, avg:0.3333333333333333}, etc}
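
A minimal sketch that produces that output shape from per-sentence numeric scores; structure and names are illustrative:

    from collections import Counter
    from statistics import mean

    def descriptive_stats(sentence_scores_by_date):
        """sentence_scores_by_date: {date: [per-sentence scores]} ->
        {date: {-1: n, -0.5: n, 0: n, 0.5: n, 1: n, "avg": unweighted mean}}."""
        stats = {}
        for date, scores in sentence_scores_by_date.items():
            counts = Counter(scores)
            row = {level: counts.get(level, 0) for level in (-1, -0.5, 0, 0.5, 1)}
            row["avg"] = mean(scores) if scores else None
            stats[date] = row
        return stats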
