Description

This repository contains the source code to generate data from Jespersen's Cycle in Middle English using the second edition of the Penn Parsed Corpus of Middle English (PPCME2).

The query and codes included are based on work done by Aaron Ecay and Meredith Tamminga. I'd like to thank them for sharing their time and expertise in understanding the queries and code. Note that the results generated here are distinct.

Instructions

Data

Starting in the 12th century we see a change in how sentential negation is expressed. We initially observe purely pre-verbal negation.

Ic ne secge

This is followed by a period of bipartite negation where we observe both pre- and post-verbal negative markers.

I ne seye not

Finally, we observe purely post-verbal negation.

I say not

This change is often referred to as Jespersen's Cycle, following Dahl (1979), due to Jespersen's (1917) observation that:

The history of negative expressions in various languages makes us witness the following curious fluctuation: the original negative adverb is first weakened, then found insufficient and therefore strengthened, generally through some additional word, and this in its turn may be felt as the negative proper and may then in course of time be subject to the same developments as the original word

Code

To run the code either download the files as a ZIP, or clone the repository:

git clone https://github.com/christopherahern/jespersens-cycle-middle-english.git

Change directories to the cloned repository and create a symbolic link to the root directory of your copy of your local copy of the PPCME2:

ln -s <location of PPCME2> corpus

Now run the make script to output the data to data/neg-data.csv:

./make.sh

As a point of reference, make.sh takes less than two minutes to run on a laptop:

time ./make.sh

real	1m24.266s
user	1m48.135s
sys	0m2.013s

Output

The data will be output to data/neg-data.csv with the following columns:

exclude : tokens we might want to exclude for various reasons
ne : whether ne appears and whether it is contracted
not : whether not appears and whether it is before or after the verb
clausetype : details about whether the token appears in a matrix or relative clause
never.posn : whether never appears and whether it is before or after the verb
finite : whether or not the clause is finite
id : unique id of the sentence containing the token
year : year of the document containing the token
document : name of the document containing the token
stage : (1) ne..., (2) ne...not, (3) ...not

Note that these are all defined by the queries in coding.c and the script data.R.

The dates of each document can be found in the description of the corpus and are summarized in data/document-dates.csv.

Citation

If you use this repository to generate data, please cite it. More importantly, if you use data generated from the parsed corpus please cite the corpus:

Kroch, Anthony, and Ann Taylor. 2000. The Penn-Helsinki Parsed Corpus of Middle English (PPCME2). Department of Linguistics, University of Pennsylvania. CD-ROM, second edition, release 4.

It takes a lot of time and effort to build and annotate historical corpora well, so it's important to acknowledge that hard work.

Comments

If you have comments or questions about anything, feel free to email [email protected] or create an issue

christopherahern / jespersens-cycle-middle-english Goto Github PK

jespersens-cycle-middle-english's Introduction

Description

Instructions

Data

Code

Output

Citation

Comments

jespersens-cycle-middle-english's People

Contributors

Watchers

jespersens-cycle-middle-english's Issues

Add output of dates to csv file to data.R

Re-write data.R to remove lag data

Dating scheme for documents

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent