zannick / demystify
A Magic: The Gathering parser
License: Other
Demystify
A Magic: The Gathering Parser

1) ABOUT

Demystify is an attempt to make it possible for a computer to understand what general Magic: The Gathering cards do. It is currently written in a combination of ANTLR 3.5 and Python 3.2.

2) INSTALLATION

Demystify depends on the following libraries and/or programs, which you will need to have installed in order to build the parser and run Demystify. See INSTALL for more specific instructions.

- Java 1.6.0 (or later)
- ANTLR 3.5 (or later 3.5 release)
- ANTLR 3 Python3 runtime [available at github.com/antlr/antlr3]
- jpackage-utils 1.7.5 (or later)
- Python 3.2
- python-progressbar2

3) BUILDING

To build the parser generator, you need to have the macro.g and Words.g grammar files up-to-date. If they don't exist, or demystify/keywords.py has changed, run

    $ python3 demystify/keywords.py

to regenerate them.

Then, you need to run antlr3 on the grammar. If you've followed the instructions in INSTALL and gotten yourself an antlr3 script, all you need to do is:

    $ cd demystify/grammar/
    $ antlr3 Demystify.g

If not, your commandline will look something like this (supposing your classpath is properly set):

    $ cd demystify/grammar/
    $ java org.antlr.Tool Demystify.g

(Note that cd instruction; if you "antlr3 demystify/grammar/Demystify.g" instead, you'll get the .py output files where they belong, but the .tokens files are put in your working directory.)

This takes a little while. If it worked (and it worked if all the output you received was of the form "warning(138): ... no start rule ...", and not "error(12345)" etc.), demystify/grammar/ should now contain a series of Demystify*.py files. These will be used by the main demystify.py script.

If it doesn't work, and you're sure you installed antlr3 and java correctly, check that you followed instructions in INSTALL again. If antlr3 is actually giving grammar errors, please check the Issues tab in github for the issue or file a new bug.
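The success criterion in section 3 (only "warning(...)" lines, no "error(...)" lines) can be checked mechanically. The following is a minimal sketch, not part of Demystify: the function name antlr_build_ok is hypothetical, and it assumes each diagnostic starts its line with "warning(N)" or "error(N)", as in the examples quoted above.

```python
import re

# Assumption: each ANTLR 3 diagnostic begins its line with "warning(<n>)"
# or "error(<n>)", matching the forms quoted in the README.
DIAGNOSTIC = re.compile(r"^(warning|error)\((\d+)\)")


def antlr_build_ok(output):
    """Return True if the captured tool output has no error(...) diagnostics.

    Hypothetical helper for illustration; warnings such as
    "warning(138): ... no start rule ..." are treated as harmless.
    """
    for line in output.splitlines():
        match = DIAGNOSTIC.match(line.strip())
        if match and match.group(1) == "error":
            return False
    return True
```

The text passed in would be whatever the antlr3 (or java org.antlr.Tool) command printed, captured however is convenient, for example by redirecting the command's output to a file and reading it back.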
4) RUNNING

The main entry point is the demystify.py script in the demystify/ folder. It currently offers two modes of running:

    load    Loads the card data from the Scryfall data file in demystify/data/cache/ (may download it from Scryfall if necessary), and performs preprocessing steps necessary before any lexing and parsing is done. With -i, opens an interactive prompt afterward, at which functions in demystify.py and card.py can be called. This is useful for actually invoking the parser, as well as doing special card searches using the utility functions in card.py.

    test    Runs the tests (or a specific test) in demystify/tests/.

These can be run from within demystify with:

    $ python3 demystify.py load -i

Add -h or --help for more information:

    $ python3 demystify.py -h
    $ python3 demystify.py test -h

5) LEXING AND PARSING

The intention of this project is to eventually generate full parse trees for every line of text on every card. However, as this is a low priority personal project, Demystify is far from this goal. At the moment, all that you can do is test the lexer or parser.

First, load the environment.

    $ python3 demystify.py load -i
    Processing cards for card names...
    Innocent Blood [##########################] 14715 of 14715 Time: 0:00:01
    >>> cards = card.get_cards()

get_cards() returns all the cards loaded. If you like, you can select a smaller set of cards via card.get_card_set(), perform a search (note that searches return matching text rather than matching cards), or get an individual card.

    >>> karn = card.get_card('Karn, Silver Golem')

The lexer is essentially done (modulo any new words or symbols in sets not yet added to the project), but it can still be tested, with test_lex or lex_card. lex_card will print the preprocessed text followed by a table of lex symbols. (You can also get at the preprocessed text by simply printing the card object.)

    >>> lex_card(karn)
    whenever SELF blocks or becomes blocked, it gets -4/+4 until end of turn.
    {1}: target non-creature artifact becomes an artifact creature with...
    1   0    0   whenever   WHEN
    1   9    2   SELF       SELF
    1   14   4   blocks     BLOCK
    1   21   6   or         OR
    1   24   8   becomes    BECOME
    1   32   10  blocked    BLOCKED
    1   39   11  ,          COMMA
    ...
    1   72   30  .          PERIOD
    2   0    32  1          MANA_SYM
    2   3    33  :          COLON
    ...

Note that test_lex is very slow, but very effective at finding new vocabulary.

The parse_all function is meant to eventually parse rules text, but at the moment it parses only mana cost and type lines. It takes a list of cards to parse, and adds the parse tree to each card individually.

    >>> parse_all(cards)
    Innocent Blood [##########################] 14715 of 14715 Time: 0:00:25
    0 total errors.
    >>> karn.parsed_cost
    <antlr3.tree.CommonTree instance (COST (MANA 5))>
    >>> karn.parsed_typeline
    <antlr3.tree.CommonTree instance (TYPELINE (SUPERTYPES legendary) (TYPES artifact creature) (SUBTYPES golem))>

The other parse functions use a regex to narrow down the text they attempt to parse. You may still pass all the cards.

    >>> parse_triggers(cards)
    Whippoorwill [############################] 3702 of 3702 Time: 0:00:02
    181 total errors.
    68 unique cases missing.
    >>> parse_keyword_lines(cards)
    Wasp Lancer [############################] 5085 of 5085 Time: 0:00:02
    1 total errors.
    1 unique cases missing.
    >>> parse_ability_costs(cards)
    Helm of Obedienc [############################] 4369 of 4369 Time: 0:00:02
    2 total errors.
    2 unique cases missing.

Each parse function reports how many times it couldn't parse a line of text, and (attempts to) group them together into cases. These are all logged to the LOG file, which is useful for development. The parse results themselves are again attached to the cards, but as lists of strings rather than antlr tree objects.
    >>> karn.parsed_costs
    ['(COST (MANA 1))']
    >>> karn.parsed_triggers
    ['(TRIGGER (EVENT (SUBSET SELF) (OR (BECOME BLOCKING) (BECOME blocked))))']
    >>> akroma = card.get_card('Akroma, Angel of Wrath')
    >>> akroma.parsed_keywords
    ['(KEYWORDS flying FIRST_STRIKE vigilance trample haste (protection (PROPERTIES black) (PROPERTIES red)))']

You can test individual rules with arbitrary text by calling test_parse, or by creating a parsing unittest in test/.

6) MISCELLANEOUS

Dependency visualization: The deps/deps.py script reads the grammar files and outputs to stdout a dot format graph file, which describes how parser rules in the grammar call other parser rules, as well as how the grammar files reference other grammar files. You'll want to use graphviz's dot program to visualize this, e.g. with:

    $ python3 deps/deps.py | dot -Tpng -o gdeps.png

graphviz can be installed via your package management tool, or downloaded from http://www.graphviz.org.

7) CONTRIBUTIONS

Are welcome! Please use github forks and pull requests. Bugs without fixes can be reported as well.

    http://github.com/Zannick/demystify/

8) LICENSE AND DISCLAIMER

The files in this repository fall into four categories:

A) antlr3/antlr3
B) Python code in demystify/ and ANTLRv3 code in demystify/grammar/
C) deps/deps.py
D) Card data in demystify/data/ and demystify/tests/

(A) is based on the antlr3 script that was installed on one of my machines when I installed ANTLRv3 via yum. It is therefore (to the best of my knowledge) covered under ANTLRv3's license (BSD), which is included as antlr3/LICENSE.

(B) is the Demystify program itself, and the core of this repository. It is licensed under the GNU Lesser General Public License version 3. See COPYING for the GPL v3 and COPYING.LESSER for the LGPL v3.

(C) is also licensed under the LGPL v3 but it is not a part of Demystify (though it runs by reading parts of Demystify) which is why I've set it apart.
(D) consists of Magic: The Gathering card data, in the form of full card records, slightly modified to fix errors (data/), or in the form of test cases, which may include actual card rules or possible card rules. All card information is copyrighted by Wizards of the Coast.

Demystify is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Demystify is not produced, endorsed, or supported by, nor affiliated with, Wizards of the Coast.

9) THANKS

Thanks go to Terence Parr and the ANTLR project (without which this project would be much more difficult), and to Wizards of the Coast (for making Magic: The Gathering, without which this project would not exist).
Hello,
I'm trying to study this project. So far I've cloned the repo, installed ANTLR 3, and tried to run it. Unfortunately, it looks like the Scryfall API has changed over the years, so some of your code doesn't run anymore. Can you help me look into this issue?
Regards,
Yawgatog no longer provides a text Oracle collection. Viable options seem to be MTGJSONv4 and Scryfall (the former partially depends on the latter). Scryfall provides a link to a bulk JSON file in its API docs, while MTGJSONv4 provides its own bulk JSON files on its main page.
Either way, it's also worth considering storing the database in JSON format and caching it locally (instead of checking it in). This will require rewriting the Card constructor and ditching much of the data.py update logic.
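A rough sketch of what reading a locally cached JSON file could look like is below. It is not the project's Card constructor: CardRecord and load_cached_cards are hypothetical names, and the sketch assumes the cache is a JSON array of card objects with Scryfall-style "name", "mana_cost", "type_line" and "oracle_text" fields (some of which can be absent on a given entry, hence the defaults).

```python
import json
from collections import namedtuple

# Hypothetical record type for illustration only; the real Card class in
# card.py has its own constructor and preprocessing.
CardRecord = namedtuple("CardRecord", "name mana_cost type_line oracle_text")


def load_cached_cards(path):
    """Read a cached JSON array of card objects into a dict keyed by name.

    Assumes Scryfall-style field names. Fields missing from an entry
    default to the empty string. Later entries with the same name
    overwrite earlier ones.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    cards = {}
    for entry in raw:
        cards[entry["name"]] = CardRecord(
            name=entry["name"],
            mana_cost=entry.get("mana_cost", ""),
            type_line=entry.get("type_line", ""),
            oracle_text=entry.get("oracle_text", ""),
        )
    return cards
```

Keeping the cache as plain JSON means the update step reduces to replacing one file, which is the simplification to data.py suggested above.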
Sorry if I missed something obvious, but what should I type after python3 demystify.py load -i to see a parse tree or AST for a given card?
Not really an issue per se; I was looking through the docs and was hoping to find an example of a parsed card, especially the rules text. I'm looking into building a parser for rules text and found your project, but I can't tell if it'll produce the sort of output I'm looking for. For example, what does the parsed version of Liliana of the Veil look like?
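For anyone wanting to work with the output programmatically: the README shows parse results as parenthesized tree strings such as '(COST (MANA 1))'. Below is a minimal sketch for turning such a string into nested Python lists. parse_tree_string is a hypothetical helper, not something the project provides, and it assumes no token text itself contains spaces or parentheses.

```python
def parse_tree_string(text):
    """Convert a string like '(COST (MANA 1))' into nested lists.

    Each parenthesized group becomes a list whose first element is the
    node label. Hypothetical helper; assumes token text never contains
    spaces or parentheses.
    """
    # Pad parentheses with spaces so a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a leaf: label, keyword or number, kept as str
        node = []
        while pos < len(tokens) and tokens[pos] != ")":
            node.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing ")"
        return node

    tree = read()
    if pos != len(tokens):
        raise ValueError("unexpected trailing text")
    return tree
```

Applied to the trigger string from the README, this gives a list headed by 'TRIGGER' whose second element is the EVENT subtree, which is easier to walk than the raw string.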