flipz357 / smatchpp
A package for handy processing of semantic graphs such as AMR, with a special focus on standardized evaluation
License: GNU General Public License v3.0
It would be useful if we could retrieve all scores from the bootstrapping. This helps when we want to compare multiple systems for significance. Since you are using scipy bootstrap, I think you can just optionally also return the "bootstrap_distribution".
Secondly, for reproducibility, it might be a good idea to allow the option to provide a random state (fixed seed), which is then passed to scipy's bootstrap function (the random_state parameter).
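Both requests map directly onto `scipy.stats.bootstrap`; a minimal sketch (with made-up stand-in scores, not smatchpp output):

```python
import numpy as np
from scipy import stats

# Stand-in per-pair F1 scores; with smatchpp these would come from the corpus.
scores = np.array([0.70, 0.80, 0.65, 0.90, 0.75, 0.85, 0.60, 0.95, 0.72, 0.88])

# random_state makes the resamples reproducible; bootstrap_distribution
# (exposed on the result since scipy 1.10) holds every resampled statistic,
# which is exactly what a downstream significance test needs.
res = stats.bootstrap((scores,), np.mean, n_resamples=2000,
                      confidence_level=0.95,
                      random_state=np.random.default_rng(42))

print(res.confidence_interval)           # ConfidenceInterval(low=..., high=...)
print(res.bootstrap_distribution.shape)  # (2000,)
```

So returning the distribution would likely just mean passing `res.bootstrap_distribution` through instead of discarding it.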
I am running this command:
python -m smatchpp -a $1 \
-b $2 \
-solver ilp \
-edges dereify \
-score_dimension main \
-score_type micromacro \
-log_level 20 \
-output_format json \
--bootstrap \
--remove_duplicates
And getting this output:
-------------------------------
-------------------------------
---------Micro scores----------
-------------------------------
-------------------------------
{
"main": {
"F1": {
"result": 27.12,
"ci": [
24.08,
30.51
]
},
"Precision": {
"result": 26.44,
"ci": [
22.81,
30.18
]
},
"Recall": {
"result": 27.84,
"ci": [
24.29,
31.72
]
}
}
}
-------------------------------
-------------------------------
---------Macro scores----------
-------------------------------
-------------------------------
{
"main": {
"F1": {
"result": 27.24,
"ci": [
23.99,
30.41
]
},
"Precision": {
"result": 28.61,
"ci": [
24.83,
32.7
]
},
"Recall": {
"result": 28.84,
"ci": [
24.92,
32.74
]
}
}
}
This is not json :)
Hello
I've just reimplemented my neural network training pipeline and, instead of smatch, I am now using smatchpp. Overall this works great, so thank you for your work!
Unfortunately, however, I sometimes get a fatal error that disrupts the whole training loop and cannot be recovered from. I have also reported this here. I do not know how to debug this, so I am wondering/hoping that you have seen a similar issue with mip while testing your library.
This is the error trace, but I can't figure out how to read it. Is mip the trigger, or is torch? Does it have to do with distributed training? How can I debug this? A lot of questions... Any insights are very welcome, because this is stopping me from using smatchpp in my code, as it completely destroys the training progress. smatch does not rely on mip as far as I know.
ERROR while running Cbc. Signal SIGABRT caught. Getting stack trace.
/home/local/vanroy/multilingual-text-to-amr/.venv/lib/python3.11/site-packages/mip/libraries/cbc-c-linux-x86-64.so(_Z15CbcCrashHandleri+0x119) [0x7f5f955c3459]
/lib64/libc.so.6(+0x54df0) [0x7f6697654df0]
/lib64/libc.so.6(+0xa154c) [0x7f66976a154c]
/lib64/libc.so.6(raise+0x16) [0x7f6697654d46]
/lib64/libc.so.6(abort+0xd3) [0x7f66976287f3]
/lib64/libstdc++.so.6(+0xa1a01) [0x7f66938a1a01]
/lib64/libstdc++.so.6(+0xad37c) [0x7f66938ad37c]
/lib64/libstdc++.so.6(+0xad3e7) [0x7f66938ad3e7]
/lib64/libstdc++.so.6(+0xad36f) [0x7f66938ad36f]
/home/local/vanroy/multilingual-text-to-amr/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so(_ZN4c10d16ProcessGroupNCCL8WorkNCCL15handleNCCLGuardENS_17ErrorHandlingModeE+0x278) [0x7f64d9cbd4d8]
/home/local/vanroy/multilingual-text-to-amr/.venv/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so(_ZN4c10d16ProcessGroupNCCL15workCleanupLoopEv+0x19f) [0x7f64d9cc102f]
/lib64/libstdc++.so.6(+0xdb9d4) [0x7f66938db9d4]
/lib64/libc.so.6(+0x9f802) [0x7f669769f802]
/lib64/libc.so.6(+0x3f450) [0x7f669763f450]
Hi, many thanks for providing this convenient tool to calculate SMATCH++! I had a problem when running the script. When comparing the following two graphs, I got an error below. Could you please offer some information on the solution? Thanks!
(s10196 / time.n.08
:member-of (b10194 / box
:member (s10193 / "Oxford country"
:Part s10195)
:member (s10195 / city.n.01
:Name s10196)))
(b0 / "box"
:member (s0 / "city.n.01"
:Name "Bilbao"
:Part-of (s3 / "country.n.02"
:Name "Basque Country"))
:member (s1 / "be.v.01"
:Theme s0
:Time (s2 / "time.n.08"
:EQU "now"))
:member s2
:member s3)
File "MA_Thesis/SBN-evaluation-tool/2.evaluation-tool-detail/smatchpp/smatchpp/data_helpers.py", line 120, in _string2graph
triple = (tmpsrc[nested_level], tmprel[nested_level], tgt)
KeyError: 3
For meaningful evaluation of any kind of graph, without any pre-processing of the graphs, we can call:
python -m smatchpp -a <graphs1> \
-b <graphs2> \
-solver ilp \
-score_type micromacro \
--bootstrap
Currently, to achieve best eval practice for AMR, we call:
python -m smatchpp -a <graphs1> \
-b <graphs2> \
-solver ilp \
-syntactic_standardization dereify \
-score_type micromacro \
--bootstrap \
--remove_duplicates
So we note that there are some general options that may be useful for ALL graphs (optimizer, bootstrap), but also more or less AMR-specific arguments (dereify, maybe also remove duplicates).
The proposal would be to disallow the very AMR-specific arguments and instead pass an argument like -graph_type amr
that loads a best-practice AMR-specific standardization model that must be defined somewhere (e.g., maybe just in a standardizer object, similar to what is already the case). Then we can access best-practice AMR eval more simply as:
python -m smatchpp -a <graphs1> \
-b <graphs2> \
-solver ilp \
-graph_type amr \
-score_type micromacro \
--bootstrap
This would also highlight how to customize well for other potential kinds of graphs.
Thanks for this library!
It would be great if the API had some easy access point for scoring in Python, like
score(graph_a: str, graph_b: str): scoring two penman graphs against each other
score_all(graphs_a: List[str], graphs_b: List[str]): scoring two corpora against each other
That should make it easier to use the library during machine learning experiments, for instance. It is currently not clear how that can be done.
There are some Python examples here, but for someone who does not know the implementation details it is hard to know what to use. In other words, how can I run the same commands as written here, but directly in Python, and not with files but with in-memory objects (lists of penman strings)?
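For illustration, here is a toy sketch of the interface shape being requested. None of these functions exist in smatchpp; the scoring below is naive exact-triple overlap with no variable alignment, so it only demonstrates the signatures, not Smatch semantics:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def score(triples_a: List[Triple], triples_b: List[Triple]) -> Dict[str, float]:
    # Toy stand-in: F1 over exact triple overlap (no variable alignment).
    matches = len(set(triples_a) & set(triples_b))
    p = matches / len(triples_a) if triples_a else 0.0
    r = matches / len(triples_b) if triples_b else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return {"Precision": p, "Recall": r, "F1": f1}

def score_all(corpus_a: List[List[Triple]],
              corpus_b: List[List[Triple]]) -> List[Dict[str, float]]:
    # One result dict per graph pair; micro/macro aggregation left out.
    return [score(a, b) for a, b in zip(corpus_a, corpus_b)]

a = [("p", ":instance", "person"), ("TOP", ":top", "p")]
b = [("p", ":instance", "person")]
print(score(a, b))  # Precision 0.5, Recall 1.0, F1 ~0.667
```

The real thing would of course take penman strings and run the alignment solver, but the call shape above is what would be convenient in a training loop.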
Thanks
Hi
after creating an issue for https://github.com/snowblink14/smatch I found a case where Smatch and Smatchpp score very differently, and I'm not sure which is the ideal/correct score:
# ::snt The boy is a hard worker.
(p / person
:domain (b / boy)
:ARG0-of (w / work-01
:manner (h / hard)))
and
(w / worker
:mod (h / hard)
:domain (b / boy))
give Precision: 0.5000, Recall: 0.6667, F-score: 0.5714 with Smatch and
F1: 42.86 Precision: 37.5 Recall: 50.0 with Smatchpp (with hillclimber and ilp)
Which score is correct ?
Transformed in triples (S,P,O) the two graphs correspond to
TOP :top p
p :instance person
b :instance boy
w :instance work-01
h :instance hard
p :domain b
w :ARG0 p
w :manner h
(8 edges)
and
TOP :top w
w :instance worker
h :instance hard
b :instance boy
w :mod h
w :domain b
(6 edges)
Smatch (-v) aligns p(person)-w(worker) b(boy)-b(boy) w(work-01)-Null h(hard)-h(hard) (2 correct)
and aligns the incoming relations domain and top (another 2).
that is 4 matches; with 8 triples in the first graph and 6 in the second, it calculates precision 4/8 (0.5), recall 4/6 (0.66...) and F1 (2*P*R/(P+R) = 0.5714)
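That arithmetic can be checked directly (counts taken from the alignment described above):

```python
matches = 4   # triples matched under smatch's alignment
n_first = 8   # triples in the first graph
n_second = 6  # triples in the second graph

precision = matches / n_first                        # 0.5
recall = matches / n_second                          # 0.666...
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.5 0.6667 0.5714
```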
Who is right in your opinion, smatch or smatchpp ? Evaluation of AMR predictions depend on this, for instance a prediction on the AMR3.0 test file scores P: 0.8071, R: 0.8371, F1: 0.8218 with Smatch (micro) and P: 80.5, R: 83.44, F1: 81.94 with SmatchPP (micro).
As also discussed in this issue in the penman library by @goodmami, the license situation of the predicate file is not 100% clear (folks suspect it's under a public license, but nobody seems to know for sure).
The file is only relevant for advanced finer semantic scoring, so I propose to download it on demand, and remove it from the repository assets.
I think it would be interesting to have the option for some additional useful but optional AMR graph transformations:
They could make sense for some applications or for parsing diagnostics.
To start with some examples:
Sense2Node
E.g.,
(j / jump-01
:arg0 (f / frog))
would map to
(j / jump
:sense 01
:arg0 (f / frog))
This transformation may be useful for some applications (and something like this could also give us a balanced mix of the concept-as-root and anonymous root issue @goodmami @jheinecke)
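A minimal sketch of what such a Sense2Node transform could look like on triples (the function name and triple format are assumptions for illustration, not smatchpp code):

```python
import re

def sense2node(triples):
    # Split a PropBank-style concept like "jump-01" into the bare concept
    # plus an explicit :sense triple; all other triples pass through.
    out = []
    for (src, rel, tgt) in triples:
        m = re.match(r"^(.*)-(\d\d)$", tgt) if rel == ":instance" else None
        if m:
            out.append((src, ":instance", m.group(1)))
            out.append((src, ":sense", m.group(2)))
        else:
            out.append((src, rel, tgt))
    return out

triples = [("j", ":instance", "jump-01"),
           ("j", ":arg0", "f"),
           ("f", ":instance", "frog")]
print(sense2node(triples))
```

With this, a sense mismatch costs one triple rather than the whole concept match, which is the kind of softer diagnostic the proposal is after.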
AbstractifyNode
E.g.,
(c / city
:name (n / name
:op1 "Berlin"))
would map to
(l / location
:name (n / name
:op1 "Berlin"))
This transformation may be useful for an additional informative parsing evaluation score. We could use the concept groups that I started building here
And so on...
I tried this and it crashed on one dataset but worked on another. Here's the test set that crashed:
test-gold.txt and test-pred.txt.
python3 -m smatchpp -a $GOLD \
-b $PRED \
-solver ilp \
-syntactic_standardization dereify \
-score_dimension main \
-score_type micromacro \
-log_level 20 \
--bootstrap
With the traceback:
...
2023-11-28 16:16:00,233 - __main__ - INFO - bindings - graph pairs processed: 1500; time for last 100 pairs: 2.1990966796875
2023-11-28 16:16:03,175 - __main__ - INFO - bindings - graph pairs processed: 1600; time for last 100 pairs: 2.9419069290161133
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/bjascob/.local/lib/python3.10/site-packages/smatchpp/__main__.py", line 162, in <module>
match_dict, status = SMATCHPP.process_corpus(amrs, amrs2)
File "/home/bjascob/.local/lib/python3.10/site-packages/smatchpp/bindings.py", line 128, in process_corpus
match, tmpstatus, _ = self.process_pair(a, amrs2[i])
File "/home/bjascob/.local/lib/python3.10/site-packages/smatchpp/bindings.py", line 71, in process_pair
g2 = self.graph_reader.string2graph(string_g2)
File "/home/bjascob/.local/lib/python3.10/site-packages/smatchpp/interfaces.py", line 60, in string2graph
triples = self._string2graph(string)
File "/home/bjascob/.local/lib/python3.10/site-packages/smatchpp/data_helpers.py", line 85, in _string2graph
triple = (tmpsrc[nested_level], tmprel[nested_level], stringtok)
KeyError: 5
Looks like you've got an array indexing error.
I don't need this fixed, but since I spotted the issue I thought I'd pass it along.
Smatchpp already can read tsv, but it cannot write tsv yet:
Expected actions
triples = [("ROOT_OF_GRAPH", ":root", "t"),
("t", ":instance", "test")]
from smatchpp.preprocess import TSVWriter
TSVWriter.graph2string(triples)
Expected output of graph2string:
ROOT_OF_GRAPH t :root
t test :instance
Note that any root should be explicit, since tsv is more general than penman (which always has an implied root).
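As a starting point, the writer could be as simple as this sketch (a standalone function, not the proposed `TSVWriter` API, and using tabs where the example above shows spaces):

```python
def graph2string(triples):
    # Emit one line per (source, relation, target) triple in the
    # source <TAB> target <TAB> relation column order shown above.
    return "\n".join(f"{src}\t{tgt}\t{rel}" for (src, rel, tgt) in triples)

triples = [("ROOT_OF_GRAPH", ":root", "t"),
           ("t", ":instance", "test")]
print(graph2string(triples))
```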
If I understand the output correctly, when we are bootstrapping we get results like this:
"result": 81.3,
"ci": [
80.67,
81.89
]
I think that the result is calculated independently, on the full corpus, and ci is the 95% CI min/max. It would be useful to also include the estimated mean based on the bootstrap. As far as I can tell, this is common in research papers too, where you report "85 +- 1.2": 85 is the estimated mean and 1.2 the CI half-width at 95% confidence.
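A sketch of how the bootstrap mean and the "mean +- half-width" style could be derived from the resampled statistics (the scores here are synthetic stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(0.75, 0.90, size=200)   # stand-in per-pair F1 scores

# Plain percentile bootstrap over the corpus mean.
boot = np.array([rng.choice(scores, size=scores.size, replace=True).mean()
                 for _ in range(2000)])
mean = boot.mean()                            # the requested bootstrap mean
lo, hi = np.percentile(boot, [2.5, 97.5])     # 95% CI endpoints

print(f"{100 * mean:.2f} +- {100 * (hi - lo) / 2:.2f}")
```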