Comments (14)
I attached a log file; not sure if it helps. I've been stuck on this one for a
while.
Original comment by [email protected]
on 14 Dec 2009 at 4:03
from gwtwiki.
Did you try increasing the recursion limit?
In info.bliki.wiki.model.Configuration.java, set:
PARSER_RECURSION_LIMIT = 30;
Original comment by [email protected]
on 14 Dec 2009 at 9:22
from gwtwiki.
Yes, I changed it to 32. It now parses all of the expressions and the
recursion level stays well below 32, but the parser is still stuck in an
infinite loop.
Original comment by [email protected]
on 14 Dec 2009 at 9:35
from gwtwiki.
This looks like circular template parsing: parsing one template triggers
parsing of another template, which in turn triggers parsing of the first. So I
don't think this has anything to do with the expressions; the cause is the
template parser instantiation mechanism.
Some kind of protection/limit may be needed so that the TemplateParser
eventually stops creating instances of itself for further parsing.
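To illustrate the kind of protection meant here, a minimal sketch follows. It assumes a toy expander with flat `{{Name}}` calls; `BoundedTemplateExpander`, `define`, `expand`, and the marker text are hypothetical names for illustration and are not part of the bliki TemplateParser API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a template expander; not the bliki TemplateParser. */
final class BoundedTemplateExpander {
    private final Map<String, String> templates = new HashMap<>();
    private final int maxDepth;

    BoundedTemplateExpander(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    void define(String name, String body) {
        templates.put(name, body);
    }

    String expand(String text) {
        return expand(text, 0);
    }

    // Each nested expansion carries its depth, so two templates that call
    // each other are cut off instead of recursing forever.
    private String expand(String text, int depth) {
        if (depth > maxDepth) {
            return "[template recursion limit exceeded]";
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            // Simplification: only flat {{Name}} calls, no nested braces or parameters.
            int open = text.indexOf("{{", pos);
            int close = open < 0 ? -1 : text.indexOf("}}", open + 2);
            if (open < 0 || close < 0) {
                out.append(text, pos, text.length());
                break;
            }
            out.append(text, pos, open);
            String name = text.substring(open + 2, close).trim();
            String body = templates.get(name);
            // Unknown templates are left as written; known ones are expanded one level deeper.
            out.append(body == null ? text.substring(open, close + 2) : expand(body, depth + 1));
            pos = close + 2;
        }
        return out.toString();
    }
}
```

With template A defined as `a{{B}}` and B as `b{{A}}`, expanding `{{A}}` stops once the depth passes the limit and emits the marker text instead of looping.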
Original comment by [email protected]
on 14 Dec 2009 at 9:41
from gwtwiki.
Found a solution: a simple limit on recursive template calls, enforced by
keeping state/count in the WikiModel. Patch/diff against trunk 583 attached.
Please ignore the log4j stuff.
Just an FYI, I am using your parser on 5M+ Wikipedia topics. Naturally there
are a lot of topics with malformed syntax and other extensions/markup that
could potentially send the parser into infinite loops. I think it makes sense
to have a hard stop limit/protection built into the parser.
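For readers without the attachment, the idea can be sketched roughly as below. This is not the attached patch: `TemplateCallCounter`, `tryEnter`, and `CountingDemo` are hypothetical names, and the counter object only stands in for state that the comment says is kept in the WikiModel.

```java
/** Hypothetical per-page state; stands in for the count kept in the WikiModel. */
final class TemplateCallCounter {
    private final int maxCalls;
    private int calls;

    TemplateCallCounter(int maxCalls) {
        this.maxCalls = maxCalls;
    }

    /** Records one template call; returns false once the budget is spent. */
    boolean tryEnter() {
        if (calls >= maxCalls) {
            return false;
        }
        calls++;
        return true;
    }

    int calls() {
        return calls;
    }
}

final class CountingDemo {
    /** Simulates a template that transcludes itself twice on every expansion. */
    static String expandTwice(TemplateCallCounter counter) {
        if (!counter.tryEnter()) {
            return "!"; // hard stop: emit a marker instead of expanding further
        }
        return "(" + expandTwice(counter) + expandTwice(counter) + ")";
    }
}
```

Because the count is shared across the whole page rather than tracked per call chain, it also bounds templates that fan out widely without ever nesting deeply.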
Original comment by [email protected]
on 15 Dec 2009 at 5:23
Attachments:
from gwtwiki.
I think something more sophisticated like this must be implemented:
http://en.wikipedia.org/wiki/Wikipedia:Template_limits
Original comment by [email protected]
on 15 Dec 2009 at 5:11
from gwtwiki.
Original comment by [email protected]
on 15 Dec 2009 at 5:11
- Changed state: Accepted
from gwtwiki.
Axel, attaching the patch file for my changes against the trunk rev. 931.
Original comment by [email protected]
on 28 Jan 2010 at 7:02
Attachments:
from gwtwiki.
attached files, changes tagged as EXPERIMENTAL
Original comment by [email protected]
on 31 Jan 2010 at 4:49
Attachments:
from gwtwiki.
For those interested, some context:
Trying to put 5M+ topics into jamwiki has been a huge learning experience, to
say the least. Given the vast variation of topic markup/syntax in Wikipedia
content, having parsing limits is critical to maintaining operational
performance.
As you know, in Java there is no clean way to abort a thread, so a runaway
parser can destroy tomcat/glassfish: it causes the request processing threads
to get stuck in infinite loops, and it only takes a few bad topic requests to
take the whole server down. I implemented a caching architecture similar to
the one used on wikipedia.org; however, parser performance and limits are
still important when building the cached data.
I did manage to get control of this problem through my modifications to the
WikiScanner, TemplateParser, and AbstractParser classes of the bliki parser,
where I put limits on the number of recursive calls, the recursion depth, and
the size of certain buffers. Finally, I try to measure the total parsing time
and break out if possible. If you like, I can send you an updated diff file
with my changes against your current trunk.
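A rough sketch of the time and buffer limits described above, assuming the check is made cooperatively inside the parser's own loops (since the thread cannot be aborted from outside). `ParseBudget`, `ExceededException`, and `BudgetDemo` are hypothetical names and do not come from the bliki sources or the attached diff.

```java
/** Hypothetical budget object, checked cooperatively inside parser loops. */
final class ParseBudget {
    static final class ExceededException extends RuntimeException {
        ExceededException(String message) {
            super(message);
        }
    }

    private final long deadlineNanos;
    private final int maxOutputChars;

    ParseBudget(long maxMillis, int maxOutputChars) {
        this.deadlineNanos = System.nanoTime() + maxMillis * 1_000_000L;
        this.maxOutputChars = maxOutputChars;
    }

    /** Throws once the output buffer or the elapsed time passes its limit. */
    void check(StringBuilder output) {
        if (output.length() > maxOutputChars) {
            throw new ExceededException("output buffer limit exceeded");
        }
        // Subtract before comparing, as recommended for System.nanoTime values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new ExceededException("parse time limit exceeded");
        }
    }
}

final class BudgetDemo {
    /** Simulates a runaway parse loop and returns the reason it was stopped. */
    static String runaway(ParseBudget budget) {
        StringBuilder out = new StringBuilder();
        try {
            while (true) {
                out.append('x');
                budget.check(out);
            }
        } catch (ParseBudget.ExceededException e) {
            return e.getMessage();
        }
    }
}
```

The request thread unwinds through an ordinary exception, so the server can return a partial or error page instead of losing the thread to an infinite loop.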
FYI, I am using the bliki parser, and a sandbox version of the jamwiki
performance branch can be found at http://www.uniblogger.com
Original comment by [email protected]
on 31 Jan 2010 at 4:52
from gwtwiki.
Implemented the changes in revision:
http://code.google.com/p/gwtwiki/source/detail?r=935
Original comment by [email protected]
on 31 Jan 2010 at 6:34
from gwtwiki.
Original comment by [email protected]
on 13 May 2010 at 3:29
- Changed state: Fixed
from gwtwiki.