django-spotnet's People

Contributors

alvra avatar

django-spotnet's Issues

Header not compatible with spotnet listed as spot

comment on spot listed as spot

Message-id [email protected]
Date: Thu, 6 Dec 2012 18:07:28 -0800 (PST)

Message-id [email protected]
Posted: 2012-12-09 15:00:22

Message-id [email protected]
Posted: 2012-09-04 15:15:53

message not related to Spotnet
Message-id [email protected]
posted: 2012-12-07 14:25:24

Message-id [email protected]
Posted: 2012-12-03 19:15:27

Message-id [email protected]
Posted: 2012-09-03 09:05:39

In the last 150 days there are 240 headers with this problem.

Specs: Latest Spotnet version, example_project, Xampp 1.7.7 (Apache + mysql), XP sp3, python 2.7.3, Django 1.4.2.

Use XOver instead of HEAD/Article

Hi,

First of all, your code was somewhat an inspiration when designing my own django spotnet app; however, I have made some significant speed improvements.

  1. I dropped the use of header/article checks. The main reason was that they were slow, and that posts from 2011 onwards (and a lot from before that) have all the important information set in the headers you receive when doing xover.

  2. your method of determining the last post date by doing a stat on the last msg id fails, I would personally stay away from it; make a model for the group you try to update, and store it in there (or, as I do it, store the post id and group in the spots itself, so you can always query it)

  3. use the RSA signing method spotweb uses (it took me a while to figure out what was actually happening, but it works great) to verify the validity of a post... same goes for dispose messages; only validated dispose messages should be accepted, and can then be used to directly remove the post in question (and create a marker on the targeted msgid so you won't load it back in later).

  4. you must be going: but what of the data that isn't in the xover data.. simple... load it on demand; the main spotnet application does so too.. it xovers to find the new posts, parses it, stores it, and when you double click on it, it does a head/article call on it to get all the other info (nzb segments, image segments, even comments are done this way).. which means you only store info in your database that you actually use.

  5. I've personally found that using regexes works a lot faster than manually parsing data character by character.. for example, your category parser can also be written with this regex:

    re.compile('([abcdz])([0-9]{1,2})')

and the invalid character remover:

    re.compile('[^\x20-\x7F]*')
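
As a rough illustration of point 5, the two patterns above could be used as follows. This is only a sketch: the function names and the sample category string `a5b3d12` are assumptions made for the example, not the actual format handled by django-spotnet, and the second pattern uses `+` instead of `*` to avoid empty matches (the result of the substitution is the same).

```python
import re

# Patterns quoted in the issue text; the names below are hypothetical.
CATEGORY_RE = re.compile('([abcdz])([0-9]{1,2})')
INVALID_CHARS_RE = re.compile('[^\x20-\x7F]+')


def parse_categories(raw):
    """Return (letter, number) pairs found in a category string."""
    return [(letter, int(number)) for letter, number in CATEGORY_RE.findall(raw)]


def strip_invalid_chars(text):
    """Drop every character outside the range 0x20-0x7F."""
    return INVALID_CHARS_RE.sub('', text)


# Assumed sample input, for illustration only.
print(parse_categories('a5b3d12'))
print(strip_invalid_chars('Hello\x00 World\x07'))
```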

my current test code manages to parse around 250 items every 3 seconds; or about 250k posts per hour ... now there's a lot of room for improvement, so I wouldn't be surprised if I manage to hit double that in a not-too-distant future... on the same machine with your code I managed to get around 35k items per hour.

I might seem a complete arse for not sharing my code, but it's a private project so I'm not at liberty to discuss actual internals.. but I couldn't resist helping a fellow django coder out.
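
To make points 1 and 4 a bit more concrete: an XOVER response line is a tab-separated record (article number, Subject, From, Date, Message-ID, References, byte count, line count), so the indexing pass only has to split it. The sketch below shows that split; the `Overview` field names and the sample line are assumptions for illustration, and the on-demand HEAD/ARTICLE fetch is deliberately left out.

```python
from collections import namedtuple

# Hypothetical container; the field names are chosen for this example.
Overview = namedtuple(
    'Overview',
    'number subject sender date message_id references size lines',
)


def parse_overview_line(line):
    """Split one tab-separated XOVER line into its eight standard fields."""
    fields = line.rstrip('\r\n').split('\t')
    if len(fields) < 8:
        raise ValueError('expected at least 8 tab-separated fields')
    number, subject, sender, date, message_id, references, size, lines = fields[:8]
    return Overview(
        int(number), subject, sender, date, message_id, references,
        int(size or 0), int(lines or 0),
    )


# Made-up overview line, not taken from a real server.
sample = '\t'.join([
    '1234', 'Some subject', 'poster <poster@example.invalid>',
    'Thu, 6 Dec 2012 18:07:28 -0800', '<abc@example.invalid>', '', '2048', '30',
])
print(parse_overview_line(sample).message_id)
```

Everything that is not in the overview record would then be fetched per post, and only when a user actually opens it.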

NZB can't be retrieved if the NZB is split into multiple segments (because of error in header)

Message-ID: [email protected]
Date: 29 Nov 2012 19:52:27 GMT

<NZB><Segment>[email protected]></Segment><Segment>[email protected]</Segment><Segment>[email protected]</Segment></NZB>

Reason: The spot was added with Spotnet 1.7.4. This client has a bug but is still used for adding spots.
Fix: remove the > from the first segment and add a t to the last segment.

Another spot, with a working NZB after combining multiple segments:
Message-ID: [email protected]
NNTP-Posting-Date: 23 Nov 2012 21:05:15 GMT

<NZB><Segment>[email protected]</Segment><Segment>[email protected]</Segment></NZB>

NZB size: 9,48 MB

There are 331 spots in the last 150 days with this problem.

Workaround for a mysql database:

UPDATE `spotnet_post` SET `nzb` = REPLACE(REPLACE(REPLACE(`nzb`, '>', ''), 'spot.ne', 'spot.net'), 'spot.nett', 'spot.net')

This removes the > from the first segment, adds the t to the last segment, and turns the resulting spot.nett in segments that were already correct back into spot.net.
There is probably a better solution, but this seems to work.
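
The same repair could also be applied per segment before the value is stored, instead of patching the table afterwards. The following is a minimal sketch under the assumption that each segment is available as a separate string; `repair_segment_id` is a hypothetical helper, not part of django-spotnet, and the message-ids are made up.

```python
def repair_segment_id(segment):
    """Undo the two defects described above for a single segment id."""
    # Remove the stray '>' that Spotnet 1.7.4 leaves in the first segment.
    segment = segment.replace('>', '')
    # Restore the 't' that is cut off the last segment.
    if segment.endswith('spot.ne'):
        segment += 't'
    return segment


# Hypothetical ids that only mimic the shape of the broken header.
broken = ['part1@spot.net>', 'part2@spot.net', 'part3@spot.ne']
print([repair_segment_id(s) for s in broken])
```

Checking the suffix rather than replacing every `spot.ne` means a correct `spot.net` is never turned into `spot.nett`, so no third pass is needed.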

update_spotnet error "Incorrect string value"

Added post <[email protected]>
Traceback (most recent call last):
  File "\example_project\manage.py", line 8, in <module>
    execute_from_command_line(sys.argv)
  File "\django\core\management\__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "\django\core\management\__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "\django\core\management\base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "\django\core\management\base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "\spotnet\management\commands\update_spotnet.py", line 42, in handle
    conn.update(overwrite, self.logger_function)
  File "\spotnet\connection.py", line 140, in update
    self.update_group(group, overwrite=overwrite, logger=lambda x: logger('  %s' % x))
  File "\spotnet\connection.py", line 179, in update_group
    logger=lambda x: logger('  %s' % x),
  File "\spotnet\connection.py", line 225, in update_group_postnumbers
    logger=lambda x: logger('  %s' % x),
  File "\django\db\transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "\spotnet\connection.py", line 333, in add_post
    snp.save()
  File "\django\db\models\base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "\django\db\models\base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "\django\db\models\manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "\django\db\models\query.py", line 1593, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "\django\db\models\sql\compiler.py", line 910, in execute_sql
    cursor.execute(sql, params)
  File "\django\db\backends\util.py", line 40, in execute
    return self.cursor.execute(sql, params)
  File "\django\db\backends\mysql\base.py", line 114, in execute
    return self.cursor.execute(query, args)
  File "\MySQLdb\cursors.py", line 203, in execute
    if not self._defer_warnings: self._warning_check()
  File "\MySQLdb\cursors.py", line 117, in _warning_check
    warn(w[-1], self.Warning, 3)
_mysql_exceptions.Warning: Incorrect string value: '\xF0\x9F\x9A\x8F S...' for column 'description' at row 1

Converting the column "description" from "utf8_unicode_ci" to "utf8_general_ci" doesn't fix the problem.

Suggested solution: Change the column encoding from utf8 to utf8mb4.
Applied: Converting the column "description" from "utf8_general_ci" to "utf8mb4_unicode_ci".
Result: Warning: Incorrect string value: '\xF0\x9F\x9A\x8F S...' for column 'description' at row 1
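
For context: the bytes `\xF0\x9F\x9A\x8F` are a single 4-byte UTF-8 character (U+1F68F), and MySQL's `utf8` charset stores at most 3 bytes per character. Why the warning persists after the column change is not clear from this report; one possibility, which is only a guess, is that the connection itself still uses `utf8`. As a stopgap, such characters could be dropped before saving. The sketch below assumes Python 3 (the reporter runs Python 2.7, where the pattern would need adjusting) and uses a hypothetical helper name and sample text.

```python
import re

# Matches every code point above the Basic Multilingual Plane,
# i.e. exactly the characters that need 4 bytes in UTF-8.
NON_BMP_RE = re.compile('[\U00010000-\U0010FFFF]')


def strip_non_bmp(text):
    """Remove characters that a 3-byte MySQL utf8 column cannot store."""
    return NON_BMP_RE.sub('', text)


# The leading bytes are from the warning above; ' Stop' is made up.
sample = b'\xF0\x9F\x9A\x8F Stop'.decode('utf-8')
print(repr(strip_non_bmp(sample)))
```

This loses the character, so it is a workaround rather than a fix for the encoding setup.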

This is probably my last ticket because I am running out of servers usable for updating Django-Spotnet.

NZB with errors

message-id from the spot [email protected]
Date: 01 Oct 2012 20:59:50 GMT
Some lines starting from error line 1735

1735 <segment bytes="793162" number="5951ent 46179" [email protected]</segment>
1736 <segment3ytes="793392" 1 [email protected]</segment>
1737 <segment bytes="793338" number="29">[email protected]</segment>
1756 <segment bytes="793516" number="60">[email protected]</greb.com</segment>
1766 <segment bytes="793516" number="60"[email protected]</segment>
1781 <segment3ytes="793392" 1 [email protected]">13aweb.com</segment>
1784 <segment bytes="793352" number="3"[email protected]</segment>
1825 <segment bytes="793342" number="[email protected]</segment>
1826 <segment bytes="793495" number="13">[email protected]</segment>
1831 <segment bytes="793338" number="6"[email protected]</segment>

Good NZB :

1735 <segment bytes="793162" number="57">[email protected]</segment>
1736 <segment bytes="793470" number="2">[email protected]</segment>
1737 <segment bytes="793518" number="19">[email protected]</segment>
1756 <segment bytes="793493" number="60">[email protected]</segment>
1766 <segment bytes="793089" number="10">[email protected]</segment>
1781 <segment bytes="793374" number="2">[email protected]</segment>
1784 <segment bytes="793572" number="3">[email protected]</segment>
1825 <segment bytes="793226" number="5">[email protected]</segment>
1826 <segment bytes="792857" number="1">[email protected]</segment>
1831 <segment bytes="793330" number="6">[email protected]</segment>

Cause of the problem: The NZB is not fully compatible with the Spotnet protocol.
As a result, part of the NZB can't be downloaded (in this case 6%).
Some newsreaders can't use the NZB because it isn't valid XML.

Current solutions:
1- Use a newsreader that accepts an invalid NZB.
2- Download the par2 files mentioned in the retrieved NZB for repair.
3- The missing part can be found on usenet index sites by searching for the filename without the file type.
4- Use SpotLite to show the filename and search on sites like Binsearch.
5- Use Spotplanet to retrieve a valid NZB.

Request: Support for this kind of problem spots.
Remark: The current NZB is better than no NZB.
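
One way such spots might be supported is to keep the segment lines that are still well-formed and set the damaged ones aside, so the result is at least valid XML. This is only a sketch: `split_segments` is a hypothetical helper, it assumes it is handed the segment lines only, and the message-ids in the sample are made up.

```python
import re

# A well-formed segment line as shown in the "Good NZB" listing above.
SEGMENT_RE = re.compile(
    r'^\s*<segment bytes="(\d+)" number="(\d+)">([^<>\s]+)</segment>\s*$'
)


def split_segments(lines):
    """Return (good, bad): parsed segments and the lines that did not match."""
    good, bad = [], []
    for line in lines:
        match = SEGMENT_RE.match(line)
        if match:
            size, number, message_id = match.groups()
            good.append((int(number), int(size), message_id))
        else:
            bad.append(line)
    return good, bad


# Sample lines shaped like the listings above, with hypothetical ids.
sample_lines = [
    '<segment bytes="793162" number="57">part57@example.invalid</segment>',
    '<segment3ytes="793392" 1 part2@example.invalid</segment>',
    '<segment bytes="793516" number="60"part60@example.invalid</segment>',
]
good, bad = split_segments(sample_lines)
print(len(good), len(bad))
```

The rejected lines still identify which parts are missing, which is what the par2 repair in the list above would have to cover.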

update_spotnet error "Probably not a spotnet post"

Traceback (most recent call last):
  File "\example_project\manage.py", line 8, in <module>
    execute_from_command_line(sys.argv)
  File "\django\core\management\__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "\django\core\management\__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "\django\core\management\base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "\django\core\management\base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "\spotnet\management\commands\update_spotnet.py", line 42, in handle
    conn.update(overwrite, self.logger_function)
  File "\spotnet\connection.py", line 140, in update
    self.update_group(group, overwrite=overwrite, logger=lambda x: logger('  %s' % x))
  File "\spotnet\connection.py", line 179, in update_group
    logger=lambda x: logger('  %s' % x),
  File "\spotnet\connection.py", line 225, in update_group_postnumbers
    logger=lambda x: logger('  %s' % x),
  File "\django\db\transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "\spotnet\connection.py", line 330, in add_post
    snp = Post.from_raw(raw, id=existing_id)
  File "\spotnet\models.py", line 155, in from_raw
    description=raw.description,
  File "\spotnet\post.py", line 468, in description
    "Probably not a spotnet post"
spotnet.post.InvalidPost: Probably not a spotnet post

search currently not working

When using the search, with or without a specific category, nothing happens.
Is the search not working yet, or is my setup missing something?

Specs: Blue-django-spotnet-2f992c6dee29 from 2012-08-23, example_project, XP sp3, Firefox 17, sqlite3, Django 1.4.2, Python 2.7.3, pip-1.2.1, setuptools-0.6c11, python-dateutil-2.1, six-1.2.0, pycrypto-2.6.win32-py2.7.

update_spotnet error "Post does not have a From header"

Traceback (most recent call last):
  File "\example_project\manage.py", line 8, in <module>
    execute_from_command_line(sys.argv)
  File "\django\core\management\__init__.py", line 443, in execute_from_command_line
    utility.execute()
  File "\django\core\management\__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "\django\core\management\base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "\django\core\management\base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "\spotnet\management\commands\update_spotnet.py", line 42, in handle
    conn.update(overwrite, self.logger_function)
  File "\spotnet\connection.py", line 140, in update
    self.update_group(group, overwrite=overwrite, logger=lambda x: logger('  %s' % x))
  File "\spotnet\connection.py", line 179, in update_group
    logger=lambda x: logger('  %s' % x),
  File "spotnet\connection.py", line 225, in update_group_postnumbers
    logger=lambda x: logger('  %s' % x),
  File "\django\db\transaction.py", line 209, in inner
    return func(*args, **kwargs)
  File "\spotnet\connection.py", line 330, in add_post
    snp = Post.from_raw(raw, id=existing_id)
  File "\spotnet\models.py", line 153, in from_raw
    poster=raw.poster,
  File "\spotnet\post.py", line 438, in poster
    raise InvalidPost("Post does not have a From header")
spotnet.post.InvalidPost: Post does not have a From header
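
In this traceback, as in the "Probably not a spotnet post" one, a single invalid post aborts the whole update run. A possible approach is to catch the exception per post and continue. The sketch below is not the project's code: `InvalidPost` is a local stand-in for `spotnet.post.InvalidPost`, and `add_posts`, `fake_add_post` and the sample data are hypothetical.

```python
class InvalidPost(Exception):
    """Local stand-in for spotnet.post.InvalidPost."""


def add_posts(raw_posts, add_post, logger=print):
    """Add each post, skipping (and logging) the ones that are invalid."""
    added, skipped = 0, 0
    for post_id, raw in raw_posts:
        try:
            add_post(raw)
        except InvalidPost as exc:
            skipped += 1
            logger('Skipped invalid post %s: %s' % (post_id, exc))
        else:
            added += 1
    return added, skipped


def fake_add_post(raw):
    """Toy replacement for the real save step, for demonstration only."""
    if 'from' not in raw:
        raise InvalidPost('Post does not have a From header')


posts = [('<1@example.invalid>', {'from': 'a'}), ('<2@example.invalid>', {})]
print(add_posts(posts, fake_add_post))
```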

Skipped spots because of invalid xml data

Spots with an & between <Description> and </Description> in the header, without a <![CDATA[ ]]> wrapper, are skipped.

message-ID: [email protected]
NNTP-Posting-Date: Thu, 15 Nov 2012 08:47:33 UTC

Skipped invalid post <[email protected]>: Post has invalid XML data for header X-XML: not well-formed (invalid token): line 1, column 238. The problem is &.

Another spot with the same problem:
message-ID: [email protected]
Date: 17 Nov 2012 16:41:49 GMT

Skipped invalid post <[email protected]>: Post has invalid XML data for header X-XML: not well-formed (invalid token): line 1, column 102. The problem is &.
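
A possible way to accept these spots is to escape bare ampersands before handing the header to the XML parser. This is a sketch under assumptions: `parse_lenient` is a hypothetical helper, the `<Spotnet>` root element in the sample is made up (only `<Description>` is mentioned in this report), and other kinds of malformed XML are not handled.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does not start a predefined entity or a character reference.
BARE_AMP_RE = re.compile(r'&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)')


def parse_lenient(xml_text):
    """Escape bare ampersands, then parse the XML as usual."""
    return ET.fromstring(BARE_AMP_RE.sub('&amp;', xml_text))


# Made-up header content, for illustration only.
root = parse_lenient('<Spotnet><Description>Tom & Jerry</Description></Spotnet>')
print(root.findtext('Description'))
```

Ampersands that are already escaped are left alone, so well-formed headers parse exactly as before.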
