shaarli / netscape-bookmark-parser

This project forked from kafene/netscape-bookmark-parser

PHP library to parse Netscape bookmark files

Home Page: https://packagist.org/packages/shaarli/netscape-bookmark-parser

License: MIT License

PHP 71.38% Makefile 0.87% HTML 27.75%

bookmarks chrome-bookmarks firefox-bookmarks netscape netscape-bookmark netscape-bookmark-file parser

netscape-bookmark-parser's Introduction

The personal, minimalist, super fast, database-free, bookmarking service.

Do you want to share the links you discover? Shaarli is a minimalist link sharing service that you can install on your own server. It is designed to be personal (single-user), fast and handy.

Quickstart

Demo

You can use this public demo instance of Shaarli. It runs the latest development version of Shaarli and is updated/reset daily.

License

Shaarli is Free Software. See COPYING for a detail of the contributors and licenses for each individual component.

netscape-bookmark-parser's Issues

Tests: add coverage for Diigo parsing

Add PHPUnit & Travis support

TODO:

add PHPUnit configuration
add a basic test
add basic Travis support

Support multi-word tags separated by whitespace

There is no rule on how tags should be separated (or on anything else in this format, for that matter), and a few of our unit tests use tags that are separated by whitespace.

However, the comma seems to be a common separator for tags.
Regarding shaarli/Shaarli#594, I suggest that if the tag string contains a comma, we use the comma as the separator; otherwise we use whitespace.
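
The suggested rule could be sketched as follows. This is only an illustration of the proposal, not the library's actual code; the function name splitTags is hypothetical.

```php
<?php
// Hypothetical helper illustrating the proposed rule; not part of the library.
function splitTags(string $tagString): array
{
    // Use the comma as the separator when one is present, otherwise whitespace.
    $pattern = strpos($tagString, ',') !== false ? '/\s*,\s*/' : '/\s+/';

    // PREG_SPLIT_NO_EMPTY drops empty pieces, e.g. after a trailing comma.
    $tags = preg_split($pattern, trim($tagString), -1, PREG_SPLIT_NO_EMPTY);

    return array_map('trim', $tags);
}
```

With this rule, "web dev, php" yields the two tags "web dev" and "php", while "css apache" is still split on whitespace.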

Add test coverage for browser exports

Add test coverage for parsing bookmark exports from popular browsers:
- #21 Firefox
- #22 Chromium / Chrome
- #23 IE

Tags case should be preserved

X-issue from shaarli/Shaarli#1726 - quoting @cy7yz2rj:

Shaarli is case insensitive for tags but should be case preserving when importing.

Steps for reproducing:

1. export bookmarks to an HTML file from Tools > Export Database
2. delete all bookmarks
3. import the file from step 1
4. all tags are now lowercase despite having correct case information in the HTML file

Firefox correctly preserves case when importing the same bookmark file.
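
As a rough sketch of "case insensitive but case preserving", tags can be compared by a lowercased key while the original spelling is kept. The helper name below is hypothetical, and the sketch only handles ASCII case folding.

```php
<?php
// Hypothetical helper: deduplicate tags case-insensitively while keeping
// the first spelling seen. Not taken from Shaarli or the parser.
function dedupeTagsPreservingCase(array $tags): array
{
    $seen = [];
    foreach ($tags as $tag) {
        // The lowercased form is used only as a lookup key (ASCII only).
        $key = strtolower($tag);
        if (!isset($seen[$key])) {
            $seen[$key] = $tag; // store the tag exactly as written
        }
    }
    return array_values($seen);
}
```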

Include composer.json in release?

The release zip files and tarballs (such as https://github.com/shaarli/netscape-bookmark-parser/archive/v1.0.1.tar.gz) don't include composer.json.

I think it should be included in releases, so that composer install will work. Also Debian has tools for packaging PHP libraries (pkg-php-tools) that use Composer.

Why is there no rule to match icon?

Installing with composer require shaarli/netscape-bookmark-parser:
there is no rule to match 'icon'.

Downloading the zip:
there are matching rules for 'icon'.

First link not imported with the old export format

I exported my database from Shaarli v0.6.4, so before you refactored the bookmark exporting.

In this version, the first link is on the same line as the title:

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<!-- This is an automatically generated file.
     It will be read and overwritten.
     DO NOT EDIT! -->
<!-- Shaarli all bookmarks export on 2016/08/08 10:04:26 -->
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1><DT><A HREF="https://gist.github.com/rodrigomuniz/8408734" ADD_DATE="1470640652" PRIVATE="1" TAGS="css">A Pen by Rodrigo Muniz.</A>
<DD>simple on/off button 
<DT><A HREF="https://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/" ADD_DATE="1469950052" PRIVATE="0" TAGS="apache">URL Rewriting for Beginners - Web Development in Brighton - Added Bytes</A>

When I try to import this file back (using your PR with this parser), the first link is not imported. The parser should be able to import files exported by older Shaarli versions.
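
One possible workaround, sketched under the assumption that the parser expects each entry to start on its own line, is to insert a line break before any DT or DD tag that shares a line with other content. The function name is hypothetical and this is not the fix adopted by the project.

```php
<?php
// Hypothetical pre-processing helper; not part of the library's API.
function putEntriesOnOwnLines(string $html): string
{
    // Insert a line break before every <DT> or <DD> tag that is preceded
    // by other characters on the same line (e.g. "</H1><DT>...").
    return preg_replace(
        '/(?<=[^\r\n])[ \t]*<(DT|DD)>/i',
        "\n" . '<$1>',
        $html
    );
}
```

Tags that already start a line are left untouched, so newer exports pass through unchanged.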

Debian packaging

Goal

provide a Debian package to enable packaging Shaarli > 0.8

TODO

this repo:
- #27 keep Composer metadata
- add a CHANGELOG.md
create/register a Debian package/project
- add Debian-specific metadata
- automate metadata generation (copyright, changelog, etc.)

Resources

Meta:

http://keepachangelog.com/en/0.3.0/

Shaarli package:

Debian packaging:

Debian tools for PHP packaging:

Debian CI:

Tests: add coverage for Delicious parsing

Full rewrite with a lexer/parser grammar

It would be more elegant and maintainable to describe the Netscape variants as a grammar, and generate/write a corresponding tokenizer/lexer/parser to process data.

Inspired by:

See:

Improve packaging & publish to Packagist

TODO:

add Composer metadata
- authors
- supported PHP versions
- version
~~kafene#6 - tag and release a new version under kafene/netscape-bookmark-parser~~
create a Packagist account for Shaarli community packages
#26 - update Composer metadata to use the shaarli/netscape-bookmark-parser namespace
tag and release a new version
create the actual package entry
generate the package archive
setup a GitHub -> Packagist webhook

See:

Using with Laravel

Hello! I'm trying to use this in one of my Service classes in a Laravel project, but I can't seem to get it to work.

I've run

composer require shaarli/netscape-bookmark-parser

I've added to composer.json:

"autoload": {
    "psr-4": {
        "App\\": "app/",
        "Database\\Factories\\": "database/factories/",
        "Database\\Seeders\\": "database/seeders/",
        "Shaarli\\NetscapeBookmarkParser\\":"vendor/shaarli/netscape-bookmark-parser/"
    }
},

I've run

composer dump-autoload

The following line of code yields the following Symfony \ Component \ ErrorHandler \ Error \ FatalError error:

$parser = new \Shaarli\NetscapeBookmarkParser\NetscapeBookmarkParser();

Cannot declare class NetscapeBookmarkParser, because the name is already in use

I've tried some other approaches with no success. If I remove (or change) the autoload line from composer, then the error is that the class NetscapeBookmarkParser couldn't be found.

Anyone know what I'm doing wrong here? Thanks!

4.0.0 Undefined variable: time

netscape-bookmark-parser/NetscapeBookmarkParser.php

Line 232 in c92c6c7

return $time;

Support multiline descriptions

"ksort() expects parameter 1 to be array, null given" when trying to parse an empty bookmarks file

When trying to import an empty bookmarks file, the process fails with a PHP error:

ksort() expects parameter 1 to be array, null given at /app/vendor/shaarli/netscape-bookmark-parser/NetscapeBookmarkParser.php:203

Instead of this behavior, the parser should either throw a custom exception, or simply return an empty array. The former would be preferable to be as explicit as possible.
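
Both options could look roughly like this. The exception class and function below are hypothetical stand-ins for the sorting step named in the error message; they are not the library's code.

```php
<?php
// Hypothetical exception type; the library may not define one.
class EmptyBookmarkFileException extends RuntimeException
{
}

// Hypothetical stand-in for the step that calls ksort() on the parsed items.
function sortParsedItems(?array $items, bool $strict = false): array
{
    if ($items === null || $items === []) {
        if ($strict) {
            // Explicit failure, as preferred in the report above.
            throw new EmptyBookmarkFileException('No bookmarks found in input');
        }
        // Lenient alternative: an empty file yields an empty result.
        return [];
    }

    ksort($items);
    return $items;
}
```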

Simplify data cleanup

Tests: add coverage for Shaarli parsing

ADD_DATE does not support Unix epochs

Include massive BC breaking changes into Shaarli repository or maintain my own fork?

Hi Arthur,

I recently happened to need a PHP parser for a bookmark file and stumbled upon Shaarli's repository.
I cleaned up as much code as I could without introducing any backward compatibility breaks so that the Shaarli community can benefit from the improvements as well.

However, I plan to make the parser more compatible and easier to use in my Symfony project, and I'm afraid this will introduce a lot of BC breaking changes.

Would you be willing to take on the significant workload needed for a hypothetical 4.0 integration into the Shaarli project, or do you believe I should create and maintain my own fork?

Here are the changes I am planning to introduce:

Rename class to "NetscapeBookmarkDecoder" for nomenclature consistency.
'parseBoolean' method should always return a boolean value / refactor tests accordingly.
All methods from parser should be private except for entrypoint / remove tests accordingly.
Remove dependency on Katzgrau\KLogger and remove logging altogether.
Remove useless "parseFile" method which violates Single Responsibility Principle / update documentation accordingly.
Replace parameters from constructor with "$context" array and class constants: This will drastically change class signature but will allow for Dependency Injection and Symfony autowiring.

And finally:

Make parser implement "Symfony\Component\Serializer\Encoder\DecoderInterface". I understand this change would create an unwanted dependency on Symfony Serializer but I am willing to make a version for Shaarli if necessary.
Rename "parseString" method to "decode" for nomenclature consistency and Interface conformity.
Create "supportsDecoding" method for Interface conformity and Symfony services integration.

New features (maybe):

Use Data Transfer Object design pattern for bookmark.
Include "ICON" property.
Create "NetscapeBookmarkEncoder" to return bookmark file from array.
Rename properties to be (mostly) compatible with "https://schema.org" structured data vocabulary:
- 'uri' => 'url'
- 'title' => 'name'
- 'icon' => 'image'
- 'note' => 'description'
- 'tags' => 'tags'
- 'time' => 'dateCreated'
- 'pub' => 'published'
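
For illustration, the proposed renaming can be expressed as a lookup table applied to each parsed bookmark. The constant and function names are hypothetical; only the key pairs come from the list above.

```php
<?php
// Old key => proposed key, copied from the list above.
const PROPOSED_KEY_MAP = [
    'uri'   => 'url',
    'title' => 'name',
    'icon'  => 'image',
    'note'  => 'description',
    'tags'  => 'tags',
    'time'  => 'dateCreated',
    'pub'   => 'published',
];

// Hypothetical helper: rename the keys of one parsed bookmark array.
function renameBookmarkKeys(array $bookmark): array
{
    $renamed = [];
    foreach ($bookmark as $key => $value) {
        // Keys without a mapping are kept as they are.
        $renamed[PROPOSED_KEY_MAP[$key] ?? $key] = $value;
    }
    return $renamed;
}
```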

Each tab in bookmark description is replaced by a single space

Issue

When parsing a bookmark whose description contains tabs (tabulation character), each one is replaced by a single space.

This could break formatting if the description contains formatted text, code examples, or terminal output.
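
How the library cleans up descriptions is not shown in this report. As a sketch of a cleanup that would leave tabs intact, the following hypothetical helper collapses only runs of spaces.

```php
<?php
// Hypothetical cleanup helper; not the library's actual sanitizing code.
function collapseSpacesKeepTabs(string $text): string
{
    // Collapse runs of two or more spaces into one space.
    // Tab characters are not matched, so they are preserved.
    return preg_replace('/ {2,}/', ' ', $text);
}
```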

Importing descriptions

I had a problem importing descriptions from an old scuttle installation.

Is it possible that the regexp pattern is too specific? I had to change the pattern in line 148
https://github.com/shaarli/netscape-bookmark-parser/blob/master/NetscapeBookmarkParser.php#L148

from
/note="(.*?)"<\/a>/i
to
/note="(.*?)"/i

i.e. removing the requirement for the description to be immediately followed by the closing anchor tag. (My export had attributes such as the tags and the date in between the description and the closing anchor.)

Also, the description in the Scuttle export is called "description" instead of "note", so maybe the parser could check for both versions?
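
A relaxed pattern that accepts either attribute name might look like the sketch below. The function name is hypothetical and the pattern is only an illustration of the suggestion, not the change that was merged.

```php
<?php
// Hypothetical helper: read the description from a "note" or "description"
// attribute, wherever it appears inside the anchor tag.
function extractDescriptionAttribute(string $line): ?string
{
    // \b avoids matching the end of a longer attribute name.
    if (preg_match('/\b(?:note|description)="(.*?)"/i', $line, $matches)) {
        return $matches[1];
    }
    return null;
}
```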

Ensure PHP 5.3 -> 7 compatibility
