php-unstructured-text-parser's People

Contributors

aymanrb, beriw98, dependabot[bot], fredericseiler, germanllop

php-unstructured-text-parser's Issues

Handle template without linefeed

Hi,
Thanks for an excellent text parser.

I'm having trouble with patterns where I'd like to capture the entire string after a selected word, but stop before the linefeed.

Example
Offices: New York, London, Paris Feel free to give us a call

My template:
Offices: {%Offices%} Feel free to give us a call

However, the next line is also included. The only way to avoid this would be to include "Feel" in the template, but my source files are not consistent in what text follows the "Offices" line. Is there a way to tell the parser to stop at CR/LF?
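
For reference, the "stop at CR/LF" behaviour described above can be sketched with plain `preg_match`, independent of the parser's template syntax. This is only an illustration under assumptions: the helper name `captureRestOfLine` is hypothetical, and the sample text assumes a line break after "Paris".

```php
<?php
declare(strict_types=1);

/**
 * Capture the rest of the line after a label, stopping before CR or LF.
 * Hypothetical helper: plain preg_match, not the library's template engine.
 */
function captureRestOfLine(string $text, string $label): ?string
{
    // [^\r\n]* matches any run of characters except carriage return and line feed.
    $pattern = '/^' . preg_quote($label, '/') . '[ \t]*(?<value>[^\r\n]*)/m';

    if (preg_match($pattern, $text, $matches) !== 1) {
        return null;
    }

    return rtrim($matches['value']);
}

$text = "Offices: New York, London, Paris\r\nFeel free to give us a call";
echo captureRestOfLine($text, 'Offices:'), PHP_EOL; // New York, London, Paris
```

Because the capture group excludes `\r` and `\n`, the text that follows on the next line never has to appear in the pattern.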

Template matching - Needs to be exact character position?

How closely does the template need to match the original text? Perfectly, or are whitespace differences ignored?

For example, my template might look like this:

Name: {%name}

and my parsed text like this:

Name:           Charlie Brown

Would it fail to match because of the difference in whitespace?

That's what I'm currently observing.
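
One possible workaround, sketched under the assumption that the template uses single spaces, is to collapse whitespace runs in the input before handing it to the parser. The function name `normalizeSpacing` is hypothetical and not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Collapse runs of spaces and tabs into one space and trim each line,
 * so "Name:           Charlie Brown" becomes "Name: Charlie Brown".
 * Hypothetical pre-processing step; not part of the library.
 */
function normalizeSpacing(string $text): string
{
    // \R matches any line-break sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', $text);

    $lines = array_map(
        static function (string $line): string {
            return trim((string) preg_replace('/[ \t]+/', ' ', $line));
        },
        $lines
    );

    return implode("\n", $lines);
}

echo normalizeSpacing("Name:           Charlie Brown"), PHP_EOL; // Name: Charlie Brown
```

The same normalisation would have to be applied consistently to every input so that templates and text agree on spacing.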

Warning: file_get_contents / failed to open stream: Permission denied

Hello, I'm currently using the package and it's amazing, but I get a couple of warnings even though the code runs perfectly. I tried PHP versions 5.6, 7.0, 7.1 and 7.2 and still got the warning:

Warning: file_get_contents(C:\wamp64\www\ticket-parser/templates.): failed to open stream: Permission denied in C:\wamp64\www\ticket-parser\vendor\aymanrb\php-unstructured-text-parser\src\TextParser.php on line 76

I have access to those folders; it's the same Administrator user.

I have not tried it on Linux yet, but I will. Any help?
Thanks

Edit:
I've added this check at line 76 so that 'file_get_contents' is not called on a directory:

if (!is_dir($fileInfo->getPathname())) {
    $templates[$fileInfo->getPathname()] = file_get_contents($fileInfo->getPathname());
}

Dynamic Template

I'm glad I stumbled upon this very useful plugin. I have one question: can I parse dynamic data, and how do I handle it in the template file?
Use case: I have a template for orders, and each order has items (products). One order can have 1 product, another can have 3 or more. How do I parse each product as a variable? What's the best way to handle this?
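
A repeating section like this can be sketched with `preg_match_all`, which returns one match per item line regardless of how many there are. The order layout below ("Item: <name> x <qty>") and the function name `extractItems` are assumptions made up for illustration; this is not the library's template syntax.

```php
<?php
declare(strict_types=1);

/**
 * Extract every "Item: <name> x <qty>" line from an order text.
 * The line format is a hypothetical example, not taken from the issue.
 */
function extractItems(string $orderText): array
{
    $pattern = '/^Item:[ \t]*(?<name>.+?)[ \t]+x[ \t]+(?<qty>\d+)[ \t]*\r?$/m';

    // PREG_SET_ORDER groups the captures per matched line.
    preg_match_all($pattern, $orderText, $matches, PREG_SET_ORDER);

    $items = [];
    foreach ($matches as $match) {
        $items[] = ['name' => $match['name'], 'qty' => (int) $match['qty']];
    }

    return $items;
}

$order = "Order: 1001\nItem: Blue Widget x 2\nItem: Red Gadget x 1\nTotal: 3\n";
print_r(extractItems($order));
```

The fixed parts of the order (number, total) could still go through a normal template, with only the variable-length item block handled this way.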

Extracting 2 consecutive variables from the same file

There should be a way to define 2 consecutive variables in a plain text document, or even an HTML one, when the 2 parameters we need to extract come one after the other with no defined separator (just a space or a new line, for example).

Doesn't work with UTF-8?

No result with:

`Lieber Kunde,

Ihre Bestellung hat unser Versandlager verlassen und wurde unserem
Logistikpartner DACHSER übergeben.

Die Sendungsnummer zum Auftrag SO123456 (Referenz: xxxxx ) lautet:
123456789

Unter Eingabe der oben genannten Sendungsnummer können Sie durch Klick
auf den folgenden Link den Status Ihrer Sendung einsehen:
DACHSER Kontrollinformationen
https://elogistics.dachser.com/?66fghj

Für Rückfragen stehe ich Ihnen natürlich jederzeit gerne zur Verfügung.

Wichtiger Hinweis zur Warenannahme:
Trotz aller Sorgfalt kann es leider vorkommen, dass die Ware auf dem
Transportweg zu Ihnen Schäden abbekommt.
Prüfen Sie die Verpackung und die Ware daher unbedingt bei Anlieferung
auf Transportschäden, in Anwesenheit des Spediteurs! Jeder Spediteur ist
dazu verpflichtet, die Sichtprüfung abzuwarten. Wenn eine Beschädigung
der Verpackung oder der Ware ersichtlich ist, ist diese Beschädigung mit
einer kurzen Beschreibung, was genau beschädigt ist, auf dem Frachtbrief
zu vermerken und vom Fahrer bestätigen zu lassen. Danach nehmen Sie
bitte schnellstmöglich Kontakt mit uns auf!
Gemeldete Transportschäden ohne einen Vermerk auf den Frachtpapieren
oder verspätet gemeldete Transportschäden können nicht ersetzt werden!

Schöne Grüße aus Bremen`
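
Whether encoding is actually the cause here is not known. As a first check, a sketch like the following makes sure the text is valid UTF-8 before matching and uses the `u` pattern modifier so that umlauts are treated as whole characters. The function name `toUtf8` and the ISO-8859-1 fallback are assumptions; the `mbstring` extension is required.

```php
<?php
declare(strict_types=1);

/**
 * Return $text as UTF-8. If it is not valid UTF-8 already, assume the
 * given fallback encoding (an assumption; adjust to the real source).
 */
function toUtf8(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

// "\xFC" is "ü" in ISO-8859-1, which is not valid UTF-8 on its own.
$raw = "Logistikpartner DACHSER \xFCbergeben.";
$text = toUtf8($raw);

// The /u modifier makes PCRE treat pattern and subject as UTF-8.
if (preg_match('/Logistikpartner (?<carrier>\p{L}+) übergeben/u', $text, $m) === 1) {
    echo $m['carrier'], PHP_EOL; // DACHSER
}
```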

Doubt php-unstructured-text-parser

First of all, thanks for such a wonderful plugin. I have a small question: how can I parse dynamic data? Right now every template is in a static format. Can you please advise on this?

Parse only new files in a directory

I want to parse only the text files that have not been parsed before (the newer files in the folder of text files). How can I best approach this?
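
One way to approach this outside the library is to keep a small state file listing the file names already handled and to skip those on the next run. The sketch below assumes `.txt` inputs and a JSON state file stored outside the input folder; the function names `findNewFiles` and `markProcessed` are hypothetical.

```php
<?php
declare(strict_types=1);

/** Read the list of already processed file names (empty if no state yet). */
function loadProcessed(string $stateFile): array
{
    if (!is_file($stateFile)) {
        return [];
    }

    return json_decode((string) file_get_contents($stateFile), true) ?: [];
}

/** Return paths of .txt files in $dir that are not recorded in $stateFile. */
function findNewFiles(string $dir, string $stateFile): array
{
    $seen = loadProcessed($stateFile);
    $new = [];

    foreach (new FilesystemIterator($dir, FilesystemIterator::SKIP_DOTS) as $file) {
        if ($file->isFile()
            && $file->getExtension() === 'txt'
            && !in_array($file->getFilename(), $seen, true)
        ) {
            $new[] = $file->getPathname();
        }
    }

    sort($new);

    return $new;
}

/** Record a file as processed so later runs skip it. */
function markProcessed(string $path, string $stateFile): void
{
    $seen = loadProcessed($stateFile);
    $seen[] = basename($path);
    file_put_contents($stateFile, json_encode(array_values(array_unique($seen))));
}
```

A typical loop would call `findNewFiles()`, pass each file's contents to the parser, and call `markProcessed()` only after parsing succeeds. Moving parsed files to a separate folder is a simpler alternative when the files may be relocated.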

My Template Doesn't Work

My template is here.

{%title%}<br />
Türkçe Adı:{%name%}<br />
Soyadı:{%surname%}<br />
Telefon :{%phonenumber%}<br />
Faks:{%zip%}<br />
E-Posta:{%email%}<br />
Ağ:{%website%} <br />
Şirket:{%company%}<br />
Ülke: {%country%}<br />
<br />
{%description%}

And our input is here.

020. Tunus'tan safran yağı alım talebi
Tunus'tan safran yağı alım talebi

Türkçe Adı: Zayani Karim
Soyadı:
Telefon: 0021625453108
Faks:
E-Posta: [email protected]
Ağ:
Şirket: الجاذبية للاستيراد والتصدير Yerçekimi İthalat ve İhracat
Ülke: Tunus

اريد شراء زيت عطر الزعفران
Safran yağı almak istiyorum

Why doesn't it work?

base64 encoded email -> plaintext issues

Hey,

Not sure if you can assist with this but figured I'd post an issue in case you have some insight.

I've been using this awesome library for some time [thanks!] but recently ran into some issues with a client that was sending emails encoded in base64 format, and having an issue matching the decoded (plaintext) message to any templates.

I've never had to decode base64 encoded emails until recently, and discovered that simply using imap_base64 wasn't working in terms of getting them to plaintext, and that the decoded message was in HTML format.

So I've attempted to make use of the html2text library in order to convert the decoded base64 messages and remove the HTML formatting, however, the php-unstructured-text-parser doesn't seem to be matching any of the data defined within the template when running it against a data variable that contains the plaintext data formatted by html2text.

However, if I parse an actual email composed with the body of the data created/formatted by this html2text library, it does work.. so I'm somewhat left scratching my head here as to why it won't work when comparing against this same data [stored in a variable].

I'm thinking of just sending another email into the queue for parsing (composed of the data generated by html2text) and re-parsing it that way, but this isn't ideal. If you have any suggestions on where I can improve this (or improvements to what I've described), let me know!

Also note that I'm using dev-master branch of the library with my application, since I too was having similar issues parsing plaintext emails and having linebreaks ignored in my templates.

Thanks!
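
The cause of the mismatch is not established in this issue. One thing worth ruling out is invisible differences (CRLF line endings, non-breaking spaces, trailing spaces) between the in-memory text and the emailed copy. The sketch below decodes a base64 HTML body and normalises those characters using only built-in PHP functions; it does not use html2text, and the function name `emailBodyToPlainText` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Decode a base64-encoded HTML body and reduce it to normalised plain text.
 * Hypothetical helper using built-in functions only.
 */
function emailBodyToPlainText(string $base64Body): string
{
    // MIME bodies are wrapped in lines; drop whitespace before decoding.
    $html = base64_decode((string) preg_replace('/\s+/', '', $base64Body), true);
    if ($html === false) {
        throw new InvalidArgumentException('Body is not valid base64.');
    }

    // Turn <br> and common closing block tags into line breaks, then strip tags.
    $text = (string) preg_replace('/<br\s*\/?>|<\/(p|div|tr|li|h[1-6])>/i', "\n", $html);
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Normalise line endings and non-breaking spaces, then trailing spaces.
    $text = str_replace(["\r\n", "\r", "\u{00A0}"], ["\n", "\n", ' '], $text);
    $text = (string) preg_replace('/[ \t]+\n/', "\n", $text);

    return trim($text);
}

$body = base64_encode('<p>Name:&nbsp;Charlie</p><p>Phone: 123</p>');
echo emailBodyToPlainText($body), PHP_EOL;
```

Dumping both versions of the text with `bin2hex()` and comparing them is a quick way to confirm whether such characters are what differs.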

Advice on handling Complex Extract Template

Hello,

I have the following text (12345678 John Anthony Doe), where the names can vary. I have tried the template ({%id%} {%name%}); however, I end up with the values 12345678 John Anthony and Doe.

Do you have any advice on how I can code the template to take the number alone and the rest of the text string?

Thank you in advance
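
As an illustration of the split being asked for, a plain regular expression can take the leading digits as the ID and everything after them as the name. This is a sketch outside the library's template syntax; the function name `splitIdAndName` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Split "12345678 John Anthony Doe" into the leading number and the rest.
 * Hypothetical helper; returns null when the line does not fit the shape.
 */
function splitIdAndName(string $line): ?array
{
    // \d+ takes only the digits; .+? then grows to cover the whole remaining name.
    if (preg_match('/^\s*(?<id>\d+)\s+(?<name>.+?)\s*$/u', $line, $m) !== 1) {
        return null;
    }

    return ['id' => $m['id'], 'name' => $m['name']];
}

print_r(splitIdAndName('12345678 John Anthony Doe'));
// id => 12345678, name => John Anthony Doe
```

Restricting the first capture to digits is what keeps the name from being divided at its internal spaces.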

Incompatible with Laravel 10

Thanks for your package, it looks very useful :)

Unfortunately we've been attempting to use it with Laravel 10 but it gets hung up on the following dependencies :

 - Root composer.json requires aymanrb/php-unstricted-text-parser ^2.3 -> satisfiable by aymanrb/php-unstricted-text-parser[v2.3.0].
 - aymanrb/php-unstricted-text-parser v2.3.0 requires psr/log ^1.1 -> found psr/log[1.1.0, …, 1.1.4] but the package is fixed to 3.0.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
 - monolog/monolog 3.3.1 requires psr/log ^2.0 || ^3.0 found psr/log[2.0.0, 3.0.0] but the package is fixed to 1.0.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
 - laravel/framework v10.11.0 requires monolog/monolog ^3.0 satisfiable by monolog/monolog[3.3.1].
 - laravel/framework is locked to version v10.11.0 and an update of this package was not requested.

I've bumped the dependency locally and it seems to work. I will attach a pull request to correct this for your consideration.

change default logs dir

Please add the possibility to change the default logs directory, maybe as an optional constructor argument or through a setter method.

Exclude 'dot' directories from templates lookup

The fix implemented for issue #10 (released in version 1.2.2) made the iterator skip the "dot" directories returned by the "DirectoryIterator" used for the template files lookup.

It would be better to use FilesystemIterator instead, which avoids the faulty directory lookup, as attempted by @carriera in PR #15.
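
A minimal sketch of that idea, not the library's actual implementation: `FilesystemIterator` with `SKIP_DOTS` never yields `.` or `..`, and an `isFile()` check also skips real subdirectories so `file_get_contents` is only called on files. The function name `loadTemplates` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Load every regular file in $templatesDir, keyed by its path.
 * Hypothetical sketch; the library's own loader may differ.
 */
function loadTemplates(string $templatesDir): array
{
    $templates = [];

    // SKIP_DOTS excludes the "." and ".." entries.
    $iterator = new FilesystemIterator($templatesDir, FilesystemIterator::SKIP_DOTS);

    foreach ($iterator as $fileInfo) {
        // Subdirectories are still listed, so skip anything that is not a file.
        if (!$fileInfo->isFile()) {
            continue;
        }

        $templates[$fileInfo->getPathname()] = file_get_contents($fileInfo->getPathname());
    }

    return $templates;
}
```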

Template not Working

Hello,

$parser = new aymanrb\UnstructuredTextParser\TextParser('application/controllers/template');
$textToParse = 'hallo das ist ein test 4126552 und hier geht es weiter DFGHJKL mit dem Text ...';

//file_put_contents('application/controllers/template/test2.txt',$ordner['body']['html']);

print_r(
	$parser
		->parseText($textToParse,false )
		->getParsedRawData()
);

Template:
hallo das ist ein test {%name%} und hier geht es weiter {%namfe%} mit dem Text ...

It returns an empty array.
What am I doing wrong?

Mac OS M1
PHP 7.4

Using regex throws error

Hi there,

Just realized that you released an update a few years ago that allows you to use a regex when targeting data to parse, however, when I try to utilize this, the script appears to be throwing an error:

 Got error 'PHP message: PHP Warning:  preg_match(): Compilation failed: quantifier does not follow a repeatable item at offset 249 in /vendor/aymanrb/php-unstructured-text-parser/src/TextParser.php on line 68
PHP message: PHP Fatal error:  Uncaught TypeError: array_keys() expects parameter 1 to be array, null given in /vendor/aymanrb/php-unstructured-text-parser/src/TextParser.php

I am using $parser->parseText($message)->getParsedRawData(); in conjunction with this, if that helps.

I'm simply testing the extraction of a phone number from the text, something like +17785542644, using a variable with a regex such as {%customer_phone:^\+\d{1,15}$%}

Using a plain variable such as {%customer_phone%} has no issue, only when I attempt to use a regular expression.

Let me know if you have any insights! Thank you.
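
This does not diagnose the library error, whose cause is not stated here. As a point of comparison, the sketch below shows the same digit pattern working in plain `preg_match` when it is embedded in a larger expression without the `^` and `$` anchors, since those anchors would require the number to be the entire subject. The "Phone:" label and the function name `extractPhone` are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Extract a "+<digits>" phone number that follows a "Phone:" label.
 * Hypothetical helper; the label is an assumption, not from the issue.
 */
function extractPhone(string $text): ?string
{
    // Same \+\d{1,15} as in the issue, without ^ and $ because the
    // sub-pattern sits in the middle of a longer expression.
    $pattern = '/Phone:\s*(?<customer_phone>\+\d{1,15})/';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    return $m['customer_phone'];
}

echo extractPhone("Name: Jane\nPhone: +17785542644\nThanks"), PHP_EOL; // +17785542644
```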
