diff-match-patch

npm package for https://github.com/google/diff-match-patch

Installation

npm install diff-match-patch

API

Source

Initialization

The first step is to create a new diff_match_patch object. This object contains various properties which set the behaviour of the algorithms, as well as the following methods/functions:

diff_main(text1, text2) → diffs

An array of differences is computed which describe the transformation of text1 into text2. Each difference is an array (JavaScript, Lua) or tuple (Python) or Diff object (C++, C#, Objective C, Java). The first element specifies if it is an insertion (1), a deletion (-1) or an equality (0). The second element specifies the affected text.

diff_main("Good dog", "Bad dog") → [(-1, "Goo"), (1, "Ba"), (0, "d dog")]
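To make the structure concrete, the following sketch walks a diff array by hand and rebuilds both input texts from it. It uses the result shown above as a literal, so it runs without the library. The helper name rebuildTexts is hypothetical and not part of the package.

```javascript
// Hypothetical helper (not part of diff-match-patch): rebuild the source
// and destination texts from a diff array.
// Operation codes as documented above: -1 delete, 1 insert, 0 equal.
function rebuildTexts(diffs) {
  let text1 = '';
  let text2 = '';
  for (const diff of diffs) {
    const op = diff[0];
    const text = diff[1];
    if (op !== 1) text1 += text;  // deletions and equalities come from text1
    if (op !== -1) text2 += text; // insertions and equalities end up in text2
  }
  return { text1, text2 };
}

// The diff of "Good dog" and "Bad dog" from the example above.
const exampleDiffs = [[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']];
console.log(rebuildTexts(exampleDiffs));
// → { text1: 'Good dog', text2: 'Bad dog' }
```

Index access (diff[0], diff[1]) is used instead of destructuring so the helper also accepts array-like entries.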

Despite the large number of optimisations used in this function, diff can take a while to compute. The diff_match_patch.Diff_Timeout property is available to set how many seconds any diff's exploration phase may take. The default value is 1.0. A value of 0 disables the timeout and lets diff run until completion. Should diff time out, the return value will still be a valid difference, though probably non-optimal.

diff_cleanupSemantic(diffs) → null

A diff of two unrelated texts can be filled with coincidental matches. For example, the diff of "mouse" and "sofas" is [(-1, "m"), (1, "s"), (0, "o"), (-1, "u"), (1, "fa"), (0, "s"), (-1, "e")]. While this is the optimum diff, it is difficult for humans to understand. Semantic cleanup rewrites the diff, expanding it into a more intelligible format. The above example would become: [(-1, "mouse"), (1, "sofas")]. If a diff is to be human-readable, it should be passed to diff_cleanupSemantic.

diff_cleanupEfficiency(diffs) → null

This function is similar to diff_cleanupSemantic, except that instead of optimising a diff to be human-readable, it optimises the diff to be efficient for machine processing. The results of both cleanup types are often the same.

The efficiency cleanup is based on the observation that a diff made up of a large number of small edits may take longer to process (in downstream applications) or take more capacity to store or transmit than a smaller number of larger edits. The diff_match_patch.Diff_EditCost property sets what the cost of handling a new edit is in terms of handling extra characters in an existing edit. The default value is 4, which means if expanding the length of a diff by three characters can eliminate one edit, then that optimisation will reduce the total costs.

diff_levenshtein(diffs) → int

Given a diff, measure its Levenshtein distance in terms of the number of inserted, deleted or substituted characters. The minimum distance is 0 which means equality, the maximum distance is the length of the longer string.
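A minimal sketch of how such a count can be derived from a diff array, assuming that a deletion adjacent to an insertion is counted as substitutions. The function name levenshteinFromDiffs is hypothetical and the code is an illustration only. In real code, call diff_levenshtein on the library object.

```javascript
// Hypothetical re-implementation for illustration only.
function levenshteinFromDiffs(diffs) {
  let distance = 0;
  let insertions = 0;
  let deletions = 0;
  for (const diff of diffs) {
    const op = diff[0];
    const length = diff[1].length;
    if (op === 1) {
      insertions += length;
    } else if (op === -1) {
      deletions += length;
    } else {
      // An equality closes the current run of edits. A deletion paired with
      // an insertion counts as substitutions, so the run costs the larger
      // of the two counts.
      distance += Math.max(insertions, deletions);
      insertions = 0;
      deletions = 0;
    }
  }
  // Account for a run of edits at the very end of the diff.
  return distance + Math.max(insertions, deletions);
}

console.log(levenshteinFromDiffs([[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']])); // 3
console.log(levenshteinFromDiffs([[-1, 'mouse'], [1, 'sofas']]));          // 5
```

The value depends on the diff that is passed in: the uncleaned "mouse"/"sofas" diff quoted in the diff_cleanupSemantic section gives 4 under this model, while the semantically cleaned version above gives 5.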

diff_prettyHtml(diffs) → html

Takes a diff array and returns a pretty HTML sequence. This function is mainly intended as an example from which to write one's own display functions.
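As a starting point for such a display function, here is a small sketch that renders a diff array as HTML. The tag choices (ins, del, span) and the function names are assumptions made for this example. They are not a description of what diff_prettyHtml itself emits.

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical display function: wrap insertions in <ins>, deletions in
// <del> and unchanged text in <span>.
function renderDiffHtml(diffs) {
  return diffs
    .map((diff) => {
      const op = diff[0];
      const text = escapeHtml(diff[1]).replace(/\n/g, '<br>');
      if (op === 1) return `<ins>${text}</ins>`;
      if (op === -1) return `<del>${text}</del>`;
      return `<span>${text}</span>`;
    })
    .join('');
}

console.log(renderDiffHtml([[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']]));
// → <del>Goo</del><ins>Ba</ins><span>d dog</span>
```

Escaping happens before the markup is added, so text taken from the diff cannot inject tags into the page.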

match_main(text, pattern, loc) → location

Given a text to search, a pattern to search for and an expected location in the text near which to find the pattern, return the location which matches closest. The function will search for the best match based on both the number of character errors between the pattern and the potential match, as well as the distance between the expected location and the potential match.

The following example is a classic dilemma. There are two potential matches: one is close to the expected location but contains a one-character error, the other is far from the expected location but is exactly the pattern sought:

match_main("abc12345678901234567890abbc", "abc", 26)

Which result is returned (0 or 24) is determined by the diff_match_patch.Match_Distance property. An exact letter match which is 'distance' characters away from the fuzzy location would score as a complete mismatch. For example, a distance of '0' requires the match to be at the exact location specified, whereas a distance of '1000' would require a perfect match to be within 800 characters of the expected location to be found using a 0.8 threshold (see below). The larger Match_Distance is, the longer match_main() may take to compute. This variable defaults to 1000.

Another property is diff_match_patch.Match_Threshold which determines the cut-off value for a valid match. If Match_Threshold is closer to 0, the requirements for accuracy increase. If Match_Threshold is closer to 1 then it is more likely that a match will be found. The larger Match_Threshold is, the longer match_main() may take to compute. This variable defaults to 0.5. If no match is found, the function returns -1.
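The interaction between Match_Distance and Match_Threshold can be illustrated with a simplified scoring model. It assumes a score of (character errors / pattern length) + (distance from the expected location / Match_Distance), where lower is better and a candidate is valid only if its score does not exceed the threshold. This matches the description above, but the function names are hypothetical and only the two candidates from the "abc" example are considered, so it is a sketch of the trade-off rather than the library's search.

```javascript
// Simplified scoring model (an assumption based on the description above):
// 0 is a perfect match at the expected location, larger is worse.
function matchScore(errors, patternLength, foundAt, expectedLoc, matchDistance) {
  const accuracy = errors / patternLength;
  const proximity = Math.abs(expectedLoc - foundAt);
  if (matchDistance === 0) {
    // A distance of 0 only accepts matches at the exact expected location.
    return proximity === 0 ? accuracy : 1.0;
  }
  return accuracy + proximity / matchDistance;
}

// The two candidates from the "abc" example: an exact match at 0 and a
// one-error match at 24, with the expected location at 26.
function pickCandidate(matchDistance, matchThreshold) {
  const candidates = [
    { location: 0, errors: 0 },
    { location: 24, errors: 1 },
  ];
  let best = -1;
  let bestScore = matchThreshold;
  for (const c of candidates) {
    const score = matchScore(c.errors, 3, c.location, 26, matchDistance);
    if (score <= bestScore) {
      best = c.location;
      bestScore = score;
    }
  }
  return best;
}

console.log(pickCandidate(1000, 0.5)); // 0: 26/1000 = 0.026 beats 1/3 + 2/1000
console.log(pickCandidate(20, 0.5));   // 24: 26/20 = 1.3 is rejected, 1/3 + 2/20 ≈ 0.433 passes
console.log(pickCandidate(5, 0.5));    // -1: 26/5 = 5.2 and 1/3 + 2/5 ≈ 0.733 both exceed 0.5
```

Under this model, an exact match 800 characters away with Match_Distance set to 1000 scores 0.8, which is the boundary case mentioned above for a 0.8 threshold.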

patch_make(text1, text2) → patches

patch_make(diffs) → patches

patch_make(text1, diffs) → patches

Given two texts, or an already computed list of differences, return an array of patch objects. The third form (text1, diffs) is preferred; use it if you happen to have that data available. Otherwise this function will compute the missing pieces.

patch_toText(patches) → text

Reduces an array of patch objects to a block of text which looks extremely similar to the standard GNU diff/patch format. This text may be stored or transmitted.

patch_fromText(text) → patches

Parses a block of text (which was presumably created by the patch_toText function) and returns an array of patch objects.

patch_apply(patches, text1) → [text2, results]

Applies a list of patches to text1. The first element of the return value is the newly patched text. The second element is an array of true/false values indicating which of the patches were successfully applied. [Note that this second element is not too useful since large patches may get broken up internally, resulting in a longer results list than the input with no way to figure out which patch succeeded or failed. A more informative API is in development.]

The previously mentioned Match_Distance and Match_Threshold properties are used to evaluate patch application on text which does not match exactly. In addition, the diff_match_patch.Patch_DeleteThreshold property determines how closely the text within a major (~64 character) delete needs to match the expected text. If Patch_DeleteThreshold is closer to 0, then the deleted text must match the expected text more closely. If Patch_DeleteThreshold is closer to 1, then the deleted text may contain anything. In most use cases Patch_DeleteThreshold should just be set to the same value as Match_Threshold.

Usage

import DiffMatchPatch from 'diff-match-patch';

const dmp = new DiffMatchPatch();
const diff = dmp.diff_main('dogs bark', 'cats bark');

// You can also use the following properties:
DiffMatchPatch.DIFF_DELETE = -1;
DiffMatchPatch.DIFF_INSERT = 1;
DiffMatchPatch.DIFF_EQUAL = 0;

License

http://www.apache.org/licenses/LICENSE-2.0

diff-match-patch's People

Contributors

forbeslindesay, greenkeeper[bot], jackub, wbinnssmith

diff-match-patch's Issues

Documentation/Examples

Hi, when I try to access https://code.google.com/p/google-diff-match-patch/wiki/API I get this:

SVN hosting has been permanently disabled.

Where can I find the documentation?

Thanks

Uncaught ReferenceError: module is not defined

We have been receiving the following error:
index.js:2214 Uncaught ReferenceError: module is not defined at index.js:2214:1

Steps to replicate:

  • run npm install diff-match-patch
  • load '/node_modules/diff-match-patch/index.js' in the main app through a plain script tag: <script type="text/javascript" charset="UTF-8" src="http://localhost:1962/node_modules/diff-match-patch/index.js"></script>

Can't instantiate DiffMatchPatch

Hello, I can't use the object DiffMatchPatch. Here is my code

import {Component, Input, OnInit} from '@angular/core';
import {ActService} from "../../../../services/act.service";
import {DialogService} from "../../../../dialog-service/dialog.service";
import {Router} from "@angular/router";
import { Observable } from 'rxjs/Observable';
import 'rxjs/add/observable/forkJoin';
import DiffMatchPatch from 'diff-match-patch';
const dmp = new DiffMatchPatch();

@Component({
  selector: 'act-step-content-validation',
  templateUrl: './act-step-content-validation.component.html',
  styleUrls: ['./act-step-content-validation.component.css']
})
export class ActStepContentValidationComponent implements OnInit {

  @Input() metadata;
  public loading: boolean = false;
  public previewModifications;
  public previewImport;

  constructor(private actService: ActService,
              private dialogService: DialogService) {
  }

  ngOnInit() {
    this.loading = true;
    this.actId = this.metadata.id;

    Observable.forkJoin([
      this.actService.getWorkingActPreview(this.metadata.id.value, "10-10-2018"),
      this.actService.getImportedActPreview(this.metadata.id.value)
    ])
      .subscribe(([wP, iP]) => {
        this.previewModifications = wP;
        this.previewImport = iP;
        dmp.Diff_Timeout = 0;
        let diffOutput = dmp.diff_main(this.previewModifications, this.previewImport, false);
        console.log(diffOutput);
      });
  }
}

Here is the error: image

What's wrong? Thank you.

How to compare multi-line diff

Requirement:
I want to compare the texts line by line and, for each line, get the details of what was deleted, added or modified.

The data information view is as follows:
image

Update to newest Google version

Thanks for packaging for node.
It appears that Google has migrated their repo to GitHub and has been making updates. Can you please update this to match the latest version?

Library throws "URI malformed" error when creating patch with emojis

The following code:

const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();

const patchText = dmp.patch_toText(dmp.patch_make('', '👨‍🦰 👨🏿‍🦰 👨‍🦱 👨🏿‍🦱 🦹🏿‍♂️'));
const patchObj = dmp.patch_fromText(patchText);
const [patchedText] = dmp.patch_apply(patchObj, '');
dmp.patch_toText(dmp.patch_make(patchedText, '👾 🙇 💁 🙅 🙆 🙋 🙎 🙍'));

Will throw an error "URI Malformed" at this line. That's often the problem when using encodeURI on arbitrary data (the md5 package has the same problem) but in that case as far as I can see the inputs are valid UTF-8.

I think either patch_make or patch_apply generates invalid text.

But also I'm wondering why is encodeURI needed in this lib? Wouldn't a simple escape/unescape of specific reserved characters be enough?
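The error message itself can be reproduced without the library: encodeURI throws a URIError when the string contains a lone surrogate, for example one half of an emoji that was split between its two UTF-16 code units. Whether such a split is what happens inside patch_make or patch_apply in the case above is an assumption and is not confirmed by this report. The helper name canEncode is hypothetical.

```javascript
// Returns true when encodeURI accepts the string, false when it throws
// a URIError.
function canEncode(text) {
  try {
    encodeURI(text);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err;
  }
}

const emoji = '👾';                   // one code point, two UTF-16 code units
const firstHalf = emoji.slice(0, 1); // a lone high surrogate

console.log(emoji.length);         // 2
console.log(canEncode(emoji));     // true
console.log(canEncode(firstHalf)); // false: encodeURI throws URIError
```

If that is the cause, the input strings can be valid while an intermediate diff or patch fragment is not.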

Wrong license on npm

The version released on npm as 1.0.0 has http://www.apache.org/licenses/LICENSE-2.0 in the license field, which has been fixed in master ("Apache-2.0")

This means that tools for automated license verification of dependencies fail on this package, as they can't recognise the exact license used.

Could you release a new version in npm with the contents of master?

API messed up 1.0.0 -> 1.0.3

1.0.0 worked.
In 1.0.3, the API is somehow messed up. The test harness does not detect it.

// that's what it should be but is not:
var a = [[DIFF_EQUAL, 'ab'], [DIFF_INSERT, '123'], [DIFF_EQUAL, 'c']]
var b = dmp.diff_main('abc', 'ab123c', false)
// Instead it's a strange object
console.log(b)
[ { '0': 0, '1': 'ab' },
  { '0': 1, '1': '123' },
  { '0': 0, '1': 'c' } ]
console.log(Array.isArray(a[0])) // true
console.log(Array.isArray(b[0])) // false
console.log(b)
assertEquivalent(a, b); // yes, equivalent

And it breaks ES6 clients:

const [eq, str] = a[0]  // works
const [eq, str] = b[0]  // breaks
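A possible client-side workaround, sketched under the assumption that each entry still exposes its operation and text at indices 0 and 1 (as the logged output above suggests): copy every entry into a real array before destructuring. The helper name toTuples is hypothetical.

```javascript
// Hypothetical helper: copy each entry into a real two-element array.
// Index access works for arrays and for array-like objects alike.
function toTuples(diffs) {
  return diffs.map((diff) => [diff[0], diff[1]]);
}

// Entries shaped like the logged 1.0.3 output above.
const arrayLike = [{ 0: 0, 1: 'ab' }, { 0: 1, 1: '123' }, { 0: 0, 1: 'c' }];
const tuples = toTuples(arrayLike);

const [op, text] = tuples[1]; // destructuring works again
console.log(Array.isArray(tuples[0]), op, text); // true 1 123
```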

patch_toText, unified diff like output

The docs say this method should create something that looks like a unified diff. When I try it I just get "[object ...]" output. Looking at the actual source code for the method, it doesn't do much; it simply tries to output some of the objects in the diff. Here's the actual source code:

/**
 * Take a list of patches and return a textual representation.
 * @param {!Array.<!diff_match_patch.patch_obj>} patches Array of Patch objects.
 * @return {string} Text representation of patches.
 */
diff_match_patch.prototype.patch_toText = function(patches) {
  var text = [];
  for (var x = 0; x < patches.length; x++) {
    text[x] = patches[x];
  }
  return text.join('');
};

So patches[x] is a JavaScript object, and the text.join method just creates these faulty "object" strings. I can't find any code that does anything like what the comments in the source code mention. So I guess either we need some more code or the docs should be updated to mark this method as faulty.
