nenadpnc commented on May 31, 2024

@appshore I've just published an update with your suggestions. Please check it and let me know if there are any problems. Thanks.

from cl-editor.

appshore commented on May 31, 2024

@nenadpnc It is working as expected. I'll do more tests over the next few weeks and report back if needed.

Thank you for the quick update and for this nice Svelte editor.

nenadpnc commented on May 31, 2024

@appshore Can you show me an example of incorrectly stripped HTML?
As I see it, your solution differs in two ways: it doesn't replace Mso CSS classes (the classes Microsoft Word generates), and it removes the match(/<!--StartFragment-->(.*?)<!--EndFragment-->/) step.
The default behaviour is not to carry over styling from a Word document (or any other document) when pasting, but to keep the text itself. Can you check that your solution doesn't break that?
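For reference, here is the fragment-extraction step on its own, as a minimal sketch. The helper name extractFragment and the sample strings are hypothetical. The markers are the ones the Windows HTML clipboard format places around the copied selection, so pastes from Word on Windows typically contain them, while HTML from other sources may not.

```typescript
// Hypothetical helper isolating the fragment-extraction step used by cleanHtml.
const FRAGMENT_RE = /<!--StartFragment-->(.*?)<!--EndFragment-->/;

const extractFragment = (input: string): string | null => {
    // Flatten line breaks first (as cleanHtml does) so `.` can span the whole fragment.
    const flat = input.replace(/\r?\n|\r/g, ' ');
    const match = flat.match(FRAGMENT_RE);
    // match() returns null when the markers are missing.
    return match ? match[1] : null;
};

// Clipboard-style HTML: the copied part sits between the two marker comments.
const clipboard =
    '<html><body>\r\n<!--StartFragment--><p class="MsoNormal">Hi</p><!--EndFragment-->\r\n</body></html>';
console.log(extractFragment(clipboard));   // <p class="MsoNormal">Hi</p>
console.log(extractFragment('<p>Hi</p>')); // null
```

Because cleanHtml falls back to an empty string when this step yields nothing, any input without the markers is cleaned down to ''.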

appshore commented on May 31, 2024

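The issue seems to come mainly from the match: when the pasted HTML has no <!--StartFragment--> / <!--EndFragment--> markers, match() returns null and cleanHtml returns an empty string. Mso classes (in fact, all classes) are still removed by my modifications:

```typescript
    .replace(/class="[^"]*"/gi, '')
    .replace(/class='[^']*'/gi, '')
```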
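The difference from the original Mso-only regex shows up on a small sample. This is a sketch: stripClassAttributes and stripMsoClasses are hypothetical wrapper names around the two regexes quoted in this thread.

```typescript
// Proposed replacements: drop every class attribute, whatever its value or quote style.
const stripClassAttributes = (html: string): string =>
    html
        .replace(/class="[^"]*"/gi, '')  // double-quoted class attributes
        .replace(/class='[^']*'/gi, ''); // single-quoted class attributes

// Original cl-editor regex: only targets class names starting with "Mso".
const stripMsoClasses = (html: string): string =>
    html.replace(/(class=(")?Mso[a-zA-Z]+(")?)/g, ' ');

const sample = `<p class="MsoNormal">a</p><p class='note'>b</p>`;
console.log(stripClassAttributes(sample)); // <p >a</p><p >b</p>
console.log(stripMsoClasses(sample));      // <p  >a</p><p class='note'>b</p>
```

So the Mso cleanup is covered by the broader class removal; non-Mso classes are dropped as well.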

Here is a complete command-line test case (index.ts):

```typescript
export const cleanHtml = (input: string) => {
    // remove line breaks and find relevant html
    const html = input.replace(/\r?\n|\r/g, ' ').match(/<!--StartFragment-->(.*?)<!--EndFragment-->/);
    // without StartFragment/EndFragment markers, html is null and output becomes ''
    let output = html && html[1] || '';
    console.log('cleanHtml origin html', html);
    output = output
        // 1. remove Mso classes
        .replace(/(class=(")?Mso[a-zA-Z]+(")?)/g, ' ')
        // 2. strip Word generated HTML comments
        .replace(/<!--(.*?)-->/g, '')
        // 3. remove tags, leave content if any
        .replace(new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font|w:sdt)(.*?)>', 'gi'), '')
        .replace(/<!\[if !supportLists\]>(.*?)<!\[endif\]>/gi, '')
        .replace(/style="[^"]*"/gi, '')
        .replace(/style='[^']*'/gi, '')
        .replace(/&nbsp;/gi, ' ')
        .replace(/>(\s+)</g, '><');

    // 4. Remove everything in between and including tags '<style(.)style(.)>'
    output = removeBadTags(output);
    return output;
};

export const cleanHtml2 = (input: string) => {
    let output = input
        .replace(/\r?\n|\r/g, ' ')
        .replace(/<!--(.*?)-->/g, '')
        .replace(new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font|w:sdt)(.*?)>', 'gi'), '')
        .replace(/<!\[if !supportLists\]>(.*?)<!\[endif\]>/gi, '')
        .replace(/style="[^"]*"/gi, '')
        .replace(/style='[^']*'/gi, '')
        .replace(/&nbsp;/gi, ' ')
        .replace(/>(\s+)</g, '><')
        // remove all class attributes (Mso classes included)
        .replace(/class="[^"]*"/gi, '')
        .replace(/class='[^']*'/gi, '')
        // keep only the tag name of each opening tag, dropping any remaining attributes
        .replace(/<[^/].*?>/g, tag => tag.split(/[ >]/g)[0] + '>')
        .trim();
    // Remove everything in between and including tags '<style(.)style(.)>'
    output = removeBadTags(output);
    return output;
};

export const removeBadTags = (html: string) => {
    ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'].forEach((badTag: string) => {
        html = html.replace(new RegExp(`<${badTag}.*?${badTag}(.*?)>`, 'gi'), '');
    });

    return html;
};

const input = '<h1><div class="Mso-xyz" style="font-size:2em">Hello World</div></h1>';
console.log('input', input);
console.log('cleanHtml origin', cleanHtml(input));    // '' (input has no fragment markers)
console.log('cleanHtml modified', cleanHtml2(input)); // '<h1><div>Hello World</div></h1>'
```
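One way to address both points (keep the fragment extraction for Word pastes, but don't return an empty string for other HTML) is to fall back to the whole input when the markers are missing. The sketch below is hypothetical: cleanHtmlWithFallback is not part of cl-editor, and it simply reuses the replacements from cleanHtml2 and the removeBadTags pattern from the test case above.

```typescript
const BAD_TAGS = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];

// Same pattern as removeBadTags: drop from `<tag` through the next `tag...>`,
// e.g. a whole <style>...</style> block.
const removeBadTags = (html: string): string =>
    BAD_TAGS.reduce(
        (acc, tag) => acc.replace(new RegExp(`<${tag}.*?${tag}(.*?)>`, 'gi'), ''),
        html,
    );

// Hypothetical merge of cleanHtml and cleanHtml2.
const cleanHtmlWithFallback = (input: string): string => {
    const flat = input.replace(/\r?\n|\r/g, ' ');
    const fragment = flat.match(/<!--StartFragment-->(.*?)<!--EndFragment-->/);
    // No markers (HTML not coming from the clipboard format): use the whole input.
    const source = fragment ? fragment[1] : flat;
    const output = source
        .replace(/<!--(.*?)-->/g, '')
        .replace(/<(\/)*(meta|link|span|\?xml:|st1:|o:|font|w:sdt)(.*?)>/gi, '')
        .replace(/<!\[if !supportLists\]>(.*?)<!\[endif\]>/gi, '')
        .replace(/style="[^"]*"/gi, '')
        .replace(/style='[^']*'/gi, '')
        .replace(/&nbsp;/gi, ' ')
        .replace(/>(\s+)</g, '><')
        .replace(/class="[^"]*"/gi, '') // also covers Mso* classes
        .replace(/class='[^']*'/gi, '')
        .replace(/<[^/].*?>/g, (tag) => tag.split(/[ >]/g)[0] + '>')
        .trim();
    return removeBadTags(output);
};

const wordPaste =
    '<html><body><!--StartFragment--><p class="MsoNormal" style="margin:0">Hi&nbsp;there</p><!--EndFragment--></body></html>';
console.log(cleanHtmlWithFallback(wordPaste)); // <p>Hi there</p>
console.log(cleanHtmlWithFallback('<h1><div class="Mso-xyz" style="font-size:2em">Hello World</div></h1>')); // <h1><div>Hello World</div></h1>
console.log(cleanHtmlWithFallback('<style>p{color:red}</style><p>x</p>')); // <p>x</p>
```

Word pastes still lose their outer document and styling, while HTML without markers is cleaned instead of being discarded.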
