autolink-java

Java library to extract links such as URLs and email addresses from plain text. Fast, small and smart about recognizing where links end.

Inspired by Rinku. Like Rinku, it does not use regular expressions; instead, the input text is parsed in one pass with limited backtracking.

This library requires Java 7. It works on Android (minimum API level 15). It has no external dependencies.

Maven coordinates (see here for other build systems):

<dependency>
    <groupId>org.nibor.autolink</groupId>
    <artifactId>autolink</artifactId>
    <version>0.6.0</version>
</dependency>

Usage

Extract links:

import org.nibor.autolink.*;
import java.util.EnumSet;

String input = "wow, so example: http://test.com";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL, LinkType.WWW, LinkType.EMAIL))
        .build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
LinkSpan link = links.iterator().next();
link.getType();        // LinkType.URL
link.getBeginIndex();  // 17
link.getEndIndex();    // 32
input.substring(link.getBeginIndex(), link.getEndIndex());  // "http://test.com"

Note that by default all supported types of links are extracted. If you're only interested in specific types, narrow it down using the linkTypes method.

There's also a static method to replace links found in the text. Here's an example of using that for wrapping URLs in an <a> tag. Note that it doesn't handle escaping at all:

import org.nibor.autolink.*;
import java.util.EnumSet;

String input = "wow http://test.com such linked";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL)) // limit to URLs
        .build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
String result = Autolink.renderLinks(input, links, (link, text, sb) -> {
    sb.append("<a href=\"");
    sb.append(text, link.getBeginIndex(), link.getEndIndex());
    sb.append("\">");
    sb.append(text, link.getBeginIndex(), link.getEndIndex());
    sb.append("</a>");
});
System.out.println(result);  // wow <a href="http://test.com">http://test.com</a> such linked

Features

URL extraction

Extracts URLs of the form scheme://example with any scheme. URIs such as example:test are not matched (may be added as an option in the future). If only certain schemes should be allowed, the result can be filtered.
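Filtering by scheme can be done on the extracted link text. The following is a minimal sketch, not part of the library: the class name SchemeFilter, the helper hasAllowedScheme and the http/https allow-list are assumptions made for illustration. It expects the link text obtained with input.substring(link.getBeginIndex(), link.getEndIndex()), as in the Usage section.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of autolink-java.
final class SchemeFilter {
    // Assumed allow-list; adjust to the schemes your application accepts.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("http", "https"));

    /** Returns true if the link text starts with an allowed scheme followed by "://". */
    static boolean hasAllowedScheme(String url) {
        int sep = url.indexOf("://");
        if (sep <= 0) {
            return false;
        }
        // Schemes are case-insensitive, so normalize before the lookup.
        String scheme = url.substring(0, sep).toLowerCase(Locale.ROOT);
        return ALLOWED.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("http://test.com"));
        System.out.println(hasAllowedScheme("ftp://test.com"));
    }
}
```

Links for which the helper returns false can simply be skipped when iterating over the extracted spans.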

Includes heuristics for not including trailing delimiters such as punctuation and unbalanced parentheses, see examples below.
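To give an idea of what such a heuristic does, here is a simplified sketch. It is not the library's actual algorithm (which works in a single pass over the input); the class TrailingDelimiters, the method trim and the chosen punctuation set are assumptions for illustration only.

```java
// Simplified illustration only; not how autolink-java is implemented.
final class TrailingDelimiters {
    // Assumed set of characters treated as trailing punctuation.
    private static final String PUNCTUATION = ".,:;!?\"'";

    /** Strips trailing punctuation and unbalanced closing parentheses from a candidate link. */
    static String trim(String candidate) {
        int end = candidate.length();
        while (end > 0) {
            char c = candidate.charAt(end - 1);
            if (PUNCTUATION.indexOf(c) >= 0) {
                end--;
            } else if (c == ')' && count(candidate, end, ')') > count(candidate, end, '(')) {
                // More ')' than '(' so far, so this one most likely belongs to the surrounding text.
                end--;
            } else {
                break;
            }
        }
        return candidate.substring(0, end);
    }

    /** Counts occurrences of ch in s within [0, end). */
    private static int count(String s, int end, char ch) {
        int n = 0;
        for (int i = 0; i < end; i++) {
            if (s.charAt(i) == ch) {
                n++;
            }
        }
        return n;
    }
}
```

With this sketch, a balanced pair of parentheses inside a URL is kept, while a closing parenthesis that has no opening counterpart in the link is dropped.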

Supports internationalized domain names (IDN). Note that they are not validated and as a result, invalid URLs may be matched.

Example input and linked result:

Use LinkType.URL for this, and see test cases here.

WWW link extraction

Extracts links like www.example.com. They must start with www. but don't need a scheme://. The same heuristics as for URLs are used to detect where the link ends.

Examples:

Not supported:

  • Uppercase www's, e.g. WWW.example.com and wWw.example.com
  • Too many or too few w's, e.g. wwww.example.com

The domain must have at least 3 parts, so www.com is not valid, but www.something.co.uk is.

Use LinkType.WWW for this, and see test cases here.
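The rules above can be summarized in a small predicate. This is a sketch under assumptions, not the library's code: the class WwwCheck and the method looksLikeWwwLink are hypothetical, and the check only looks at a bare host name without a path.

```java
// Hypothetical predicate restating the documented WWW rules; not part of autolink-java.
final class WwwCheck {
    static boolean looksLikeWwwLink(String host) {
        // Case-sensitive prefix check: "WWW." and "wWw." are rejected, and
        // "wwww.example.com" fails because its fourth character is not a dot.
        if (!host.startsWith("www.")) {
            return false;
        }
        // Limit -1 keeps trailing empty parts so they can be rejected below.
        String[] parts = host.split("\\.", -1);
        if (parts.length < 3) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```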

Email address extraction

Extracts email addresses such as foo@example.com. Matches international email addresses, but doesn't verify the domain name (may match too much).

Examples:

Not supported:

  • Quoted local parts, e.g. "this is sparta"@example.com
  • Address literals, e.g. foo@[127.0.0.1]

Note that the domain must have at least one dot (e.g. foo@com isn't matched), unless the emailDomainMustHaveDot option is disabled.
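The effect of the dot requirement can be pictured with the following sketch. It is an illustration only, not the library's implementation: the class EmailDomainCheck and the method domainAccepted are hypothetical, and the boolean parameter merely mirrors the emailDomainMustHaveDot option.

```java
// Hypothetical illustration of the dot requirement; not part of autolink-java.
final class EmailDomainCheck {
    static boolean domainAccepted(String address, boolean domainMustHaveDot) {
        int at = address.lastIndexOf('@');
        // Require a non-empty local part and a non-empty domain.
        if (at <= 0 || at == address.length() - 1) {
            return false;
        }
        String domain = address.substring(at + 1);
        return !domainMustHaveDot || domain.indexOf('.') >= 0;
    }
}
```

With the flag set, foo@com is rejected and foo@example.com is accepted; with the flag cleared, both pass this check.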

Use LinkType.EMAIL for this, and see test cases here.

Hashtag extraction

Extracts Twitter/Facebook/Instagram hashtags such as #photooftheday. Hashtags must start with #. By default they may not contain special characters other than _, and they cannot be all numeric: for example, #123 is not a valid hashtag.
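These rules can be restated as a small predicate. The sketch below is an assumption-laden illustration rather than the library's code: the class HashtagCheck and the method isValidHashtag are hypothetical, and treating every Unicode letter or digit as allowed is a simplification.

```java
import java.util.Set;

// Hypothetical predicate restating the documented hashtag rules; not part of autolink-java.
final class HashtagCheck {
    static boolean isValidHashtag(String candidate, Set<Character> allowedSpecialChars) {
        // Must start with '#' and have at least one character after it.
        if (candidate.length() < 2 || candidate.charAt(0) != '#') {
            return false;
        }
        boolean allNumeric = true;
        for (int i = 1; i < candidate.length(); i++) {
            char c = candidate.charAt(i);
            if (Character.isDigit(c)) {
                continue;
            }
            allNumeric = false;
            if (!Character.isLetter(c) && !allowedSpecialChars.contains(c)) {
                return false;
            }
        }
        // Purely numeric tags such as "#123" are rejected.
        return !allNumeric;
    }
}
```

Passing a set that contains only '_' models the default behaviour described above; adding '-' to the set makes #photo-of-the-day pass the check.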

Examples:

You can configure which special characters are allowed in hashtags. For example, to extract hashtags containing both _ and -, configure the builder as shown:

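import org.nibor.autolink.*;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

String input = "#photo-of-the-day";
// Allow both '_' and '-' inside hashtags
Set<Character> allowedSpecialChars = new HashSet<>(Arrays.asList('_', '-'));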
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL, LinkType.HASHTAG))
        .allowedHashtagSpecialChars(allowedSpecialChars)
        .build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
String result = Autolink.renderLinks(input, links, new LinkRenderer() {
    @Override
    public void render(LinkSpan link, CharSequence text, StringBuilder sb) {
        System.out.println("Link: " + link);
        sb.append("<a href=\"https://api.twitter.com/1.1/search/tweets.json?q=");
        sb.append(text, link.getBeginIndex(), link.getEndIndex());
        sb.append("\">");
        sb.append(text, link.getBeginIndex(), link.getEndIndex());
        sb.append("</a>");
    }
});
System.out.println(result); // <a href="https://api.twitter.com/1.1/search/tweets.json?q=#photo-of-the-day">#photo-of-the-day</a>

Use LinkType.HASHTAG for this, and see test cases here.

Contributing

Pull requests, issues and comments welcome ☺. For pull requests:

  • Add tests for new features and bug fixes
  • Follow the existing style (always use braces, 4 space indent)
  • Separate unrelated changes into multiple pull requests

License

Copyright (c) 2015-2016 Robin Stocker and others, see Git history

MIT licensed, see LICENSE file.
