
Shapefile

A simple Java library to read point and polygon shapefiles.

In addition, this library contains classes that extend Hadoop's FileInputFormat and RecordReader so that shapefiles stored in HDFS can be read. Since a shapefile is small relative to a typical 128MB Hadoop block, the isSplitable implementation always returns false, meaning a single mapper reads the entire content of a shapefile in HDFS during a MapReduce job. The shapefile specification describes an Index Sequential Access Model that could be used to create splits for "large" shapefiles; this is on the TODO list :-) When defining the input path for a job, a user can point to a folder or a set of folders containing the .shp and/or .dbf files.

Input Format              | Writable               | Description
------------------------- | ---------------------- | -----------------------------------------------------------------
DBFInputFormat            | MapWritable            | Read a DBF file in HDFS
PointInputFormat          | PointWritable          | Read an SHP file in HDFS containing simple 2D point geometry
PolygonInputFormat        | PolygonWritable        | Read an SHP file in HDFS containing simple 2D polygon geometry
PointFeatureInputFormat   | PointFeatureWritable   | Read SHP/DBF files in HDFS containing simple 2D point features
PolygonFeatureInputFormat | PolygonFeatureWritable | Read SHP/DBF files in HDFS containing simple 2D polygon features
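
The non-splitting behavior described above boils down to a single override in the old mapred API. The following is an illustrative sketch only; the class name, key/value types, and the record reader body are placeholders, not the library's actual source:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Sketch of the pattern: a FileInputFormat subclass whose isSplitable always
// returns false, so a single mapper reads the whole shapefile.
public class NonSplittableInputFormat extends FileInputFormat<LongWritable, Text>
{
    @Override
    protected boolean isSplitable(final FileSystem fs, final Path filename)
    {
        return false; // one mapper per shapefile
    }

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            final InputSplit split, final JobConf job, final Reporter reporter) throws IOException
    {
        // The real input formats return a shapefile-aware RecordReader here (see the table above).
        throw new UnsupportedOperationException("placeholder");
    }
}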

This library depends on the Esri Geometry API for Java, which is part of Esri's GIS Tools for Hadoop.

Clone or pull the latest version of the geometry API and build/install it before building this project.

Sample MapReduce Job

public class ShapefileTool extends Configured implements Tool
{
    public static void main(final String[] args) throws Exception
    {
        System.exit(ToolRunner.run(new Configuration(), new ShapefileTool(), args));
    }

    @Override
    public int run(final String[] args) throws Exception
    {
        final int rc;
        final JobConf jobConf = new JobConf(getConf(), ShapefileTool.class);

        if (args.length != 2)
        {
            ToolRunner.printGenericCommandUsage(System.err);
            rc = -1;
        }
        else
        {
            jobConf.setJobName(ShapefileTool.class.getSimpleName());

            jobConf.setMapperClass(PolygonFeatureMap.class);

            jobConf.setMapOutputKeyClass(NullWritable.class);
            jobConf.setMapOutputValueClass(Writable.class);

            jobConf.setNumReduceTasks(0); // map-only job; mapper output goes straight to the output format

            jobConf.setInputFormat(PolygonFeatureInputFormat.class);
            jobConf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
            final Path outputDir = new Path(args[1]);
            // remove any previous output so the job can be rerun
            outputDir.getFileSystem(jobConf).delete(outputDir, true);
            FileOutputFormat.setOutputPath(jobConf, outputDir);

            JobClient.runJob(jobConf);
            rc = 0;
        }
        return rc;
    }
}
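
Since the input path can reference more than one folder of .shp/.dbf files, the driver's input configuration could also be written as follows (the HDFS folder names here are placeholders):

            // Hypothetical paths -- point the job at several HDFS folders, each holding .shp/.dbf pairs.
            FileInputFormat.setInputPaths(jobConf,
                    new Path("/data/countries"),
                    new Path("/data/states"));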

The map function emits the centroid of each polygon's envelope and the value of its CNTRY_NAME attribute as a single Text instance:

final class PolygonFeatureMap
        extends MapReduceBase
        implements Mapper<LongWritable, PolygonFeatureWritable, NullWritable, Writable>
{
    private final static Text NAME = new Text("CNTRY_NAME");

    private final Text m_text = new Text();
    private final Envelope m_envelope = new Envelope();

    public void map(
            final LongWritable key,
            final PolygonFeatureWritable val,
            final OutputCollector<NullWritable, Writable> collector,
            final Reporter reporter) throws IOException
    {
        val.polygon.queryEnvelope(m_envelope);
        final Point center = m_envelope.getCenter();
        m_text.set(String.format("%.6f %.6f %s",
                center.getX(), center.getY(), val.attributes.get(NAME).toString()));
        collector.collect(NullWritable.get(), m_text);
    }
}

Sample Shp Usage

final File file = new File("cntry06.shp");
final FileInputStream fileInputStream = new FileInputStream(file);
try
{
    final Envelope envelope = new Envelope();
    final Polygon polygon = new Polygon();
    final ShpReader shpReader = new ShpReader(new DataInputStream(new BufferedInputStream(fileInputStream)));
    while (shpReader.hasMore())
    {
        shpReader.queryPolygon(polygon);
        polygon.queryEnvelope(envelope);
        final Point center = envelope.getCenter();
        System.out.format("%.6f %.6f%n", center.getX(), center.getY());
    }
}
finally
{
    fileInputStream.close();
}
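
The polygons read this way are regular Esri Geometry API objects, so geometry operations can be applied to them directly. For example, a sketch of the same loop reporting each polygon's planar area via calculateArea2D (units are those of the shapefile's coordinate system):

    // Variation on the loop above: report each polygon's planar area.
    while (shpReader.hasMore())
    {
        shpReader.queryPolygon(polygon);
        System.out.format("area=%.6f%n", polygon.calculateArea2D());
    }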

Sample DBF Usage

final File file = new File("cntry06.dbf");
final FileInputStream fileInputStream = new FileInputStream(file);
try
{
    final Map<String, Object> map = new HashMap<String, Object>();
    final DBFReader dbfReader = new DBFReader(new DataInputStream(new BufferedInputStream(fileInputStream)));
    while (dbfReader.readRecordAsMap(map) != null)
    {
        System.out.println(map);
    }
}
finally
{
    fileInputStream.close();
}
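
Individual attributes can be pulled out of the returned map by field name. A sketch assuming the DBF contains the CNTRY_NAME column used in the MapReduce sample above:

    // Hedged sketch: print one attribute per record. CNTRY_NAME is assumed to exist;
    // readRecordAsMap refills the same map instance on each pass.
    while (dbfReader.readRecordAsMap(map) != null)
    {
        System.out.println(map.get("CNTRY_NAME"));
    }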

Sample DBF Field Usage

final File file = new File("cntry06.dbf");
final FileInputStream fileInputStream = new FileInputStream(file);
try
{
    final DBFReader dbfReader = new DBFReader(new DataInputStream(new BufferedInputStream(fileInputStream)));

    System.out.println(dbfReader.getFields());

    final int numberOfFields = dbfReader.getNumberOfFields();
    byte dataType = dbfReader.nextDataType();
    while (dataType != DBFType.END)
    {
        System.out.println("-----");
        for (int i = 0; i < numberOfFields; i++)
        {
            System.out.println(dbfReader.readFieldWritable(i));
        }
        dataType = dbfReader.nextDataType();
    }
}
finally
{
    fileInputStream.close();
}
