The physical world is a complex place. It’s no wonder that there are so many ways to describe it using geographic data. But that data itself can be quite difficult to understand, with each format a little (or a lot) different from the others. We’ve compiled the most common data file types for GIS and other mapping data. We’ll explain how you would use each type, how you might convert them, and how they help you build a map.
Shapefile is a popular format for geographic data originally defined by Esri, the company behind ArcGIS and other GIS software. The singular “shapefile” is a misnomer, as there are actually multiple files needed to describe the geographic data in a shapefile. For every shapefile, at least three files are required:
- .shp is what gives it the name shapefile and where the geometric features are described, such as lines, points, and polygons.
- .shx is an index of the .shp file data, compiled into a binary format which makes it easier for applications like ArcGIS to access the data.
- .dbf is a database format that contains additional data that is not necessarily geographic, such as names/labels for your shapes.
While not required, another common file is .prj, which contains the projection details. This file describes how to convert three-dimensional data (i.e., shapes on the earth) to a two-dimensional representation.
And there’s more. Shapefiles can contain any of these files: .ain, .aih, .atx, .cpg, .ixs, .mxs, .qix, .shp.xml
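Because a shapefile is really a set of sibling files, a quick check for the required pieces can save confusion when a map fails to load. The following is a minimal Python sketch, assuming all parts share one base name and use lowercase extensions; the function name missing_shapefile_parts is made up for illustration.

```python
from pathlib import Path

# The three extensions the text lists as required for every shapefile.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions with no matching file next to shp_path.

    Assumption: companion files share the base name and use lowercase
    extensions, e.g. roads.shp, roads.shx, roads.dbf.
    """
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

An empty list means all three required files are present. Optional files such as .prj could be checked the same way.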
To read shapefiles, you need software that understands the format. Several Esri products in the ArcGIS suite of desktop software support shapefiles, including the free ArcGIS Explorer. There is also an open source geographic data editor, QGIS, which can read and write shapefiles.
Before there was Google Maps, there was Google Earth. And before Google Earth, there was Earth Viewer, made by Keyhole, Inc., later acquired by Google. Earth Viewer became Google Earth, and the Keyhole Markup Language became a popular format for describing geographic data.
KML, used in .kml files, is an XML-based data format. It’s moderately human readable, since XML is written in plain text. For example, here’s a single point on a map, along with some metadata associated with the location:
<Placemark>
  <name>Paris</name>
  <Point>
    <coordinates>2.3522219,48.856614,0</coordinates>
  </Point>
  <ExtendedData>
    <Data name='Flag'>
      <value>http://upload.wikimedia.org/wikipedia/en/thumb/c/c3/Flag_of_France.svg/23px-Flag_of_France.svg.png</value>
    </Data>
    <Data name='Continent'>
      <value>Europe</value>
    </Data>
  </ExtendedData>
</Placemark>
You can create KML files with BatchGeo that have many locations and can be loaded into Google Earth or other geographic software. In addition to points, KML can also include lines and shapes.
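Since KML is plain XML, a general-purpose XML parser can read it. Here is a minimal Python sketch that pulls the name, coordinates, and extra data out of a single Placemark like the Paris example. It assumes the element has no XML namespace, as in that snippet; a complete .kml file normally declares one, so the lookups would need to be namespace-qualified. The function name parse_placemark is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the Paris placemark, kept without a namespace.
KML_SNIPPET = """
<Placemark>
  <name>Paris</name>
  <Point>
    <coordinates>2.3522219,48.856614,0</coordinates>
  </Point>
  <ExtendedData>
    <Data name='Continent'>
      <value>Europe</value>
    </Data>
  </ExtendedData>
</Placemark>
"""


def parse_placemark(xml_text):
    """Return the name, coordinates, and extended data of one Placemark."""
    placemark = ET.fromstring(xml_text.strip())
    # KML lists coordinates as longitude,latitude with an optional altitude.
    parts = placemark.findtext("Point/coordinates").strip().split(",")
    lon, lat = float(parts[0]), float(parts[1])
    extra = {d.get("name"): d.findtext("value")
             for d in placemark.iter("Data")}
    return {"name": placemark.findtext("name"),
            "lat": lat, "lon": lon, "extra": extra}
```

Note the coordinate order: longitude comes first, which is the reverse of how many people write latitude and longitude.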
KMZ files, with the .kmz extension, are compressed versions of KML files. A KMZ file is a ZIP archive, a binary format meant to make sharing KML more efficient, since the file size is significantly smaller.
To convert KMZ to KML:
- Rename filename.kmz to filename.zip
- Decompress the file like you would other ZIP files (usually you can just double-click).
After decompressing, you should have a KML file (it may be inside a directory/folder).
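The same conversion can be scripted. This is a minimal Python sketch using the standard zipfile module, which reads the archive regardless of its extension, so the rename step is not needed. The function name extract_kml is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_kml(kmz_path, dest_dir):
    """Extract every .kml member of a KMZ archive into dest_dir.

    Returns the paths of the extracted files. Other members, such as
    bundled icons or images, are left in the archive.
    """
    with zipfile.ZipFile(kmz_path) as archive:
        kml_names = [name for name in archive.namelist()
                     if name.lower().endswith(".kml")]
        return [Path(archive.extract(name, dest_dir)) for name in kml_names]
```

For example, extract_kml("filename.kmz", "output") would place the KML file or files under an output folder, keeping any subdirectory they had inside the archive.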
A common way to generate geographic data is with a GPS device, such as a Garmin, a Magellan, or even a smartphone. While the output formats vary greatly, they typically describe points or lines. Individual locations, sometimes called waypoints, may need to be saved periodically. Most commonly, the entire route will be stored as a series of points.
GPX, used in files with a .gpx extension, is the most common of the many GPS data output formats. Like KML, it is based on XML, so it is text-based markup. GPX can contain both waypoints and tracks/routes. For tracks, there will be multiple points that describe the route. That portion of the GPX file will look something like this:
<trk>
  <name>Golden Gate Bridge Welcome Center to Vista Point</name>
  <number>1</number>
  <trkseg>
    <trkpt lat="37.8077333" lon="-122.4750286">
      <name>TP001</name>
    </trkpt>
    <trkpt lat="37.80771" lon="-122.47502">
      <name>TP002</name>
    </trkpt>
    <trkpt lat="37.8077" lon="-122.47502">
      <name>TP003</name>
    </trkpt>
    ...
  </trkseg>
</trk>
GPX is often used to track and share a certain route, such as a hike or run. It’s also the format accepted by OpenStreetMap, a community that has used GPS traces to create a map of the world.
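To work with a track programmatically, you can read the trkpt elements with an XML parser. Here is a minimal Python sketch using the three track points from the example. It assumes the fragment has no XML namespace; a complete .gpx file normally declares one on its root element, so the lookups would need to be namespace-qualified. The function name track_points is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The three track points from the example, without the elided remainder.
GPX_TRACK = """
<trk>
  <name>Golden Gate Bridge Welcome Center to Vista Point</name>
  <trkseg>
    <trkpt lat="37.8077333" lon="-122.4750286"><name>TP001</name></trkpt>
    <trkpt lat="37.80771" lon="-122.47502"><name>TP002</name></trkpt>
    <trkpt lat="37.8077" lon="-122.47502"><name>TP003</name></trkpt>
  </trkseg>
</trk>
"""


def track_points(xml_text):
    """Return (name, latitude, longitude) for each track point, in order."""
    trk = ET.fromstring(xml_text.strip())
    return [(pt.findtext("name"), float(pt.get("lat")), float(pt.get("lon")))
            for pt in trk.iter("trkpt")]
```

Unlike KML, GPX stores latitude and longitude as separate attributes, so there is no coordinate order to get wrong.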
GPI, with a .gpi extension, is a Garmin file format. It’s one of many formats that OpenStreetMap recommends converting to GPX. You can see all the GPS trace formats on the OpenStreetMap site.
Geographic data may not be the first thing you think of when it comes to spreadsheets and other tabular data formats. That said, a spreadsheet is one of the easiest files to share and remains very human-readable. For that reason, you’ll often find location data stored in Excel, Numbers, Google Sheets, or other spreadsheet formats. Alternatively, spreadsheet-equivalent formats like CSV or tab-delimited files are a way to store data without requiring a specific spreadsheet application. These can typically be imported as a spreadsheet.
Unlike other formats that are specific to geographic data, there is no schema for how locations are stored in spreadsheets. One common convention, however, is to use a separate column for each level of location data. For example:
- State or Province
These can be turned into geographic data by geocoding the address or location data. Some geocoders require the individual location pieces, while others want a single field.
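When a geocoder wants a single field, the separate columns can be joined into one line first. This is a minimal Python sketch; the column names Address, City, State, and Zip are hypothetical stand-ins for whatever headers your spreadsheet uses, and the function name single_line_address is made up for illustration.

```python
def single_line_address(row, columns=("Address", "City", "State", "Zip")):
    """Join the non-empty location columns of one row into a single string.

    The default column names are hypothetical; pass your own headers.
    """
    parts = [(row.get(column) or "").strip() for column in columns]
    return ", ".join(part for part in parts if part)
```

Each row here is a dictionary keyed by column header, which is what Python's csv.DictReader produces for a CSV file. Empty or missing columns are skipped rather than leaving stray commas.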
Another method is to store latitude and longitude coordinates directly, usually in two columns.
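Coordinates stored this way can be read without any geocoding. Here is a minimal Python sketch that reads them from CSV data and skips rows that are not valid coordinates. The column names Latitude and Longitude are assumptions, since no schema is standard, and the function name read_points is made up for illustration.

```python
import csv


def read_points(csv_file, lat_col="Latitude", lon_col="Longitude"):
    """Return (latitude, longitude) pairs from an open CSV file.

    Rows with missing, non-numeric, or out-of-range values are skipped.
    The default column names are assumptions; pass your own headers.
    """
    points = []
    for row in csv.DictReader(csv_file):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lat, lon))
    return points
```

To use it on a file, open the file with newline="" as the csv module recommends, for example: with open("locations.csv", newline="") as f: points = read_points(f).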