Capture Location Data & Geographic Coordinates from a Wikipedia Table in a Spreadsheet
Many modern teachers tell their students not to rely on Wikipedia as a source of information. But it’s difficult to ignore one of the Internet’s top sites, peer-reviewed all day long by the connected world. The free online encyclopedia is often the only place some information is aggregated. With crowd-sourced data about anything and everything, Wikipedia’s attraction is obvious. And that’s not to mention its lists and data tables that make it easy to copy and paste information into spreadsheets like Excel for further data analysis.
That said, there are still limitations to data gathering on Wikipedia, especially if locations are involved. While many Wikipedia tables contain the exact latitude and longitude coordinates of each location, other times, that information is missing. There are plenty of reasons to need that specific information. One example is if you’re dealing with locations that formerly existed. Exact coordinates of where these places once stood would be necessary as they may no longer exist on a map.
While you can certainly click on most Wikipedia pages to see a location’s coordinates in the upper right-hand corner, you wouldn’t want to have to do that for each item in a 135-location table. There’s an easier way to obtain coordinates from Wikipedia locations using the MediaWiki API result—and we’ll show you how in this post.
Copy All Wikipedia Table Data to a Spreadsheet
Step one of capturing location data from a Wikipedia table in a spreadsheet is getting the data from a Wikipedia table into a spreadsheet. While there are multiple ways of doing so, including just regular copy-paste, to get gather what we need to use the MediaWiki API, we’ll need to use a tool like Table Capture.
- Add the Table Capture Google Chrome extension
- IMPORTANT: Click the extension in the upper right-hand corner of your browser and opt for options
- On the page that pops up, ensure Extract link URLs from table cells is checked
- Now, navigate the Wikipedia page with a table you wish to pull data from
- Click the extension
- Select your desired table
- Click the icon that represents the action you want to take (i.e. Copy table data to the clipboard, Export table to Google Sheets, etc.)
- Paste to your spreadsheet if necessary
Now that you have the Wikipedia table data in your spreadsheet, we’ll need to isolate captured links from table in your spreadsheet so that we can use the MediaWiki API.
Isolate Captured Links & Queries in Your Spreadsheet
The data we captured from the Wikipedia table in the previous section contains Wikipedia page links, which we’ll need in order to obtain exact geographic coordinates. However, these URLs are buried within the data, not useful until we can isolate them into one column.
The method we’re about to show you is quicker, especially if you have hundreds of locations, as with our Former Major League Baseball stadiums. So let’s continue.
- In your spreadsheet, identify the column that contains both the names of your locations and their corresponding Wikipedia page URLs (if the previous step was done correctly). In our case, that’s the “Stadium” column
- Next, you’ll need to separate location names from the URLs using Excel’s “Text to Columns” tool (Google Sheets has a similar feature called “Split text to columns”):
- Navigate to the “Data” tab in Excel and select “Text to Columns…”
- Opt for Delimited characters seeing as the URLs should be contained within parentheses
- Click Next and check “Other”
- Type in an open parenthesis ((), then click Finish
- Note: If any rows end up split into three columns instead of two (one for location names, one for Wikipedia page URLs), you’ll need to use
=CONCATENTATE
to combine the two parts of the links - First, make these easy to identify by using “Sort & Filter” > “Sort A to Z”
- Then, in the second cell of a fourth column, type
=CONCATENTATE
, click the first cell with part of a URL, “(”, followed by a comma and the second URL cell - Drag that formula down to all of the similar cells in the column
- Copy and “Paste Values” into that same column, then select the previous two columns, delete and shift them left
- Note: If any rows end up split into three columns instead of two (one for location names, one for Wikipedia page URLs), you’ll need to use
- You’ll finish up by following the same “Text to Columns” steps above and splitting the entire column by ending parentheses
- Then, use “Conditional Formatting” in the Home tab to highlight any link cells in which a starting parenthesis remains as part of the link
- You can either use ‘=CONCATENATE’ or manually add back the closing “)” to all of the highlighted cells
- Just one more step: we only need the query string or parameter part of the link, so you’ll once again use “Text to Columns,” this time noting “/” in the “Other” space
- Delete the other split link columns and you’re done!
Now you have the exact links to the exact Wikipedia pages, and can use the MediaWiki API result to more easily grab their exact coordinates.
Use MediaWiki API Result to Get Coordinates
With our data gathered and locations and link queries in their respective columns, we can finally use the MediaWiki API result in order to obtain latitude and longitude coordinates with minimal clicks. Non-developers needn’t worry—this is the easy part for any experience level.
- In your spreadsheet, copy (Ctrl+C or Cmd+C) your first Wikipedia page query
- Navigate to MediaWiki API result and paste your query at the end of the URL bar, replacing our Sahlen_Field example
- Press enter, then copy and paste the resulting “lat” and “long” into your spreadsheet!
They’re even in the perfect format for making a custom Google Map from your data…
Map Your Coordinates & Data
As an optional, final step, you might plot your newly gathered data points on a Google Map. Here’s how:
View MLB Stadiums in a full screen map
- Open your spreadsheet
- Select (Ctrl+A or Cmd+A) and copy (Ctrl+C or Cmd+C) your data
- Open your web browser and head to batchgeo.com
- Click on the location data box with the example data in it, then paste (Ctrl+V or Cmd+V) your own data
- Check to make sure you have the proper location data columns available by clicking “Validate and Set Options”
- Select the proper location column from each drop-down
- Click “Make Map” and watch as the geocoder performs its process
Get started for free at batchgeo.com.