Us and Them

The main goal of our batch geocoding service is not just to provide map coordinates, but also to give a quick platform to those who want to plot map data on the web instantly, without any coding. Several other sites try to do the same: Wayfaring, MapBuilder, Platial, and so on.

I don’t work in a vacuum; I keep an eye on the competition. I really like the interfaces out there. To a certain extent, they all provide unique features that my service does not. My goal with the batch geocoder was to focus on making a tool that is fast, easy, and lets people work with data they already have.

Sitting down and creating a map by hand using one of these other tools takes quite some time. You must add locations one by one on a map. My service dumps the user name and password concept and lets people create and save maps instantly with no barriers.

Can you edit points one by one and move them around by clicking on a map? No, you can’t. But I question how many people will create useful maps by creating points one by one. There are only so many jogging maps, bike ride maps, or maps to grandma’s house before you begin to question the value. Remember, it’s all about the data… and for data to be useful you need lots of it. That’s why standardizing on formats like tab-delimited text is a good idea.

The user can keep their data in a file or in a database on their own system, instead of on a server somewhere. They can easily edit their data, add or remove columns, etc. No fussing about with some proprietary interface for editing it online.
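To show how little tooling a tab-delimited file needs, here is a minimal sketch in Python. The column names and sample rows are made up for illustration; they are not the batch geocoder's required layout.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two address rows.
SAMPLE = (
    "Name\tAddress\tCity\tState\n"
    "Coffee Shop\t100 Main St\tPortland\tOR\n"
    "Book Store\t200 Oak Ave\tPortland\tOR\n"
)


def read_tab_delimited(text):
    """Return one dict per data row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = read_tab_delimited(SAMPLE)
for row in rows:
    # Build a single-line address string from the separate columns.
    print(f"{row['Name']}: {row['Address']}, {row['City']}, {row['State']}")
```

Adding or removing a column only changes the header row; the reading code stays the same.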

Anyway, that’s my take on it. If you have feedback please let me know. I am not opposed to the idea of building in some of the tools that the “other guys” offer, but I could spend time in other areas too.

Yahoo set to launch international maps?

Yahoo needs international mapping to be a serious competitor to Google Maps in the long run, and recently there have been a few things that hint that international support may be coming.

First, there is an interesting post by Yahoo employee Alan Brown, who points out that last year Yahoo acquired a company that provides geocoding on an international level.

Second, more recently, someone noticed that the Yahoo geocoding API is sporadically returning a number for the precision. The number appears to be the most appropriate zoom level for the geocoded point. This could be looked at as a way to make the interface more generic for international support.

Whatever all this means, my guess is that Yahoo will go international soon just to stay competitive. The question is whether they will bring along all the nifty geocoding features that make a service like ours possible. To stand above the rest, I’d say it’s probably a good idea.

Amazon S3 for map tile storage and delivery?

Amazon recently launched their new S3 Storage service, and everyone seems to be clamoring to figure out uses for it. Well here is my contribution: map tile storage and serving.

Think about it: You want to create your own tile-based map delivery (because your boss has been nagging you about it ever since the Google Maps launch), but where are you going to store those gigabytes and gigabytes of images? Not to mention, how are you going to deliver them? Stick them all in a database? Write a wrapper script to do that? Sounds like an awful lot of bandwidth, cycles, and storage is going to be needed. Imagine that every time there is a breakdown, the pager goes off and you’ve got to fix it.
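To put a rough number on "gigabytes and gigabytes", here is a back-of-the-envelope sketch in Python. It assumes a quadtree tile pyramid (one tile at zoom 0, four times as many at each deeper level), a made-up average tile size of 15 KB, and a hypothetical key layout of tiles/zoom/x/y.png. None of these come from Amazon or from any particular map server.

```python
def tiles_at_zoom(zoom):
    """Number of tiles at one zoom level of a quadtree pyramid (assumed scheme)."""
    return 4 ** zoom


def total_tiles(max_zoom):
    """Total tiles from zoom 0 through max_zoom inclusive."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


def tile_key(zoom, x, y):
    """Hypothetical storage key for one tile; any consistent layout would do."""
    return f"tiles/{zoom}/{x}/{y}.png"


def estimated_gigabytes(max_zoom, avg_tile_kb=15):
    """Rough storage estimate; avg_tile_kb is an assumption, not a measurement."""
    return total_tiles(max_zoom) * avg_tile_kb / (1024 * 1024)


for max_zoom in (8, 10, 12):
    print(max_zoom, total_tiles(max_zoom), round(estimated_gigabytes(max_zoom), 1))
```

Each extra zoom level roughly quadruples the total, which is why storage becomes the problem long before the map looks detailed.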

Or, get an S3 account and blaze away. I’d bet dollars to donuts that S3 is a heck of a lot cheaper than your average “enterprise” network storage solution. In fact I’ll just tell you: it’s cheaper. The drawback: you won’t have LAN-speed access to it, but if your target is the internet, who cares? It will likely take just as long to generate all those tiles as to upload them anyway.

I’d also bet that Amazon’s delivery will be much faster, with lower latency, than what an average-sized shop could do on a T1 with a weenie little 4-processor database server. Who knows what sort of super-optimized proprietary network, hardware, and software architecture Amazon has put together to make their system work. More than likely it’s much better than what’s available off the shelf.

Did I mention scalability? Scalability in this case is just sending more dollars off to Amazon. Assuming your business model has you making more for each visit than you need to spend, you will just keep making more and more profit, no matter how many users show up. Got Slashdotted? No biggie: the server capacity briefly expands to take on the Niagara Falls-sized volume, and then returns to normal when it has passed. Point is, you captured every bit of the revenue generated from that extra traffic.

So the next question is: after storage, what’s next? Application delivery?

Maybe instead of thinking in terms of racks of servers, we should be thinking of tracking cycles and storage down to the smallest possible measurements and paying for only what we need, with endless ability to scale. Developers can add their applications to the Internet Borg cube, and after some marketing, expect to see a linear increase in profits along with traffic. No more hassling over rack space, load balancers, hard drive failures, backups, software licensing, and so on.

Here it comes, the infinitely scalable internet application model. Sustainable growth, just add water.

Commercial or Public, It’s still all about the data

My day job at the City of Portland lets me work with really cool data. Take a look at PortlandMaps for example. It has several dozen different datasets all rolled into one easy-to-navigate interface. The mapping GUI and speed are not up to today’s standards of AJAX-based map viewers (yet), but the underlying data is much more complete and powerful than what is available anywhere at the national level.

For example, we have access to four counties’ worth of parcel, or tax lot, data. This information is key in seeing where property lines are on a map without squinting through the trees on aerial photos (but we do have those too). We also have building footprints for the entire city of Portland. Overlay the two on top of a 6-inch/pixel aerial photograph and pair it with weekly updated assessor data, and you have a very powerful property viewing tool.

We also have great data for zoning, utilities, crime incidents, hazard levels, building permits, city parks, etc. We have first-hand access to all this data because our group at the city is responsible for gathering it from the various regional entities (mostly government-based, at the city and county level). In exchange for the entities giving us their data, we give them back all of the other data we have collected. The three most popular: tax lots, aerial photos, and street center lines.

A few years ago we decided it might also be nice to open up access to the general public, hence PortlandMaps.com. After its launch, it soon became apparent that PortlandMaps was not only an excellent tool for citizen access, but for all of our data partners as well. It has become an invaluable resource for both.

Where am I going with all of this? Well the main point is: it’s all about the data. PortlandMaps would not exist if it was not for the work of hundreds of individuals at the city, county, and state level creating datasets and giving them back to the public for free.

Now we have a parallel with companies like MapQuest, Yahoo, and Google all offering transportation/routing information at the national (and sometimes international) level. These are great services, but they only provide directions and routing.

Why not provide all of the data of PortlandMaps in a nationwide interface? Again: It’s all about the data. Even if they could collect data from all the various counties, cities, and states in the U.S., compiling it all into one database would be a sizeable task. Companies like Zillow are attempting this; they have parcel data in many areas as well as detailed assessor records. No doubt a huge effort went into gathering and normalizing data from all of these various entities.

I know how hard that can be, because I see what we must go through in Portland to do it on a local level. Data formats are different and can change at the will of the data provider. There are no standards, so creating one for all data to file into is a task, to say the least. Add to that the data providers’ tendency to set rules on how data can be used, how much it might cost to obtain, and how difficult it is to get (hint: they don’t just leave it out on an FTP server somewhere).

The local data providers know how valuable their data is, and even though they might be required by law to make it publicly available (in the case of government agencies), they will make it as difficult as possible. There is a parallel here with commercial data providers like Navteq and TeleAtlas. Getting the data from these companies is not usually difficult; in fact, if you own a car with a navigation system you probably already have a copy of it. But they impose strict licensing rules that limit what you can use it for, and may even charge extra. This is why it is estimated that Yahoo, MapQuest, and Google all pay a fee back to their data providers every time they calculate a route. No doubt it is a small fee, probably a fraction of a penny, but a fee all the same.

Now the service providers are interested in giving away free APIs, to further expose their branding and potential advertising to would-be affiliate web sites. No doubt checking their every move are the data providers, who desperately need to protect the value of their hard-won data. That data needs to be maintained constantly to keep up with the ever-changing infrastructure of our country. Just like the local data providers that help PortlandMaps become a service, everyone wants to protect what they work so hard to create and maintain.

It’s all about the data.

Google Maps vs. Yahoo Maps vs. MapQuest – API’s

Since Google Maps launched their API allowing developers to use their mapping service to draw their own data, Yahoo has tried to play catch-up with their own API. Now, with MapQuest’s announcement of their new API, it’s a three-way race. Which one to choose?

Google Maps API

Pros:

  • Fluid interface, brilliant looking map marker flyouts
  • International
  • Built in Aerial Photos
  • Largest developer base, as a result…
  • Lots of hacks and how-to’s available

Cons:

  • No built-in geocoding service
  • No built-in routing capability

Yahoo Maps API

Pros:

  • Built-in and external geocoding capability
  • Very flexible and open APIs
  • Rate limiting by IP instead of appID
  • Built-in GeoRSS support
  • Flash version available

Cons:

  • U.S. and Canada only
  • Flyouts not quite as spiffy as Google
  • No aerial photo option

MapQuest API

Pros:

  • Built-in routing (driving directions) capability
  • Built-in geocoding capability

Cons:

  • No smooth AJAX client (yet)
  • Rate limiting by appID + web site URL (instead of end-user IP)
  • No photos option

Yahoo and MapQuest seem to be eager to please their developers, probably with good reason. They have a lot of catching up to do with Google. I give Yahoo a lot of credit for being first to release an AJAX map client with built-in geocoding functionality. That’s one clear area where they are ahead of Google.

Time will tell how sustainable each company’s model is and how much change will be necessary. Remember too that they aren’t always going to just give this away for free. Even if there is no charge in the future, there are bound to be ads.

Mashups getting mashed by data providers

Brian Flood has a piece on mashup fragility, and in particular our plight after Yahoo’s recent updates to their geocoder API. Then there is the article on the risk of mashups that kicked things off.

I think I have already said all that needs to be said on the issue, or have I?

Mashups are built on the shoulders of data and service providers, which will seemingly need to get something out of the relationship. In the case of map interfaces, that could mean ads. In other situations, where there is not the same opportunity, I think we will see the data providers getting more and more stingy about who they let do what.

Right now I think Yahoo is being the most liberal with their data APIs, so I can’t really fault them for rolling back functionality on a service that nobody else is daring to offer. Also, remember there is another player in this game besides Yahoo: their data providers. They want to protect the investment and value in the data products they have created.

No doubt it’s an ongoing tango between data provider, service provider, and application provider (in this case, the batch geocoder). I like being the application provider; it’s the most fun… and even though there is certainly less control at the end of the chain, there is also less expense, less risk, and most importantly less work!

Geocoder updates

As a result of Yahoo Maps rolling back some functionality in their geocoder, I have had to get rid of a couple of features on the batch geocoder.

First, you will no longer be able to look up associated 9-digit zip codes for your address list. I am not sure how many of you were interested in this functionality; if you were a fan of it, I’m sorry. Of course you can always use a free single-address zip+4 lookup service like this one.

Second, the exact precision of the geocoded addresses will not be reported any longer. To help deal with this I have added a new feature that lets you view the street-level location of any point on the map just by clicking on it. It will be used in place of an image if you haven’t populated the Image URL field, so you can click on points to verify that they fall on the correct street.

Please feel free to post your feedback on these new changes.

Yahoo disables JSON output on geocoding API

Word is that Yahoo will soon disable the JSON output format for their geocoding API; the REST-based geocoder will remain. What does this mean? Well, the JSON API is what makes tools like our batch geocoder possible. Without it I would need to use a server-side proxy, meaning requests going to Yahoo would come from our web server instead of the end user’s IP. The 50,000 per day limit would then apply to the server: only 50,000 geocodes total for batchgeocode.com.

Why can’t the user’s browser communicate directly with the XML-based REST geocoding API? Well, despite being built with nifty XML-enabling features like XMLHttpRequest, modern browsers are held back by security constraints that keep client-side scripts from communicating with multiple domains. JSON gets around this problem by using On-Demand JavaScript to dynamically load content through <script> tags, which don’t have the same cross-domain limitation. Why do the browsers limit your ability to make calls out using XMLHttpRequest but not by using the <script> tag? Who knows….
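To make the trick concrete, here is a small Python sketch of the wrapping involved. The callback name and the field names are hypothetical, not Yahoo's actual response format. The point is only that the response body is a JavaScript function call with the data as its argument, so a <script> tag can load it from another domain and run it.

```python
import json


def wrap_as_script(payload, callback):
    """Return JavaScript source that calls `callback` with the JSON payload."""
    return f"{callback}({json.dumps(payload)});"


def unwrap_script(body, callback):
    """Recover the payload from a body produced by wrap_as_script."""
    prefix = callback + "("
    suffix = ");"
    if not (body.startswith(prefix) and body.endswith(suffix)):
        raise ValueError("body is not a call to the expected callback")
    return json.loads(body[len(prefix):-len(suffix)])


# Hypothetical geocoder result and callback name, for illustration only.
result = {"Latitude": 45.5, "Longitude": -122.6}
body = wrap_as_script(result, "handleGeocode")
print(body)
```

A plain XML or JSON document cannot be consumed this way, because the <script> tag executes whatever it loads instead of handing it back as data.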

What I do know is that I saw this coming. No way is Yahoo going to throw out a free geocoding API with a JSON output format and not think about the possibility of turning it off someday. It was inevitable that a service like batchgeocode.com would be created, and that would inevitably mean that the data providers would complain about such a service. Perhaps this is why the JSON output format was never mentioned on the Yahoo geocoding API reference page?

Still, Yahoo is interested in providing geocoding services in their maps; it’s what differentiates them from the competition. So geocoding isn’t really going away, it’s just getting reworked a bit. The whole farm is no longer available for free, but the house still is.

Calculating distances to multiple addresses is fun!

Okay, maybe it’s not really that entertaining, but you can do it now by checking the “Calculate distance” option in Step #4 of the batch geocoder.
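The exact formula the batch geocoder uses is not spelled out here. One common way to get the straight-line distance between two latitude/longitude points is the haversine formula, sketched below in Python. The Earth radius, the function names, and the rounding to two decimal places are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, a common approximation
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_report(lat1, lon1, lat2, lon2):
    """Distance in both units, rounded to two decimal places (assumed precision)."""
    km = haversine_km(lat1, lon1, lat2, lon2)
    return {"km": round(km, 2), "miles": round(km / KM_PER_MILE, 2)}


# One degree of longitude along the equator.
print(distance_report(0.0, 0.0, 0.0, 1.0))
```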

The distance is purposely limited to miles and kilometers (two-digit precision). Why not display more precision by using feet and meters? Well, anyone who’s familiar with how geocoding works knows that it’s not quite that precise. Coordinates are calculated by first finding the block the address is on; that part is quite accurate. Then the side of the street is determined by checking whether the address number is odd or even. So far so good.

What follows is not so accurate…

First the point is set a certain distance from the street center line; after all, the building is not likely to be in the middle of the block. However, there is no good way to know just how far back from the center of the street the building is, so it’s guessed. Usually this is a global value set when the geocoder is configured. At the City of Portland we generally pick around 50 feet from the address block. The Yahoo Geocoder that I use for BatchGeocoder.com does not specify how far the offset from the centerline is, but from my crude measurements it’s probably close to 50′. There is no way to know for sure how far back the building or building entrance is located, but 50′ is usually darn close.

The final step (and the least accurate) in the geocoding process is to approximate a location along the block using the address number. This part is really just a total guess. The reason is that the address range on your average block face is a nice big range like 1000-2000, or 100-200, while on average there are only a dozen or fewer properties on a block. The geocoder does not actually know how many properties are located on the block; the centerline data does not indicate this. In fact, it’s not even sure whether the address is really there or not. You can test this yourself by going to our single address lookup tool and entering an address number on your block that doesn’t exist.

The geocoder’s best guess about where the address might be located on the block is made by taking the street number and calculating its position along the centerline using the block range. Example: If the range was 100-200, and the address number was 150, the geocoder would place the point halfway along the block. Again, this is a guess at best. If the geocoder manages to place the point right on top of the address, it is just getting lucky!
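The 100-200 example above boils down to simple linear interpolation. Here is a minimal Python sketch; the function name and the error handling are my own choices for illustration, not how any particular geocoder is implemented.

```python
def fraction_along_block(address_number, range_start, range_end):
    """Return how far along the block (0.0 to 1.0) an address number falls.

    range_start is the address at the end of the block where numbering begins,
    range_end the address at the other end. The range may run in either direction.
    """
    if range_start == range_end:
        # Degenerate range: nothing to interpolate, so guess the middle.
        return 0.5
    low, high = sorted((range_start, range_end))
    if not low <= address_number <= high:
        raise ValueError("address number is outside the block range")
    return (address_number - range_start) / (range_end - range_start)


print(fraction_along_block(150, 100, 200))
print(fraction_along_block(125, 200, 100))
```

Note that the function happily returns a position for any number inside the range, whether or not a building with that number exists, which is exactly the weakness described above.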

Now other things can help the accuracy, like setting an offset from the start of the block range (similar to the offset from the center line.) The geocoder does know which end of the street to start the calculation from (for example does the 100 address start on the north end or south end, east or west.) But for the most part, geocoding is not that accurate when looking at precision beyond the block range.
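Putting the pieces together, a rough Python sketch of the whole guess might look like the following. It works in flat x/y coordinates measured in feet, uses the 50 foot offset mentioned earlier, and assumes odd numbers fall on the left side of the street when traveling from the start of the block to the end. That side convention, the parameter names, and the straight-line block are all assumptions for illustration.

```python
import math


def place_address(address_number, range_start, range_end,
                  start_xy, end_xy, offset_ft=50.0, odd_on_left=True):
    """Guess an (x, y) point for an address on a straight block segment.

    start_xy and end_xy are the centerline endpoints in feet. Which side
    odd numbers fall on is an assumed convention, not a rule.
    """
    x0, y0 = start_xy
    x1, y1 = end_xy
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0 or range_start == range_end:
        raise ValueError("block segment and address range must be non-degenerate")

    # Step 1: interpolate a position along the centerline from the address range.
    t = (address_number - range_start) / (range_end - range_start)
    cx, cy = x0 + t * dx, y0 + t * dy

    # Step 2: unit vector pointing to the left of the direction of travel.
    nx, ny = -dy / length, dx / length

    # Step 3: push the point off the centerline toward the odd or even side.
    is_odd = address_number % 2 == 1
    sign = 1.0 if is_odd == odd_on_left else -1.0
    return (cx + sign * offset_ft * nx, cy + sign * offset_ft * ny)


# A 400 foot block running due east with an address range of 100-200.
print(place_address(150, 100, 200, (0.0, 0.0), (400.0, 0.0)))
print(place_address(125, 100, 200, (0.0, 0.0), (400.0, 0.0)))
```

Every input here except the centerline itself is a guess, so errors in the range, the offset, and the side all end up in the final point.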

For most applications this doesn’t matter too much. If you are looking at points zoomed out to the zip code or city level, then who cares about ±100 feet of precision? For more precision you have to have parcel data that is linked to an address database. Then you are looking up actual addresses with attached parcel polygons and centering the point in the middle of the parcel (talk about accuracy!). A good example of this is PortlandMaps.com (the day job).

So that is the not-so-short explanation of why batchgeocode.com will not show you distance precision in feet and meters. Isn’t GIS fun?