Yahoo set to launch international maps?

Yahoo needs international mapping to be a serious competitor to Google Maps in the long run, and recently there have been a few things that hint that international support may be coming.

First, there was an interesting post by Yahoo employee Alan Brown, who points out that last year Yahoo acquired a company that provides geocoding at an international level.

Second, and more recently, someone noticed that the Yahoo geocoding API is sporadically returning a number in the precision field. The number appears to be the most appropriate zoom level for the geocoded point, which could be read as a way of making the interface generic enough for international support.
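
If you want to watch for this yourself, here is a rough sketch of querying the geocoder and printing the precision field. The endpoint and response shape are my assumptions based on the V1 MapsService documentation, and YOUR_APP_ID is a placeholder:

```python
# Sketch: query the Yahoo geocoder and inspect the precision field.
# Endpoint and XML shape are assumptions; YOUR_APP_ID is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

url = ('http://api.local.yahoo.com/MapsService/V1/geocode'
       '?appid=YOUR_APP_ID&location=Portland,+OR')
tree = ET.parse(urllib.request.urlopen(url))

for result in tree.getroot():
    # Normally a word like "address" or "zip"; the oddity is a bare
    # number showing up here instead, plausibly a suggested zoom level.
    print(result.get('precision'))
```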

Whatever all this means, my guess is that just to stay competitive, Yahoo will be going international soon. The question is whether they’ll bring along all the nifty geocoding features that make a service like ours possible. To stand above the rest, I’d say it’s probably a good idea.

Amazon S3 for map tile storage and delivery?

Amazon recently launched its new S3 storage service, and everyone seems to be clamoring to figure out uses for it. Well, here is my contribution: map tile storage and serving.

Think about it: you want to build your own tile-based map delivery (because your boss has been nagging you about it ever since the Google Maps launch), but where are you going to store all those gigabytes and gigabytes of images? Not to mention, how will you deliver them? Stick them all in a database? Write a wrapper script around that? It sounds like an awful lot of bandwidth, cycles, and storage will be needed. And imagine: every time there’s a breakdown, the pager goes off and you have to go fix it.

Or, get an S3 account and blaze away. I’d bet dollars to donuts that S3 is a heck of a lot cheaper than your average “enterprise” network storage solution. In fact, I’ll just tell you: it’s cheaper. The drawback? Well, you won’t have LAN-speed access to it, but if your target is the internet, who cares? It will likely take just as long to generate all those tiles as it will to upload them anyway.
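
To show how little plumbing this takes, here is a minimal sketch using the boto Python library to push a directory of pre-rendered tiles into a bucket. The bucket name and the tiles/z/x/y.png layout on disk are my assumptions, not anything S3 requires:

```python
# Sketch: upload pre-rendered map tiles to S3 and mark them public,
# so clients can fetch them straight over HTTP. The bucket name and
# the tiles/z/x/y.png directory layout are hypothetical.
import os
import boto

conn = boto.connect_s3()  # reads AWS credentials from the environment
bucket = conn.create_bucket('my-map-tiles')  # hypothetical bucket name

for root, _, files in os.walk('tiles'):
    for name in files:
        path = os.path.join(root, name)
        key = bucket.new_key(os.path.relpath(path, 'tiles'))  # e.g. "4/5/6.png"
        key.set_contents_from_filename(path)
        key.set_acl('public-read')  # serve directly, no web server needed
```

With the ACLs set to public-read, every tile is addressable at a plain HTTP URL, so the delivery half of the problem disappears along with the storage half.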

I’d also bet that Amazon’s delivery will be much faster, with lower latency, than what an average-sized shop could manage on a T1 with a weenie little 4-processor database server. Who knows what sort of super-optimized proprietary network, hardware, and software architecture Amazon has put together to make their system work. More than likely it’s much better than what’s available off the shelf.

Did I mention scalability? Scalability in this model is just a matter of sending more dollars off to Amazon. Assuming your business model has you making more on each visit than you spend, you will just keep making more and more profit, no matter how many users show up. Got Slashdotted? No biggie: the server capacity briefly expands to take on the Niagara Falls sized volume, then returns to normal once it has passed. The point is, you captured every bit of the revenue generated by that extra traffic.
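
To put rough, hypothetical numbers on it: if a page view pulls down twenty 15 KB tiles, that is about 300 KB of transfer, or roughly six thousandths of a cent at S3’s launch rate of $0.20 per GB transferred. If each view earns you even a tenth of a cent in advertising, the margin holds at any traffic level.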

So the next question is: after storage, what’s next? Application delivery?

Maybe instead of thinking in terms of racks of servers, we should be thinking of metering cycles and storage down to the smallest possible measurements and paying for only what we need, with an endless ability to scale. Developers can add their applications to the Internet Borg cube and, after some marketing, expect to see a linear increase in profits along with traffic. No more hassling over rack space, load balancers, hard drive failures, backups, software licensing, and so on.

Here it comes, the infinitely scalable internet application model. Sustainable growth, just add water.

Commercial or Public, It’s still all about the data

My day job at the City of Portland lets me work with really cool data. Take a look at PortlandMaps, for example. It has several dozen different datasets all rolled into one easy-to-navigate interface. The mapping GUI and speed are not up to today’s standards of AJAX-based map viewers (yet), but the underlying data is much more complete and powerful than what is available anywhere at the national level.

For example, we have access to four counties’ worth of parcel, or tax lot, data. This information is key to seeing where property lines are on a map without squinting through the trees on aerial photos (though we have those too). We also have building footprints for the entire city of Portland. Overlay the two on top of a 6-inch-per-pixel aerial photograph, pair it with weekly updated assessor data, and you have a very powerful property viewing tool.
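
As a toy illustration of why that pairing is so powerful, here is a sketch, with made-up geometries and assessor records, using the shapely library to join a building footprint to the tax lot it sits on and pull up the matching assessor data:

```python
# Toy sketch (made-up coordinates and records) of joining a building
# footprint to the tax lot it sits on, then attaching assessor data.
from shapely.geometry import Polygon

# Two adjacent tax lot polygons, keyed by a hypothetical parcel ID.
taxlots = {
    'R100': Polygon([(0, 0), (50, 0), (50, 100), (0, 100)]),
    'R101': Polygon([(50, 0), (100, 0), (100, 100), (50, 100)]),
}

# One building footprint, fully inside lot R100.
footprint = Polygon([(10, 10), (40, 10), (40, 40), (10, 40)])

# Weekly-refreshed assessor records, keyed by the same parcel ID.
assessor = {'R100': {'owner': 'DOE, JANE', 'assessed_value': 250000}}

for parcel_id, lot in taxlots.items():
    if lot.contains(footprint):
        print(parcel_id, assessor.get(parcel_id))
```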

We also have great data for zoning, utilities, crime incidents, hazard levels, building permits, city parks, etc. We have firsthand access to all this data because our group at the city is responsible for gathering it from the various regional entities (mostly government-based, at the city and county level). In exchange for the entities giving us their data, we give them back all of the other data we have collected. The three most popular: tax lots, aerial photos, and street centerline.

A few years ago we decided it might also be nice to open up access to the general public, hence PortlandMaps.com. After its launch, it soon became apparent that PortlandMaps was an excellent tool not only for citizen access but for all of our data partners as well. It has become an invaluable resource for both.

Where am I going with all of this? The main point is: it’s all about the data. PortlandMaps would not exist if it weren’t for the work of hundreds of individuals at the city, county, and state level creating datasets and giving them back to the public for free.

Now we have a parallel with companies like MapQuest, Yahoo, and Google all offering transportation/routing information at the national (and sometimes international) level. These are great services, but they only provide directions and routing.

Why not provide all of the data of PortlandMaps in a nationwide interface? Again: it’s all about the data. Even if they could collect data from all the various counties, cities, and states in the U.S., compiling it all into one database would be a sizable task. Companies like Zillow are attempting this; they have parcel data in many areas as well as detailed assessor records. No doubt Zillow put a huge effort into gathering and normalizing data from all of these various entities.

I know how hard that can be, because I see what we go through in Portland to do it at the local level. Data formats are different and can change at the whim of the data provider. There are no standards, so creating one for all the data to file into is a task, to say the least. Add to that the data providers’ tendency to set rules on how the data can be used, how much it might cost to obtain, and how hard they make it to get (hint: they don’t just leave it out on an FTP server somewhere).
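
To make the schema problem concrete, here is a toy sketch, with entirely hypothetical field names, of what it takes to fold two counties’ parcel exports into one common format:

```python
# Toy illustration of the schema-normalization problem (all field
# names are hypothetical; real county exports are far messier).

# Each county exports parcels with its own column names and units.
county_a_record = {'TAXLOT_ID': 'R123456', 'SITE_ADDR': '123 MAIN ST', 'ACRES': 0.11}
county_b_record = {'parcel_no': '1S1E01-00100', 'situs': '456 Oak Ave', 'sqft': 4792}

def normalize_a(rec):
    """Map county A's export into the common schema."""
    return {
        'parcel_id': rec['TAXLOT_ID'],
        'address': rec['SITE_ADDR'].title(),
        'area_sqft': rec['ACRES'] * 43560,  # acres to square feet
    }

def normalize_b(rec):
    """Map county B's export into the common schema."""
    return {
        'parcel_id': rec['parcel_no'],
        'address': rec['situs'],
        'area_sqft': float(rec['sqft']),
    }

# One translator per provider, and each one breaks whenever that
# provider decides to rename a column.
parcels = [normalize_a(county_a_record), normalize_b(county_b_record)]
print(parcels)
```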

The local data providers know how valuable their data is, and even though they might be required by law to make it publicly available (in the case of government agencies), they will make it as difficult as possible. There is a similar parallel with commercial data providers like Navteq and TeleAtlas. Getting data from these companies is not usually difficult; in fact, if you own a car with a navigation system, you probably already have a copy of it. But they impose strict licensing rules that limit what you can use it for, and may even charge extra. This is why it is estimated that Yahoo, MapQuest, and Google all pay a fee back to their data providers every time they calculate a route. No doubt it is a small fee, probably a fraction of a penny, but a fee all the same.

Now the service providers are interested in giving away free APIs to further expose their branding and potential advertising to would-be affiliate web sites. No doubt watching their every move are the data providers, who desperately need to protect the value of their hard-won data, data that must be maintained constantly to keep up with the ever-changing infrastructure of our country. Just like the local data providers that help make PortlandMaps a service, everyone wants to protect what they work so hard to create and maintain.

It’s all about the data.