How to Parse and Standardize Street/Postal Addresses

Any app or website that works with addresses needs to validate, parse, standardize, and verify them. Different mechanisms suit different projects, so working out exactly what you need isn’t always easy.

What Problems Appear Around Parsing and Standardization?

There are three primary issues that often occur in the parsing and standardization process.

Splitting Lines and Numbering the Pieces

As I mentioned in my computational survivalist post, I’m working on a project where I have a dedicated computer with little more than basic Unix tools, ported to Windows. It’s given me a new appreciation for how the standard Unix tools fit together; I’ve had to rely on them for tasks I’d usually do a different way.

I’d seen the nl command before for numbering lines, but I thought, “Why would you ever want to do that? If you want to see line numbers, use your editor.” That way of thinking looks at the tools one at a time, asking what each can do, rather than thinking about how they might work together.

Java 8 java.time Package: Parsing Any String to Date [Code Snippets]

In one of my projects, I received a requirement that, while parsing a text file, Strings denoting a date or timestamp could arrive in many different formats not known in advance, yet each of them represents a valid date or timestamp that must be parsed properly. The solution I came up with is this: keep a set of formats in a properties file. When a String needs to be parsed, the formats are read from the file and tried sequentially until one parses the String successfully or we run out of formats.

This solution has two advantages. First, if you discover a valid String that was not parsed successfully, all you need to do is add a new format to your properties file; no recompilation or redeployment is needed. Second, you can set priorities: if the US date format is preferable to the European one, just place the US formats first and the European ones after them.

Also, in Java 8, format Strings allow optional sections denoted by '[]', so several formats may actually be combined into a single one with optional sections. For example, instead of:

MM/dd/yyyy

MM-dd-yyyy

MM.dd.yyyy
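
The fallback strategy described above, trying each stored format in order until one succeeds, might be sketched as follows. This is a minimal illustration and not the author's actual code: the class name `FlexibleDateParser` and its `parse` method are hypothetical, and the pattern list is hard-coded here as a stand-in for formats loaded from a properties file. It handles plain dates only, using the three patterns listed above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FlexibleDateParser {

    // Assumption: in a real setup these patterns would be read from a
    // properties file. The order of the list expresses priority.
    static final List<String> PATTERNS =
            Arrays.asList("MM/dd/yyyy", "MM-dd-yyyy", "MM.dd.yyyy");

    // Tries each pattern in turn and returns the first successful result,
    // or an empty Optional when no pattern matches.
    static Optional<LocalDate> parse(String text) {
        for (String pattern : PATTERNS) {
            try {
                DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
                return Optional.of(LocalDate.parse(text, formatter));
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("03/15/2021")); // Optional[2021-03-15]
        System.out.println(parse("03.15.2021")); // Optional[2021-03-15]
        System.out.println(parse("not a date")); // Optional.empty
    }
}
```

Returning an `Optional` rather than throwing leaves it to the caller to decide what an unparseable String means, which fits the idea of simply adding another format later when one is missing.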