Saturday, July 7, 2012

Census 2011 Online Maps User Guide

This is a comprehensive introduction to Census 2011 Online Maps, a free online application available on the aus-emaps.com website. The application contains a selection of data from the Australian 2011 Census of Population and Housing, published by the Australian Bureau of Statistics. It is designed specifically to facilitate the creation of custom thematic choropleth maps with Census 2011 data (that is, maps are not predefined; each one is generated “on the fly” according to user specifications).

Let’s start with some warnings and tips which will help in creating meaningful and informative maps.

Warning 1: Mapping Census data for the entire country poses some challenges, since there are very significant differences between individual regions of Australia. Take, for example, median weekly personal incomes, which vary from $76 to $8,312. It is therefore more practical to generate maps for smaller areas, such as individual States, rather than attempting to create a meaningful summary measure for the country as a whole.

Warning 2: Thematic choropleth maps are generally considered unsuitable for mapping absolute values (eg. counts of persons). If absolute values are mapped for areas that vary in size, the resulting maps can be misleading.

On the other hand, using ratios, which show the relationship between two quantities, eliminates the influence of area, so the map accurately portrays the distribution of features. It is therefore recommended to use only datasets labelled as “proportions” or “median values” for thematic choropleth mapping.

Nevertheless, there are legitimate cases where mapping absolute values can be justified. For example, when geography areas are of equal size (which is most likely the case with smaller geography units and/or urban locations) or when you deliberately want to “visualise absolute numbers” to help in defining territories based on counts of people with certain characteristics (since it is easy to identify on the map which polygons are next to each other).


GENERAL INFORMATION

Base Geography:

The first release of Census 2011 Online Maps offers only one geography option - Postal Areas (which are a good approximation of Australia Post Postcodes).

Data:

The Census 2011 Online Maps application provides convenient access to approximately 150 datasets selected from the Basic Community Profile (only a fraction of the 7942 individual datasets published by the ABS, but those likely to be of widest interest). This initial selection will be extended in due course.

Where appropriate, “proportions of total” for individual datasets were calculated and added to the selection. There is also an overall “population density” dataset available for mapping. The full list of available datasets can be downloaded in csv format by clicking on the following link: Census 2011 Online Maps Data List.
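The “proportions of total” and “population density” figures mentioned above are simple ratios. The sketch below is only an illustration of the idea: the function names are hypothetical, and expressing proportions as percentages and density as persons per square kilometre are assumptions, not details confirmed for this application.

```python
def proportion_of_total(count, total):
    """Return count as a percentage of total (assumed unit: percent).

    Returns None when the total is zero or missing, so that such
    areas can be treated as "no data" rather than as zero.
    """
    if not total:
        return None
    return 100.0 * count / total


def population_density(persons, area_sq_km):
    """Return persons per square kilometre (assumed unit)."""
    if not area_sq_km:
        return None
    return persons / area_sq_km


if __name__ == "__main__":
    print(proportion_of_total(25, 200))
    print(population_density(1000, 4.0))
```

Because both measures divide by a total or by an area, they remain comparable between large and small Postal Areas, which is what makes them suitable for choropleth mapping.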


CONFIGURING MAPS

Start by clicking on the blue “configure map” text in the top right corner of the page, then follow the steps below.

Data Selection:

To simplify selection of individual datasets they have been grouped into themes and topics.


Select Theme:
Select from a dropdown list: Persons, Family or Dwellings

Select Topic:
Select relevant topic from a dropdown list (topics are specific for individual themes).

Select Data Item:
A list of available data items for the given theme and topic will be displayed on the page. Select the dataset of interest by clicking on the relevant box. Your selection will be indicated by the box changing colour to light blue.

Tip: to reload the current selection just “unselect” and “select” it again by clicking twice on the light blue box.


Region Selection:

Select from a dropdown list. Options include Australia, States and larger capital cities. Default is “Australia”.

Changing the region will trigger a data refresh (ie. histogram, summary statistics, and data ranges will be recalculated and reloaded).


Data Exclusions:

Sometimes data is not available for all locations, either due to confidentiality or lack of valid answers to Census questions. By default empty cells are not counted in calculations of data statistics (ie. counts, mean and median, etc).

Some data series have lots of zero values. Users have the option to remove cells with “0” from calculations of data statistics. Changing this option will trigger a data refresh (ie. histogram, summary statistics, and data ranges will be recalculated and reloaded).
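The effect of these two exclusion rules on summary statistics can be sketched as follows. This is a minimal illustration, not the application's actual code: the function name and the `exclude_zeros` flag are hypothetical, and empty cells are represented here as `None`.

```python
from statistics import mean, median


def summary_stats(values, exclude_zeros=False):
    """Compute count, mean and median, skipping empty cells.

    Empty cells (None) are always ignored. Zero values are ignored
    only when exclude_zeros is True.
    """
    kept = [v for v in values if v is not None]
    if exclude_zeros:
        kept = [v for v in kept if v != 0]
    if not kept:
        return {"count": 0, "mean": None, "median": None}
    return {"count": len(kept), "mean": mean(kept), "median": median(kept)}


if __name__ == "__main__":
    data = [0, None, 4, 8]
    print(summary_stats(data))
    print(summary_stats(data, exclude_zeros=True))
```

With many zeros in a series, removing them shifts both the mean and the median upwards, which in turn changes the calculated range breaks.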


Histogram Customisation:

By default, the number of histogram bins is calculated as the square root of the count of records, divided by 2 and rounded to a whole number. Users can adjust the number of bins by typing in a preferred number. This will trigger a data refresh (ie. the histogram will be reloaded).
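The default bin count rule described above can be written as a one-line calculation. In this sketch the function name is hypothetical, the rounding mode (Python's built-in `round`) is an assumption since the guide does not specify one, and the minimum of one bin is added only to keep the result usable for very small datasets.

```python
import math


def default_bin_count(record_count):
    """Square root of the record count, divided by 2, rounded.

    Assumption: uses Python's round(); the application may round
    differently. A floor of 1 bin is enforced for tiny datasets.
    """
    return max(1, round(math.sqrt(record_count) / 2))


if __name__ == "__main__":
    for n in (400, 2500):
        print(n, default_bin_count(n))
```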



Selecting Best Data Classification Method:

Start by inspecting the histogram to determine how the data is distributed. The general rule is that for normally distributed data (symmetrical, bell shaped histogram) Standard Deviation is the optimal method for defining values for range breaks.

[Example of histogram of normally distributed data]

In this application St Deviation with 3 classes defines range breaks at -1SD and +1SD. With 5 classes, range breaks are placed at -1.5SD, -0.5SD, +0.5SD and +1.5SD.
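As an illustration of the break positions listed above, the sketch below places the breaks at the stated multiples of the standard deviation. It assumes the breaks are measured from the mean and that the population standard deviation is used; the guide does not confirm either detail, and the names are hypothetical.

```python
from statistics import mean, pstdev

# Multipliers taken from the text: 3 classes and 5 classes.
SD_MULTIPLIERS = {
    3: [-1.0, 1.0],
    5: [-1.5, -0.5, 0.5, 1.5],
}


def std_dev_breaks(values, classes):
    """Return range breaks at mean + k * SD for each multiplier k.

    Assumptions: breaks are centred on the mean and the population
    standard deviation (pstdev) is used.
    """
    centre = mean(values)
    sd = pstdev(values)
    return [centre + k * sd for k in SD_MULTIPLIERS[classes]]


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population SD 2
    print(std_dev_breaks(sample, 3))
    print(std_dev_breaks(sample, 5))
```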


Equal Range (also known as Equal Distribution or Interval) is the best option for uniformly distributed data (histogram with bars of similar height), as is the case, for example, with Median Rents for QLD.

[Example of histogram of uniformly distributed data]


In Equal Range, breaks are determined by dividing the difference between the max and min values by the selected number of ranges. This method can result in ranges with few or no records if it is used with data that is not uniformly distributed.
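The Equal Range calculation described above can be sketched in a few lines. The function name is hypothetical and the code is an illustration of the arithmetic only, not the application's implementation.

```python
def equal_range_breaks(values, classes):
    """Return the inner range breaks for equal-width classes.

    The class width is (max - min) / classes, and a break is placed
    at every multiple of that width above the minimum.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / classes
    return [lowest + width * i for i in range(1, classes)]


if __name__ == "__main__":
    print(equal_range_breaks([10, 20, 35, 60], 5))
```

Note that the breaks depend only on the two extreme values, which is why a single outlier can leave most classes nearly empty.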


The majority of Census data will have an irregular distribution, for which the Natural Break or Quantiles classification methods are the most appropriate.

[Example of histogram of irregularly distributed data]

The Jenks algorithm is used to determine range breaks for Natural Break classification. The algorithm groups data into classes that are themselves as separate as possible, but where the data values within each class are fairly close together. That is, it maximises the differences between the classes and minimises the differences within the classes. This classification can be used to discover spatial patterns within the data, but it can lead to some classes being populated by low numbers of observations. The ABS recommends Natural Break as a default classification for Census data but prefers an alternative methodology, based on the Dalenius-Hodges algorithm.
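To make the “minimise the differences within the classes” objective concrete, the sketch below finds the grouping with the smallest total within-class squared deviation by trying every possible split. This is deliberately not the Jenks algorithm itself: it is an exhaustive search for the same objective that is only practical for a handful of values. The function names are hypothetical, and reporting each break as the highest value in its class is an assumption.

```python
from itertools import combinations


def sum_sq_dev(group):
    """Sum of squared deviations of the group from its own mean."""
    m = sum(group) / len(group)
    return sum((v - m) ** 2 for v in group)


def natural_breaks_brute_force(values, classes):
    """Exhaustively search for the split with the least within-class spread.

    Only suitable for tiny inputs: the number of candidate splits
    grows very quickly with the number of values.
    """
    data = sorted(values)
    n = len(data)
    best_cost, best_cuts = None, None
    # Each candidate is a set of cut positions between sorted values.
    for cuts in combinations(range(1, n), classes - 1):
        bounds = (0,) + cuts + (n,)
        cost = sum(sum_sq_dev(data[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_cuts = cost, cuts
    # Assumption: report each break as the top value of its class.
    return [data[c - 1] for c in best_cuts]


if __name__ == "__main__":
    print(natural_breaks_brute_force([1, 2, 3, 10, 11, 12, 30, 31, 32], 3))
```

In the example the three tight clusters are recovered as three classes, even though an equal-width split of the same values would cut through them.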


Quantiles also work well with irregularly distributed data, and this is probably the best known data classification method. Range breaks for quantiles are calculated by sorting data in ascending order and dividing it into equal parts, based on the number of selected classes. Each range will therefore contain a similar number of units (ie. equal count or proportion of records, polygons). If the entire dataset is divided into halves, the range break is the median. Dividing data into four classes produces quartiles separated by three range breaks. Quintiles have five classes with four range breaks.
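Quantile breaks as described above can be computed with Python's standard library. This is a sketch only: the wrapper name is hypothetical, and the choice of the “inclusive” interpolation method is an assumption, since the guide does not say how values between two records are handled.

```python
from statistics import quantiles


def quantile_breaks(values, classes):
    """Return the classes - 1 cut points that split the data evenly.

    Assumption: linear interpolation using the "inclusive" method,
    which treats the data as the whole population.
    """
    return quantiles(sorted(values), n=classes, method="inclusive")


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(quantile_breaks(data, 2))  # one break: the median
    print(quantile_breaks(data, 4))  # three breaks: quartiles
```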


The fifth classification method available in Census 2011 Online Maps is “Custom Ranges”. Simply type your preferred values into the relevant boxes to overwrite the automatically generated numbers.


Min and Max values are used to define respectively the lower and the upper limits for the “outer ranges”, inclusive of these numbers. Selectors can be adjusted to exclude those values from ranges, if required.
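One way to picture how Min, Max and the range breaks work together is a small function that assigns a value to a class. This is a hypothetical sketch: the function and parameter names are invented for illustration, and placing a value that equals an inner break into the higher class is an assumption the guide does not address.

```python
from bisect import bisect_right


def assign_class(value, minimum, breaks, maximum,
                 include_min=True, include_max=True):
    """Return the 0-based class index for value, or None if unmapped.

    breaks must be sorted ascending. By default Min and Max belong
    to the outer ranges; the two flags exclude them if required.
    """
    if value is None:
        return None
    below = value < minimum or (value == minimum and not include_min)
    above = value > maximum or (value == maximum and not include_max)
    if below or above:
        return None
    # Assumption: a value equal to an inner break joins the higher class.
    return bisect_right(breaks, value)


if __name__ == "__main__":
    for v in (0, 20, 39, 60, 61):
        print(v, assign_class(v, 0, [20, 40], 60))
```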


Changing Classification Method and Colour Scheme:

Select Classification Method:
Select from a dropdown list: St Deviation, Equal Range, Quantiles or Natural Break.


Note to Internet Explorer 7 users: this browser version has a very inefficient JavaScript engine, so it may struggle when executing complex code that handles a large volume of data. This can happen when computing Natural Break ranges. A pop-up window will appear with the following warning: “Stop running this script? A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive”. Keep clicking the “No” button until the process is finished (usually more than a few times!) and the browser will resume normal operation.
 

Select Number of Ranges:
Select from a dropdown list a value corresponding to the preferred number of ranges. Options vary depending on the selected data classification method. Selecting ranges will trigger a data refresh (ie. data ranges will be recalculated and reloaded).

Please note that the Fusion Tables API has a limit of 5 colour ranges for thematic choropleth drapes. This limitation should not pose an issue for the majority of Census 2011 Online Maps users, since the ABS recommends 5 data ranges as the optimal number for thematic mapping.


Selecting Colours:
There are 3 single colour schemes (blue, purple and green) and one 2-colour scheme (red-blue) available for selection in the Census 2011 Online Maps application. For convenience, reverse colour options have also been added. Functionality for customising colours will be enabled at a later date.


Generating Thematic Map:

To complete the map configuration, simply click on the blue “Update Map” button. A new thematic drape will be generated and loaded onto the map.



Please note, you may need to zoom in and out a few times to “force” an update of thematic drapes (this is a known issue with the Fusion Tables dynamic layer generation functionality).

Clicking on a coloured polygon will open a window with detailed information about that postcode.



QUESTIONS

If you have any questions regarding the functionality of Census 2011 Online Maps please submit them as comments to this post. Requests for customisation of the application to specific business requirements can be sent to info@aus-emaps.com.

3 comments:

Wogfella said...

Very interesting - have you done any census mapping with the ABS suburb boundary data? I'm finding the conversion of the SHP to KML or any other vector type tends to create a file that is too large for either Maps API, Maps Engine, and even FUSION TABLES to handle.

Arek said...

Yes, it is a common problem and requires a different approach. That is, large datasets such as suburbs are best handled as dynamic tiled data.

I have a tool called Thematic Mapper that can do this but it is in private release for now. Let me know if I can assist you in any way.

Best regards

Arek