Wednesday, October 5, 2011

Getting data organised

My first task with the NMA project was to get started working with the data. Mitchell Whitelaw helpfully set us up with some example code.

Our data came as verbose XML that was too big to keep in memory in Processing, so Mitchell showed us how to split the data in Processing and parse it into JSON one line at a time, extracting only the data we needed. JSON is a lightweight format based on JavaScript that works well with Java (Processing).
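
To make the line-at-a-time idea concrete, here is a minimal sketch in plain Java (not Mitchell's actual code, and not Processing-specific). It assumes, purely for illustration, that each record sits on a single line and uses the made-up element names title and objectType; the real NMA export will be structured differently. It also only escapes backslashes and double quotes and does not decode XML entities such as &amp;.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlLinesToJson {
    // Hypothetical element names, assumed for this sketch only.
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    private static final Pattern TYPE = Pattern.compile("<objectType>(.*?)</objectType>");

    // Escape backslashes and double quotes so the value is safe inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Return the first captured group on this line, or null if the pattern is absent.
    static String firstMatch(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Read one line at a time, so the full XML file is never held in memory.
    // Only the two fields we care about are kept, as one JSON object per record.
    static List<String> convert(BufferedReader in) {
        List<String> out = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String title = firstMatch(TITLE, line);
                String type = firstMatch(TYPE, line);
                if (title == null || type == null) {
                    continue; // not a record line, or a field is missing
                }
                out.add("{\"title\":\"" + escape(title)
                        + "\",\"objectType\":\"" + escape(type) + "\"}");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample input standing in for the real file.
        String xml = "<record><title>Example item</title><objectType>Motor cars</objectType></record>\n"
                   + "<unrelated/>\n";
        for (String json : convert(new BufferedReader(new StringReader(xml)))) {
            System.out.println(json);
        }
    }
}
```

In a real run the reader would wrap the XML file and each JSON line would be written straight out to a new file, rather than collected in a list.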

Mitchell also demonstrated loading images from the collection (you can't load all at once - there are 20,000 in 3 different sizes!) and picking random objects to show, using a class for items. He also showed us hashmaps, which I first used with myTram - calling a key is much easier to work with than trying to remember an index position. The hashmap here contains arraylists of items organised by object type.

I used the hashmap to select a random object type to show all of the objects of that type in the collection. Clicking through random object types is not a bad way to start browsing. The data was indeed organised!
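
The hashmap-of-arraylists structure and the random pick can be sketched roughly as follows. This is plain modern Java rather than the Processing code we were given, and the Item class, its fields and the helper names are my own stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the item class; the real one holds more fields.
class Item {
    final String title;
    final String objectType;

    Item(String title, String objectType) {
        this.title = title;
        this.objectType = objectType;
    }
}

class ObjectTypeIndex {
    // Build a map from object type to the list of items of that type.
    static Map<String, List<Item>> groupByType(List<Item> items) {
        Map<String, List<Item>> byType = new HashMap<>();
        for (Item item : items) {
            // Create the list the first time a type is seen, then append to it.
            byType.computeIfAbsent(item.objectType, k -> new ArrayList<>()).add(item);
        }
        return byType;
    }

    // Pick one of the map's keys at random.
    static String randomType(Map<String, List<Item>> byType, Random rng) {
        List<String> types = new ArrayList<>(byType.keySet());
        return types.get(rng.nextInt(types.size()));
    }

    public static void main(String[] args) {
        // Invented sample items, not real collection records.
        List<Item> items = new ArrayList<>();
        items.add(new Item("Sample car A", "Motor cars"));
        items.add(new Item("Sample car B", "Motor cars"));
        items.add(new Item("Sample card", "Advertising cards"));

        Map<String, List<Item>> byType = groupByType(items);
        String type = randomType(byType, new Random());
        System.out.println(type + ": " + byType.get(type).size() + " item(s)");
    }
}
```

Looking items up by a key such as "Motor cars" is what makes this nicer than juggling index positions.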

Showing an object type - motor cars, there are 11 in the NMA collection
Next I wanted to be able to sort the data, so that I could view it other than randomly. It was easy to sort an array alphabetically or numerically using the Processing sort array function, so I converted my arraylist of object types to an array, and hey presto I had a Ben Ennis Butler inspired histogram! It was indeed easy to scroll through object types and see how many of each there were.

Object type histogram, alphabetically sorted - advertising cards
Due to memory limits I only visualised the first 20 object types, but in the future I could find a more sophisticated way of skipping whatever is not on screen.

After this, however, I was stuck. I wanted to sort numerically by the number of objects of each type. I couldn't do this with arrays, because even if I extracted an array of all the counts and sorted it, there would be no way to synchronise it with any other lists.

The answer was to make another class for object types (objtypes), and then to use comparators, which tell the sort how to compare two objects. In this case the comparator says that, when sorting an arraylist of object types, they should be compared by the size of their corresponding arraylist of items.
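
As a rough sketch of that idea (in plain modern Java, with class and method names that are my own for illustration, not the ones in the project), each object type carries its own list of items, so the name and the count can never get out of step when the list is sorted:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical object type class: a name plus the items of that type.
class ObjType {
    final String name;
    final List<String> items = new ArrayList<>();

    ObjType(String name) {
        this.name = name;
    }
}

// Orders object types by how many items they hold, largest first.
class CountDescending implements Comparator<ObjType> {
    @Override
    public int compare(ObjType a, ObjType b) {
        // Comparing b to a (rather than a to b) gives descending order.
        return Integer.compare(b.items.size(), a.items.size());
    }
}

class SortByCount {
    // Return a sorted copy, leaving the original list untouched.
    static List<ObjType> sortedByCount(List<ObjType> types) {
        List<ObjType> copy = new ArrayList<>(types);
        copy.sort(new CountDescending());
        return copy;
    }

    public static void main(String[] args) {
        // Invented sample data with made-up counts.
        ObjType small = new ObjType("Type A");
        small.items.add("item 1");
        ObjType large = new ObjType("Type B");
        large.items.add("item 1");
        large.items.add("item 2");
        large.items.add("item 3");

        List<ObjType> types = new ArrayList<>();
        types.add(small);
        types.add(large);

        for (ObjType t : sortedByCount(types)) {
            System.out.println(t.name + ": " + t.items.size());
        }
    }
}
```

Swapping in a different comparator, for example one that compares names, would re-sort the same list alphabetically without touching the data.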

I visualised this simply as a list for now. I would have to think about what to do visually with the scale difference between the few most numerous object types (6000, 3000, 2000), the quick drop-off (to a few hundred) and then the long tail (2, 1). Mitchell suggested something compact, like a treemap.

List of most numerous object types - there are 6,000 mineral samples in the collection

List of some of the object types for which there is only 1 in the collection

I think that I now have the data organised well enough to start making mockup visualisations in Processing, though I still have to figure out how to translate them to the web. Hopefully I can experiment with the NMA API before building my own MySQL database.
