We’ve explained what Mongo and NoSQL are, and why we’re using them. Now it’s the turn of the actual data access and manipulation layer, something we’ve termed Nucleus.
Nucleus is part of a bigger plan which Alex and I have been looking at around using SOA (Service Oriented Architecture) principles for data storage at Lincoln: in short, building a central repository for just about anything around events, locations, people and other such ‘core’ data. We’re forcing any viewing or manipulation of those data sets through central, defined, secured and controlled routes, more commonly known as Application Programming Interfaces, or APIs.
In the past it was common for custom code to sit between services, responsible for moving data around. Often this code talked directly to the underlying databases and provided little in the way of sanity checking. Following the ancient principle of “Garbage In, Garbage Out”, it wasn’t unheard of for a service to fail and for the data synchronisation script to duly fill an important database with error messages, stray code snippets and other such invalid nonsense. The applications which relied on this data would continue as though nothing was wrong, try to read it, and then crash in a huge ball of flames. Inevitably this led to administrators manually picking through a database to put everything back in its place.
APIs avoid this by providing a way for applications to read, write and manipulate data without actually touching it directly. The code which provides the API services is responsible for making sure that only valid operations can be performed on the data, and returns error messages in response to requests which don’t make sense or can’t happen. Where in the past a direct database write could bypass sanity checking (e.g. by creating an event which ended before it started) and break anything expecting only valid data, attempting the same through an API results in an error (in our case an “Error 88: Time Travel Exception”) and a clean database.
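To make the idea concrete, here’s a minimal sketch of that kind of API-level validation. The function name and exception class are illustrative, not Nucleus’s actual code; only the “Error 88” message comes from the post.

```python
from datetime import datetime

class TimeTravelException(ValueError):
    """Raised when an event ends before it starts (the post's 'Error 88')."""
    code = 88

def validate_event(start: datetime, end: datetime) -> None:
    # The API layer rejects impossible events before they ever
    # reach the database; a direct write would happily store them.
    if end < start:
        raise TimeTravelException("Error 88: Time Travel Exception")

# Valid event: passes silently
validate_event(datetime(2010, 11, 8, 9, 0), datetime(2010, 11, 8, 10, 0))

# Invalid event: ends an hour before it starts, so the API refuses it
try:
    validate_event(datetime(2010, 11, 8, 10, 0), datetime(2010, 11, 8, 9, 0))
except TimeTravelException as e:
    print(e)
```

Because every application goes through this one choke point, the check only has to be written (and fixed) once.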
Aside from the benefit of applications not being able to write nonsense all over your shiny data set, APIs can also provide tightly integrated permissions. What we’ve done with Nucleus is build in a really powerful permissions system which can define, down to the user level, exactly what anybody can do with any given item of data. For example, as we pipe timetable events into the system we set permissions so that anybody can read an event, but nobody can write to it. On the other hand, something like Room Bookings will assign the creator of an event the appropriate permissions to change and delete their booking. These permissions aren’t specific to Total ReCal or Room Bookings; they are obeyed by any application using the data, since without the appropriate permissions being set the API simply won’t allow the operation. Where in the past there was nothing technically stopping a room booking application from accidentally deleting timetable events, it’s now an impossibility.
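The per-item permissions model described above can be sketched roughly like this. The class and method names (`Acl`, `grant`, `can`) and the `'*'` wildcard are assumptions for illustration, not Nucleus’s real interface:

```python
class Acl:
    """Per-item access control list: maps users to allowed operations."""

    def __init__(self):
        self._grants = {}  # user -> set of operation names

    def grant(self, user, *ops):
        self._grants.setdefault(user, set()).update(ops)

    def can(self, user, op):
        # '*' stands for "anybody", e.g. world-readable timetable events
        allowed = self._grants.get(user, set()) | self._grants.get('*', set())
        return op in allowed

# Timetable event piped in from the timetabling system:
# anybody can read it, nobody can write to it.
timetable_event = Acl()
timetable_event.grant('*', 'read')

# Room booking: world-readable, but only its creator may change or delete it.
room_booking = Acl()
room_booking.grant('*', 'read')
room_booking.grant('creator_user', 'write', 'delete')
```

The key point is that the check lives with the data item itself, so every application hitting the API gets the same answer.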
Finally, and perhaps most interestingly, we’ve tied the Nucleus APIs (and by extension all of Total ReCal’s data) to our trial OAuth system. The full depth of OAuth is well beyond this blog post, and indeed the project’s scope as a whole, but put simply it’s a platform which allows people to grant other applications specific access to their own data. In other words, there’s nothing stopping somebody from building a web application (or mobile, or set-top box…) which accesses a user’s Total ReCal data and presents it in a different way. The API won’t work unless that user has explicitly granted the application the appropriate permissions by logging in, and the security model ensures that only that one user’s data is available to view (and not to write, unless the user has that permission themselves and has granted it to the application).
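The access pattern that gives us can be sketched as follows. The token format, scope name and data shapes here are all invented for illustration; the real system uses OAuth’s token exchange rather than a simple lookup table:

```python
# token -> (the user who granted it, the scopes they granted)
TOKENS = {
    "abc123": ("some_student", {"events.read"}),
}

# each user's Total ReCal events (dummy data)
EVENTS = {
    "some_student": [{"title": "Lecture", "start": "2010-11-08T09:00"}],
    "another_student": [{"title": "Seminar", "start": "2010-11-08T10:00"}],
}

def get_events(token, requested_user):
    """Serve events only for the user who granted this token, read-only."""
    if token not in TOKENS:
        raise PermissionError("invalid token")
    user, scopes = TOKENS[token]
    if requested_user != user:
        # The security model: a token never exposes anyone else's data.
        raise PermissionError("token only grants access to its own user's data")
    if "events.read" not in scopes:
        raise PermissionError("application was not granted read access")
    return EVENTS[user]
```

A third-party calendar app holding `abc123` can read its own user’s events and nothing more; writing would require a further scope the user has to hold and grant.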
In summary, by creating more work for ourselves in the short run, effectively building a whole new system just to store event data, we have opened the door in the long run to easier development of applications which use that information for something else, in a secure and controlled manner.