The 5-Minute Interview: Bert Spaan, New York Public Library


Catch This Week’s 5-Minute Interview with Bert Spaan, Space/Time Directory Engineer at the New York Public Library

For this week’s 5-Minute Interview, I got to sit down with Bert Spaan, the Space/Time Directory Engineer at the New York Public Library. Bert just moved from Amsterdam to New York and is a member of the New York City Neo4j Meetup Group.

Q: Talk to me about your projects with Neo4j. What are you working on?

Bert: Before I moved to New York, I worked for the Waag Society, which is a non-profit research institute and media lab in Amsterdam. One of the things I worked on there was a geocoder search engine for historical place names, which was commissioned by the Ministry of Education. The aim of the project was to make sure that library archives stayed relevant and had a single way to talk about all place names and street names.

As of course you know, places and streets change names all the time. Maybe Amsterdam is called Amsterdam right now, but in past centuries people just wrote down anything in the archives on a piece of paper. If you search through them with the word “Amsterdam,” you won’t find anything on local maps. The same goes with street names and municipality names, because of course, municipalities merge all the time.

To tackle this, institutions needed a way to find historical places and a common ground to talk about identifiers for those names. We called this project Histograph, and we used Neo4j as one of our backends to create a graph of current and historical place names. That’s what I did in the Netherlands.

The Histograph project finished up this past October, and luckily I found this great new opportunity here in NYC at the New York Public Library. It’s called the Space/Time Directory, and it largely has the same goals as my project in the Netherlands: trying to create a sort of digital time machine to travel through the collection of the library.

The library, of course, has lots of maps, photos, books, archives and stuff, but you also need to find those things by place names and by street names. Part of the Space/Time Directory will be the search engine geocoder, so I will reuse some of my work on the Histograph. Neo4j is also back for this project, and I plan to use that as well.

Q: What are some of the most surprising results you’ve seen from your use of Neo4j?

Bert: What I like most about Neo4j is that it’s so easy to translate this spatial model of the data we had into something which you can store in the database. The model that we used was a graph – e.g., places that connected to each other and that changed over time – and those connections all need to be stored somewhere.

Our data model is easy to draw on a piece of paper and it’s easy to talk about. With Neo4j, it’s really easy to actually create the database, store it in a way that makes sense, and then also curate that data. It was just a really easy translation between our ideas, the data model and the actual database.
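Editor's note: To make the "easy to draw on paper" model concrete, here is a minimal in-memory sketch in Python. It is not Histograph's actual code or schema. The class names, the `link_same_place` relation, the identifiers and the sample records are all hypothetical, chosen only to show how name records from different sources can be connected so that a search for one spelling finds the others.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceName:
    """One name as recorded in one source (all sample values are illustrative)."""
    id: str
    name: str
    source: str


class PlaceGraph:
    """Toy graph: nodes are recorded names, edges mean 'refers to the same place'."""

    def __init__(self):
        self.nodes = {}
        self.neighbors = defaultdict(set)

    def add(self, place):
        self.nodes[place.id] = place

    def link_same_place(self, a, b):
        # Store the link in both directions so either spelling finds the other.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def equivalent_names(self, query):
        """Return every name reachable from any record whose name matches the query."""
        start = [p.id for p in self.nodes.values() if p.name.lower() == query.lower()]
        seen = set(start)
        queue = deque(start)
        while queue:  # breadth-first walk over 'same place' links
            current = queue.popleft()
            for nxt in self.neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(self.nodes[i].name for i in seen)


def build_sample_graph():
    """Hypothetical records; the ids and sources are made up for this sketch."""
    graph = PlaceGraph()
    graph.add(PlaceName("modern/1", "Amsterdam", "modern gazetteer"))
    graph.add(PlaceName("archive/7", "Amstelredam", "hypothetical archive record"))
    graph.add(PlaceName("archive/9", "Amstelredamme", "hypothetical archive record"))
    graph.add(PlaceName("modern/2", "Rotterdam", "modern gazetteer"))
    graph.link_same_place("modern/1", "archive/7")
    graph.link_same_place("archive/7", "archive/9")
    return graph


if __name__ == "__main__":
    print(build_sample_graph().equivalent_names("Amstelredamme"))
```

In this sketch, searching for an old spelling follows the links back to the modern name even when the two records are not directly connected, which is the kind of lookup a historical geocoder needs.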

Q: Can you talk to me about what made you choose Neo4j specifically?

Bert: We tried a few graph databases, including Titan, before we chose Neo4j. I liked Neo4j both because the community was great, and it was clearly the most active in terms of development. I could see that you guys were organizing lead-up to try to go somewhere and there were clear plans for the future. All the other graph databases weren’t so clear, and so Neo4j was promising.

On the other hand, some stuff has been awkward to do with Cypher, but I’m sure you’re getting there because it’s easy to see that there’s lots of plans on the roadmap, so I have full trust that the couple of things that are not as easy as I’d hoped will become easier over time.

Also, there are great libraries with READMEs and lots of help on Stack Overflow. It’s nice to know that it’s not just our team that uses Neo4j, it’s many, many more people, so there’s a lot of trust in the project.

Q: If you could take everything you’ve learned about Neo4j now and go back to the beginning, what would you do differently?

Bert: I don’t know. We changed many things over time, so there aren’t really many things that I would do differently now. There are some things I would love to be able to do differently, but we currently aren’t able to.

For example, it’s not possible to index edges in Neo4j. That was a big problem for us because edges are a separate entity in our data model. Right now, we need to add new nodes for edges, so each edge is a node and two edges come out of that to the other nodes. We didn’t know that in the beginning, and it cost us some time before we found ways around it. When we look at the data just in the browser in the Cypher interface, it looks a bit awkward.

Of course we don’t use the browser all the time for our own graph visualizations in APIs. Those two things, I think, made Neo4j a little bit more complicated, and we didn’t know that upfront. If we had known it all ahead of time, we would have started thinking about it differently and wouldn’t have spent so much time figuring it out.
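Editor's note: The workaround Bert describes, turning each edge into a node with two edges coming out of it, can be sketched as follows. This is a toy Python model, not Neo4j's API and not the Histograph implementation. `TinyGraph`, the `Relation` label and the `FROM`/`TO` edge types are hypothetical names, and the "only nodes can be indexed" rule is an assumption that mirrors the limitation described in the interview.

```python
class TinyGraph:
    """Toy property graph where only nodes can be indexed (an assumption of this sketch)."""

    def __init__(self):
        self.nodes = {}       # node id -> (label, properties)
        self.edges = []       # (start id, edge type, end id); plain edges have no properties
        self.node_index = {}  # (label, key, value) -> set of node ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)
        for key, value in props.items():
            self.node_index.setdefault((label, key, value), set()).add(node_id)

    def add_edge(self, start, edge_type, end):
        self.edges.append((start, edge_type, end))

    def relate(self, start, end, relation_id, **props):
        """Store a relation as its own node, with one plain edge to each endpoint."""
        self.add_node(relation_id, "Relation", **props)
        self.add_edge(relation_id, "FROM", start)
        self.add_edge(relation_id, "TO", end)

    def find_relations(self, key, value):
        """Use the node index to find relations by property, then resolve their endpoints."""
        found = []
        matches = self.node_index.get(("Relation", key, value), set())
        for relation_id in sorted(matches):
            ends = {etype: end for start, etype, end in self.edges if start == relation_id}
            found.append((ends["FROM"], relation_id, ends["TO"]))
        return found


def build_sample():
    """Hypothetical data: two name records joined by one relation node."""
    graph = TinyGraph()
    graph.add_node("p1", "Place", name="Amsterdam")
    graph.add_node("p2", "Place", name="Amstelredam")
    graph.relate("p1", "p2", "r1", type="sameAs", source="hypothetical-dataset")
    return graph


if __name__ == "__main__":
    print(build_sample().find_relations("type", "sameAs"))
```

The trade-off is visible in the sketch: one logical connection becomes one extra node plus two edges, which is why the data can look awkward when viewed directly as a graph.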

Editor’s note: The ability to index edges is currently on the product roadmap for a future edition of Neo4j.

Q: Anything else you want to add or say?

Bert: I just can’t wait to see what comes out of my work here because, in the Netherlands, it was mostly about names of places and that was quite interesting for me and for other people in the project.

But at the New York Public Library, their collection is just so beautiful and rich. They have the most beautiful old maps, including some that are 400 years old, and photos of the whole city of New York outside. Many of these materials are already online in digital collections at nypl.org, but they’re still a bit difficult to find, especially in one place.

When we are finally able to connect this beautiful, beautiful material that the library has – all in one search for either a place or time – it will be fantastic. APIs and graph databases are only interesting for some people, but when you connect to something as rich and beautiful as this collection at the library in New York then I think you get something really nice. That’s what I’m looking forward to.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neotechnology.com

