A smaller data set




So far, I have one complete calendar year of port records, and it’s 1805. The rest of my larger data set is spread from 1886 to 1817. Here, I’ve made a graph of nodes from only 1805.


[Figure: 1805 full data, with legend]

Again I see immediately that London merchants have a big role in this port, but also that in this year, the activity of Baltimore’s merchants is a higher proportion of the whole.


More Gephi


Since the graph with all 3000 nodes is pretty busy, I thought about looking at the “major players” – those nodes with degree 10 or higher. Degree is simply a measure of the number of connections a node has. The highest degree in this dataset is 66, and the average degree is 7.448.
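To make the degree filter concrete, here is a minimal Python sketch of the same idea outside Gephi. It assumes an undirected edge list of (source, target) pairs with no duplicate edges; the function names are my own illustration, not part of Gephi or of the script described in these posts. Nodes with no connections never appear in an edge list, so this average only covers connected nodes and may differ from the figure Gephi reports.

```python
from collections import Counter


def degrees(edges):
    """Count how many connections each node has in an undirected edge list."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return counts


def major_players(edges, min_degree=10):
    """Return the set of nodes whose degree is at least min_degree."""
    return {node for node, d in degrees(edges).items() if d >= min_degree}


def average_degree(edges):
    """Mean degree over the nodes that appear in at least one edge."""
    deg = degrees(edges)
    return sum(deg.values()) / len(deg) if deg else 0.0
```

Calling `major_players(edges)` with the default threshold mirrors the "degree 10 or higher" filter; a lower `min_degree` is handy for checking the logic on a small sample.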

[Figure: major players, with legend]


The legend tells me that there is about 1 merchant from Baltimore represented on this graph. Who is it? Not sure – when I filter for the location Baltimore within this subset, I don’t get any results.

Gephi graphs

Some initial results from my entire database:

[Figure: raw data]


The data for my entire database contains almost 3000 nodes. Applying some partitions and layouts results in something that looks a bit more organized:

[Figure: full data unprocessed, with legend]


It’s easy to see right away that London merchants make up a sizable portion of owners of cargoes moving in and out of Barbados. One manipulation I’d like to make to the dataset is to group the ports by nationality/empire and color code them accordingly. I haven’t figured out the best way to do that yet.
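One possible approach, sketched here in Python, is to add an extra column to the nodes table before importing it, since Gephi can partition and color by any node attribute. The lookup table, the `Empire` column name, and the function name are all hypothetical; the real port list would need to be filled in by hand, and the sketch assumes each node row already carries a `Location` value.

```python
# Hypothetical lookup table: port name -> empire/nation label.
# Only a few illustrative entries; the full list would come from the database.
PORT_TO_EMPIRE = {
    "London": "British",
    "Barbados": "British",
    "Baltimore": "United States",
}


def add_empire_column(nodes, lookup=PORT_TO_EMPIRE, default="Unknown"):
    """Return copies of the node rows with an added 'Empire' attribute."""
    return [
        dict(node, Empire=lookup.get(node["Location"], default))
        for node in nodes
    ]
```

Ports missing from the lookup fall into an "Unknown" group, which makes it easy to spot the ones that still need classifying.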

And what about Baltimore? If I isolate those nodes, the graph looks like so:

[Figure: Baltimore nodes]

Database and Python script for Gephi

Good morning! Before I step away from my computer for a few hours, I wanted to share my database rationale and the Python code my partner wrote to convert my database into usable tables for Gephi.

The database is my transcription of port records from the British Caribbean. Each record corresponds to an entrance or clearance from the port, and includes a number of data points about that ship. For the Gephi graph, the important points are Location and Owner Name (there are actually 23 fields for this, since some voyages have many owners). Location is the port where the ship’s cargo is registered. The limitation here is that occasionally an owner registers a cargo in a port that is not his/her home port. However, because of the size of my database and the stage at which I sit right now, I’m taking the cargo registration as the home location for the owners. The other complicating factor is that occasionally an owner will have different cargoes registered out of more than one port. The Python script is written to match a name with only one location, so it retains the first location it matches with a name.
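The "first location wins" rule can be illustrated with a short sketch. This is not Weston's script, just a minimal version of the same idea, assuming each record has already been reduced to a location plus a list of owner names (with empty strings for unused owner fields). The function name is hypothetical.

```python
def first_location_by_owner(records):
    """Map each owner name to the first location it appears with.

    records: iterable of (location, owner_names) pairs, in database order.
    Later records never overwrite a location that is already stored.
    """
    home = {}
    for location, owners in records:
        for name in owners:
            name = name.strip()
            if name:  # skip blank owner fields
                home.setdefault(name, location)
    return home
```

Because `setdefault` only stores a value when the key is new, an owner who later registers a cargo out of a second port keeps the first port as the home location, which is exactly the simplification described above.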

The script changes my data table into two outputs: a reference file and an output file (these become the nodes and the edges, respectively). There are three variables that have to be customized for each new data set: nummerchants, numlines, largestgroup.
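For readers who want a feel for the conversion, here is a hedged sketch of how a table of records could become a nodes table and an edges table. It is not the actual script and does not use `nummerchants`, `numlines`, or `largestgroup`. It assumes that an edge links two owners who appear on the same record, which is my guess at the graph's meaning rather than something stated here. The `Id`/`Label` and `Source`/`Target` column names follow the headers Gephi's spreadsheet import looks for; everything else is illustrative.

```python
import csv
import io
from itertools import combinations


def build_tables(records):
    """Turn (location, owner_names) records into node rows and edge rows."""
    ids = {}    # owner name -> numeric node id
    nodes = []  # one row per distinct owner
    edges = []  # one row per pair of co-owners on a record (assumption)
    for location, owners in records:
        present = []
        for name in owners:
            name = name.strip()
            if not name:
                continue  # unused owner field
            if name not in ids:
                ids[name] = len(ids) + 1
                # The first location seen for a name is the one kept.
                nodes.append({"Id": ids[name], "Label": name, "Location": location})
            if ids[name] not in present:
                present.append(ids[name])
        for source, target in combinations(present, 2):
            edges.append({"Source": source, "Target": target, "Type": "Undirected"})
    return nodes, edges


def to_csv(rows, fieldnames):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A record with a single owner produces a node but no edges, and the same pair of owners on two records produces two edge rows, which Gephi can merge into a weighted edge on import.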

The script was written in Python by my partner, Weston, who you can tweet @westonschreiber. If you have questions or comments about it, feel free to post here where both he and I can see the question.

Plan for Day of DH

My daily work life isn’t that exciting since I’m a remote dissertation writer who teaches online. I don’t have meetings, leave my house for work or even interact with living humans very much. Try not to be too jealous, for it can be pretty isolating. While thinking of how to make my usual practice a little interesting for everyone, including myself, I decided that for Day of DH I’m going to start interrogating the data I have up to this point with Gephi and maybe some other viz tools. I did some preliminary runs of this a few weeks ago, but since then I’ve transcribed a couple years’ worth of port entries and clearances. As I go, I’ll post some graphs and tentative interpretations.