Good morning! Before I step away from my computer for a few hours, I wanted to share my database rationale and the Python code my partner wrote to convert my database into usable tables for Gephi.
The database is my transcription of port records from the British Caribbean. Each record corresponds to an entrance or clearance from the port, and includes a number of data points about that ship. For the Gephi graph, the important points are Location and Owner Name (Owner Name actually spans 23 fields, since some voyages have many owners). Location is the port where the ship’s cargo is registered. The limitation here is that occasionally an owner registers a cargo in a port that is not his/her home port. However, because of the size of my database and the stage I’m at right now, I’m taking the port of cargo registration as the home location for the owners. The other complicating factor is that occasionally an owner will have different cargoes registered out of more than one port. The Python script is written to match each name with only one location, so it keeps the first location it finds for a name and ignores any later ones.
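To make the first-location rule concrete, here is a minimal sketch of the idea. It is not Weston's actual script: the function name, the record layout (a location plus a list of owner names), and the sample names are all made up for illustration.

```python
def first_location_by_owner(records):
    """Map each owner name to the first location it appears with.

    `records` is a hypothetical layout: (location, [owner names]) per voyage.
    """
    home = {}
    for location, owners in records:
        for name in owners:
            name = name.strip()
            # Skip empty owner fields; keep only the first location seen.
            if name and name not in home:
                home[name] = location
    return home


# Invented sample data, not taken from the real port records.
records = [
    ("Kingston", ["A. Smith", "B. Jones"]),
    ("Bridgetown", ["A. Smith", "C. Brown", ""]),
]

home = first_location_by_owner(records)
print(home)
# {'A. Smith': 'Kingston', 'B. Jones': 'Kingston', 'C. Brown': 'Bridgetown'}
```

Note that "A. Smith" stays in Kingston even though the second record lists a Bridgetown cargo, which is exactly the simplification described above.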
The script changes my data table into two outputs: a reference file and an output file (these become the nodes and the edges, respectively). There are three variables that have to be customized for each new data set: nummerchants, numlines, largestgroup.
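For anyone curious what a nodes-and-edges conversion can look like, here is a rough sketch under my own assumptions, not the script itself. I'm assuming an edge links two owners who appear on the same voyage, and that the files use the column headers Gephi's spreadsheet import looks for (Id and Label for nodes, Source and Target for edges). The function names and sample data are hypothetical, and the sketch counts everything from the data rather than using nummerchants, numlines, or largestgroup.

```python
import csv
import io
from itertools import combinations


def build_tables(records):
    """Return (node_rows, edge_rows) from (location, [owner names]) records."""
    home = {}   # owner -> first location seen
    edges = {}  # (owner_a, owner_b) -> number of shared voyages
    for location, owners in records:
        # Drop empty owner fields and duplicates within one voyage.
        names = sorted({n.strip() for n in owners if n.strip()})
        for name in names:
            home.setdefault(name, location)
        # Every pair of co-owners on a voyage gets (or strengthens) an edge.
        for a, b in combinations(names, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    node_rows = [(name, name, loc) for name, loc in home.items()]
    edge_rows = [(a, b, w) for (a, b), w in edges.items()]
    return node_rows, edge_rows


def to_csv(header, rows):
    """Render rows as CSV text with the given header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample data, not taken from the real port records.
records = [
    ("Kingston", ["A. Smith", "B. Jones"]),
    ("Bridgetown", ["A. Smith", "C. Brown", "B. Jones"]),
]

nodes, edge_rows = build_tables(records)
nodes_csv = to_csv(["Id", "Label", "Location"], nodes)
edges_csv = to_csv(["Source", "Target", "Weight"], edge_rows)
print(nodes_csv)
print(edges_csv)
```

Writing `nodes_csv` and `edges_csv` to two files would give one table for the nodes and one for the edges. The Weight column simply counts how many voyages two owners share.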
The script was written in Python by my partner, Weston, whom you can tweet at @westonschreiber. If you have questions or comments about it, feel free to post them here, where both he and I can see them.