You are browsing the archive for 2013 April.

Home to a New Set of Clothes

April 8, 2013 in Uncategorized

A non-DH part of my Day of DH.

After ten hours in and about campus today, I came home through the pouring rain to find a package waiting for me. My doctoral robes! They had been my graduation present when I finished my PhD last year, but I’d felt conflicted and hesitant about buying them unless I became a full-time academic.

Not pictured: the relatively classy robes and hood that come with the set. But this is a great hat.

Well, as of last summer I’ve been at the University of Waterloo, and I finally got around to ordering them. Hopefully there’ll be many more years, and convocations, to attend.

The hat is a bit ridiculous, but it symbolizes a good chunk of my youth, so I’ll be wearing it with pride at our next graduation ceremony.

A Day of (Exhausted) DH?

April 8, 2013 in Uncategorized

A few hours later, I’ve been through several hours of candidate interviews as well as a job lunch.

It’s hard to switch gears right back into my research, let alone my teaching (it simply wouldn’t be fair for me to do my marking right now). Luckily, it’s pouring outside, so I will probably wait for another 90 minutes or so until I can get a pickup from my partner.

My secret weapon. It seems wasteful, but when I’m here I often spend about 10 hours on campus so it seems worth it.

In the meantime, time for even more coffee (must be cup.. number.. five by this point?!).

In any case, on a day like today when I’m exhausted, I really do try to mentally reassure myself that everything will be okay and that I’m not an impostor. It’s my first year on the tenure track and I’ve managed to keep my research agenda going full speed, which is good. But there are always more grant applications to fill out, proposals to read, classes to tweak, etc.

Coffee’s ready, let’s see what I can do with the time I have left in the office.

Today’s Data

April 8, 2013 in Uncategorized

The big dataset I’ve been playing with is a ‘small’ subset of the Archive Team Geocities Snapshot. What I’ve done is gone into that collection and downloaded the subsites of the ca.geocities.com domain.

It’s funny how I consider that ‘small,’ when the compressed .tar file that results from downloading them all and putting them together is something like 2.74GB. But anyhoo, compared to the complete collection that is in the 680GB+ range, it’s ‘small.’ In any case, it’s a good training set – a decent size, able to be digested and analyzed, and most of it is focused on Canadian topics.

What the heck do you do with all that data?

Since I’m thinking methodologically these days, I’m considering that part and parcel of my research process. So far, I’ve done the following:

(1) created a mirrored version of them all on my hard drive, which lets me go in and see what each site looked like back then. At some point, I need to get a legacy browser emulator to do that.

(2) backed it all up so I can mess around with things with impunity.

(3) Most importantly, to date, I’ve created a full-text repository of each site. To do this, I went through each site with Mathematica, scraped out the plaintext, and aggregated each individual account’s site into one big text file. Some of these are massive, some small. But this lets me start thinking about them at the page level, and also lets me quickly find the information I might be looking for.

(4) From that, I’ve initially created unigram data – the word frequency of each site, one with stopwords, one without. So for each site I have something like this:

{{"sara", 70}, {"said", 63}, {"jack", 60}, {"kyle", 56}, {"time", 54},
 {"mother", 48}, {"just", 48}, {"asked", 46}, {"like", 43}, {"man", 41},
 {"did", 39}, {"tower", 38}, {"didn", 38}, {"medea", 37}, {"father", 36},
 {"saw", 35}, {"way", 35}, {"woman", 35}, {"face", 33}, {"knew", 30},
 {"began", 30}, {"simon", 30}, {"came", 30}, {"jason", 29}, {"edward", 29},
 {"eyes", 29}, {"new", 29}, {"thought", 29}, {"felt", 28}, {"moment", 28},
 {"know", 28}, {"stories", 28}, {"village", 27}, {"good", 27}, {"hand", 26},
 {"replied", 25}, {"beautiful", 25}, {"looked", 24}, {"evie", 24},
 {"love", 24}, {"selena", 23}, {"glass", 23}, {"turned", 23}, {"light", 23}, ...}
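For anyone who wants to try steps (3) and (4) without Mathematica, a minimal Python sketch might look like the following. It is not the code used for this project: the function names, the tiny stopword list, and the two sample pages are all assumptions made up for illustration. It strips the markup from each page, lumps one account's pages into a single text, and counts words with and without stopwords.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it"}


class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the plain text of one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def unigram_counts(text, drop_stopwords=False):
    """Count lowercase alphabetic tokens, optionally dropping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return Counter(words)


def site_unigrams(pages, drop_stopwords=False):
    """Aggregate every page of one account into one text, then count."""
    full_text = " ".join(html_to_text(page) for page in pages)
    return unigram_counts(full_text, drop_stopwords)


# Hypothetical two-page "site", invented for this example.
pages = [
    "<html><body><p>Sara said the tower was tall.</p></body></html>",
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Sara saw the tower.</p></body></html>",
]
with_stopwords = site_unigrams(pages)
without_stopwords = site_unigrams(pages, drop_stopwords=True)
print(with_stopwords["the"], without_stopwords["the"])
print(without_stopwords.most_common(5))
```

Splitting on anything that is not a letter means a word like "didn't" comes out as "didn" plus a stray "t", so a tokenizer along these lines produces fragments similar to the "didn" in the list above.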

One of the goals with this is to find out what we can from collections like this, and then compare it to what we can learn from collections that are compiled in WARC files. What should historians be pushing for? How should we try to advance digital preservation for our own needs? What does the metadata look like? And, more importantly for me, what can we learn? Can I detect the voices of young Canadians? Can we extract date information to see how things have evolved over time?

So many questions.

But now, a few quick pieces of paperwork await, and it’s off to the interview.

A Day of Digital Humanities (or, a day of service)

April 8, 2013 in Uncategorized

This year’s Day of DH is falling on a day when the service side of my job (along with research/teaching) rears its head. I’m on the search committee for our new head of the Archives and Special Collections here at the University of Waterloo. While I obviously won’t be talking much about the specifics of that process, I’ve managed to find three or four hours today to hopefully do some interesting DH work.

Thinking of service, today is bearing out a comment I think Ted Underwood tweeted or retweeted this weekend (it came across my phone, and I’m too hurried to find it): that while grad school doesn’t do a good job of preparing people for non-academic careers, it doesn’t always prep people well for some dimensions of academic ones! I find the administrative side of my job has a decent learning curve, from finding out how to write proper covering documents here at UW, to what to wear in what context (today I’m wearing a jacket and dress shirt, which is pretty dressy for me), to how decisions are made and how to work within a large bureaucracy.

It’s me! Trust me, I don’t normally dress like this. But since a candidate is coming in, I want them to feel as respected as possible.

I’m finding service opportunities really rewarding and interesting, however, so I’m glad for that!

Where all the ‘magic’ happens: Hagey Hall at the University of Waterloo.

The day starts with an early trip into campus.

Since the interview process doesn’t start until 9am, that gives me an hour or so to do my usual morning tasks:
– Skim through the Twitter feed: it’s become the hub of my learning activities, and that’s how I keep on top of what everybody else is doing in the field;
– Deal with any pressing e-mails, making sure my unread count is at zero and all fires are put out;
– Drink a cup of coffee.

And if I find some time, which may be tricky today, I always do read the paper copy of the Globe and Mail. It’s funny to read the analog rather than digital edition, but I find the layout works and it forces me to read a bunch of articles I wouldn’t otherwise touch. Strange times.

Also today, I just launched a blog post: “Saving History: The Eternal Responsibility of the Historian,” over at my normal site. Hopefully people find it somewhat interesting.
