How To – Political Ad Sleuth, Part 2

How did it go? Did you have any problems opening the file? If so, you may want to make it smaller by getting rid of information you don’t really need. If your ‘puter didn’t choke on that, you can skip the next step and move on to cleaning it up and refining things.

Reduce File Size

Leave your original file alone and save a second copy with a suffix such as _edit.

The first thing I do is right-click the row number 1 so the entire row is highlighted, then choose Delete. *Tip: In Office 2010, scroll all the way to the top and click View>Freeze Panes>Freeze Top Row so the top row will always be visible.

Every data set is different. Let’s see what we want to keep and what we want to get rid of. Less data makes it easier to focus on what’s important and reduces the file size:

  • source_file_url is useful, but not right now. The column is too big and gets in the way.
  • tv_market-id is not necessary. We’ll use the tv_market.
  • fcc_folder can go.
  • file_name can go.
  • and everything else to the right of advertiser_name can be deleted.
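If the file is too big to open comfortably in a spreadsheet at all, the same trimming can be done with a short script. Here is a minimal Python sketch of the idea, not the only way to do it. The column names are copied from the list above, so check them against your own header row. The function name and the sample data are made up for illustration.

```python
import csv
import io

# Column names come from the list above; check them against your own header row.
DROP = {"source_file_url", "tv_market-id", "fcc_folder", "file_name"}
LAST_KEEP = "advertiser_name"  # everything to the right of this column is dropped


def trim_columns(src, dst, drop=DROP, last_keep=LAST_KEEP):
    """Copy CSV rows from src to dst, keeping only the wanted columns."""
    reader = csv.reader(src)
    header = next(reader)
    # Cut off everything to the right of last_keep (if that column exists).
    cutoff = header.index(last_keep) + 1 if last_keep in header else len(header)
    keep = [i for i, name in enumerate(header[:cutoff]) if name not in drop]
    writer = csv.writer(dst)
    writer.writerow([header[i] for i in keep])
    for row in reader:
        # Pad short rows with empty strings so ragged lines do not crash.
        writer.writerow([row[i] if i < len(row) else "" for i in keep])
    return [header[i] for i in keep]


# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "id,tv_market-id,tv_market,fcc_folder,advertiser_name,extra\n"
    "1,9,CHICAGO,x,Foo PAC,zzz\n"
)
trimmed = io.StringIO()
print(trim_columns(sample, trimmed))
```

With real files, you would pass two file objects opened with newline="" instead of the in-memory samples, reading from your original and writing to the _edit copy so the original stays untouched.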

It’s trimmed. Time to polish.

Google Refine

Download Google Refine, extract all the files, and run the .exe file. Now you’ve got Refine. It opens in a browser window.

Browse to your file and upload it into Refine.

The Art of Refining

Google has some great videos to get you started. I’m starting with the ad_type column. Click the little box at the top of the column and choose Facet>Text Facet. The list on the left fills with each unique value in the column. At the top of that box, sort by count.
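If you're curious what a text facet sorted by count is doing, it is roughly a tally of every unique value in the column. Here is a small Python sketch of that idea. It is an approximation for illustration, not how Refine works internally, and the function name and sample values are my own.

```python
from collections import Counter


def text_facet(values):
    """Tally each unique value, most frequent first; empty cells become (blank)."""
    counts = Counter(v if v else "(blank)" for v in values)
    return counts.most_common()


# Made-up sample column, just to show the shape of the result.
ad_types = ["President", "US House", "President", "", "US House", "President"]
for value, count in text_facet(ad_types):
    print(value, count)
```

Values that differ only by a stray space or capital letter show up as separate groups here, just as they do in the facet, which is exactly why the list is worth reading through.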

Look at each group and try to get it down to as few groups as possible. Watch the first tutorial at the link above to see how to do this.

Here’s my list of ad_types.

Non-Candidate Issue Ads    10999
US Senate    4993
US House    4860
President    4642
State    2739
Local    461
US Congress    100
Terms and Disclosures    21
Candidate Ads Rate Cards    4
Classes of Time    3
Political Guidelines    3
Station Contacts    3
CRAVAACK715920 (13500717880925)    1
Duckworth 10.01-10.01 C399508 R    1
Duclworth 09.25-09.30 C399804 R    1
flinn for congress 9-12_2012091    1
Foster 10.09-10.14 C396162 Rev0    1
Foster 10.22-10.28C396159 Rev00    1
KNBC  tacts (13444398098929)_.p    1
Smith Inv. 94328 (1346341508269    1
Station  tacts (13450581429382)    1
(blank)    112

I can’t put any of the remaining contracts into any of the bigger groups for certain. Remember not to get overzealous. Make sure you’re making changes that maintain the integrity of what we started with.
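One way to stay honest when merging groups is to recode only the values you are sure about and keep the original value next to the new one. The sketch below shows that idea in Python, assuming rows are plain dictionaries. The mapping is a made-up example, not the merges I actually made, and the function and the _orig column name are hypothetical.

```python
def recode(rows, column, mapping):
    """Return new rows with `column` recoded; the original is kept in column + '_orig'."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are never modified
        original = row.get(column, "")
        new_row[column + "_orig"] = original
        # Values missing from the mapping are left exactly as they were.
        new_row[column] = mapping.get(original, original)
        out.append(new_row)
    return out


# Hypothetical mapping: only merge spellings you are certain mean the same thing.
mapping = {"us senate": "US Senate"}
rows = [{"ad_type": "us senate"}, {"ad_type": "Local"}]
print(recode(rows, "ad_type", mapping))
```

Anything you are unsure about simply stays out of the mapping, so it passes through unchanged.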

Let me know if you have any questions so far. How’s it going? Are my instructions easy to follow? Is this helpful so far?

By the end, we’ll be able to generate charts and graphs with amazing detail. Just stick with it! Next time, we’ll get even more detailed.

Event – The Marketing of a President 2012

ProPublica is hosting an online event tonight at 6:30: The Marketing of a President 2012.

I’m attending. I’ve copied the details below. Hope to see you there!

*Free and open to the public on a first-come, first-served basis.

Campaign 2012 has already proven to be the most expensive presidential election to date with both parties pouring in hundreds of millions of dollars to influence voters. In particular, candidates and interest groups have increasingly relied on sophisticated online targeting and marketing efforts tailored to key voter groups. But how exactly do campaigns track voters online? And how much do they know about us?

Panelists:

Lois Beckett, reporter at ProPublica
Kate Kaye, managing editor at ClickZ News
Joseph Turow, professor of communication at the University of Pennsylvania’s Annenberg School

Moderator:
Farai Chideya, author, journalist, professor and former NPR News & Notes host

Not in NYC? Watch it live here: http://www.ustream.tv/channel/tenement-talks

You can also tweet questions to #PTTalks, and we’ll pass them along to the panel.