How To – Political Ad Sleuth, Part 1

We’re going to teach ourselves a new skill today. Step by step.

We’ll download a raw set of fresh data and analyze it (using excel and google refine) for fodder for a print story – deadline Friday (Today is Wednesday; This is real time). I’ll break this tutorial down into smaller posts to leave space for questions along the way.

I don’t want to lead you into the dark. Let me shine a little light backward for a minute.

Public Inspection Files

The FCC started requiring that broadcast television stations affiliated with the top networks in the top fifty markets upload their public inspection files to a website, where the public can access them. The new rule went into effect on August 2, 2012.

I wrote a story about the process. It was a bitch. Before August 2, the stations kept their public files – including the political advertising file – in the office on paper. You could come and look and pay a quarter per page if you wanted them to run copies. It was hard to keep track of everything and even more difficult to figure out details, like where the money is coming from. Details about the contents of the public file here.

Political Ad Sleuth

The Sunlight Foundation launched a new project called Political Ad Sleuth. They gather political advertising contracts from the FCC’s website, which is extremely time consuming if you’re doing it all yourself because each contract is in PDF format and folders, and volunteers who submit scans.

Political Ad Sleuth saves you days or maybe weeks of work and gives you a good starting point. They bring it all together and make it available, but it still needs to be cleaned and analyzed. You’ll soon see what I mean.

The Hypothesis

They sent out a press release today saying that Cincinnati was No. 7 in the nation last week when being ranked by the number of documents uploaded… which should be a contract, usually purchased/renewed on a weekly basis. That makes us the most heavily saturated Ohio market. Cleveland was the top market when I wrote that earlier story.

Is it true? Who is doing the advertising? What are they running? When did things change, exactly? Where are they buying the most time? Why did things change? Can you think of any other questions?

I stumble down the path and find new things that I never even thought of while I work.

Now that that’s out of the way, let’s test the theory.

Download the data

Click [CSV of all files]. It’s big – 28,951 records.. Name it and save it somewhere that it won’t get lost.

Have a look around.  Let me know when you’ve made it this far. Ask any questions as we go along. I respond pretty quickly.

Next, we’ll start cleaning it up and refining things.

How To – Political Ad Sleuth, Part 2

How did it go? Did you have any problems opening the file? If so, you may want to make it smaller by getting rid of information you don’t really need. If your ‘puter didn’t choke on that, you can skip the next step and move on to cleaning it up and refining things.

Reduce File Size

Leave your original file alone and save a second copy as _edit or something.

The first thing I do is right click on the number 1 so the entire row is highlighted. Delete. *Tip: Go all the way to the top in Office 2010. Click View>Freeze Panes>Freeze Top Row so the top row will always be visible.

Every data set is different. Let’s see what we want to keep and what we want to get rid of. Less data makes it easier to focus on what’s important and reduces the file size:

  • source_file_url is useful, but not right now. The column is too big and gets in the way.
  • tv_market-id is not necessary. We’ll use the tv_market.
  • fcc_folder can go.
  • file_name can go.
  • and everything else to the right of advertiser_name can be deleted.

It’s trimmed. Time to polish.

Google Refine

Download. Extract all files. Run the .exe file. Now you’ve got Refine. It opens in a browser window.

Browse to your file and upload it into Refine.

The Art of Refining

Google has some great videos to get you started. I’m starting with the ad_type. I went to the little box at the top of the column. Facet>Text Facet. The list on the left is populated with each unique name. At the top of that box, sort by count.

Look at each group and try to get it down to as few groups as possible. Watch the first tutorial at the link above to see how to do this.

Here’s my list of ad_types.

Non-Candidate Issue Ads    10999
US Senate    4993
US House    4860
President    4642
State    2739
Local    461
US Congress    100
Terms and Disclosures    21
Candidate Ads Rate Cards    4
Classes of Time    3
Political Guidelines    3
Station Contacts    3
CRAVAACK715920 (13500717880925)    1
Duckworth 10.01-10.01 C399508 R    1
Duclworth 09.25-09.30 C399804 R    1
flinn for congress 9-12_2012091    1
Foster 10.09-10.14 C396162 Rev0    1
Foster 10.22-10.28C396159 Rev00    1
KNBC  tacts (13444398098929)_.p    1
Smith Inv. 94328 (1346341508269    1
Station  tacts (13450581429382)    1
(blank)    112

I can’t put any of the remaining contracts into any of the bigger groups for certain. Remember not to get overzealous. Make sure you’re making changes that maintain the integrity of what we started with.

Let me know if you have any questions so far. How’s it going? Are my instructions easy to follow? Is this helpful so far?

By the end, we’ll be able to generate charts and graphs with amazing detail. Just stick with it! Next time, we’ll get even more detailed.