How to Use Classifications With Adobe Analytics Data Feeds and R

Follow @trevorwithdata

Classifications are one of the most useful and popular features of Adobe Analytics, allowing you to upload metadata for any eVar, prop, or campaign that you may be recording in Adobe Analytics. Classifications are useful when you need to do things like:

Classify your marketing campaign tracking codes into their respective marketing channels
Bucket your product SKUs into product categories, or give them a friendly consumable name
Classify your pages into site sections
Categorize your video content into long or short videos
etc.

Here’s the problem though: if you’re a heavy user of Adobe Analytics Data Feeds, it can be frustrating to dig into your product SKUs or campaign tracking codes only to find those crucial friendly names or campaign channel classifications missing. I’ve often wished that the Data Feed tool would just include any classifications for a report suite by default, but alas, it doesn’t.

In this post, I’m going to show you where to find the classifications you need, and how to incorporate them into your R/sparklyr analysis. I’ll be relying heavily on my previous posts, including the one on sparklyr (the R interface for Apache Spark) and the one on how to set up Adobe Analytics Data Feeds. So, assuming you’ve got some data feeds ready to go, here’s what you’re going to need next:

First, you’ll need admin access to your Adobe Analytics setup (or the help of someone who has it). Navigate to Admin -> Classification Importer. From there, click on “Browser Export”, or, if your classification files are really big (more than 50,000 rows), go to “FTP Export”:

Once there, you’ll want to focus on the following:

Make sure you’ve selected the right report suite at the top.
Choose the variable you want to download the classifications for – in my example above, you can see I’ve selected “Campaigns” which is where I’ve stored all of my tracking codes that get classified. If you want the products or any other variable, just select it in the drop-down.
Make sure to select all rows – this is the default, but double check it to make sure.
Set the date range to go back far enough to capture all of the classification values you need. I tend to go back a year or two before the earliest dates in my data feeds just to be safe.
Finally, fill out the FTP information and hit “Export File”. Before too long, a file named something like “SC_EXPORT_campaigns_classifications_reportsuite.tab” should show up in your FTP folder. If you chose Browser Export previously, you can download the file straight from the browser instead; again, this only works if you’re not classifying very many values.

Now that you’ve got the classification file, open it in a text editor and get rid of a few superfluous rows, which all start with the text “## SC”. You’ll also want to remove the extra blank line between the column headers and the actual lookup data. You should wind up with a tab-delimited file that looks something like this (I’ve just made up some fake examples):

Key	Classified Value 1	Classified Value 2
abcd1234	Email	Sports
efgh5678	Email	Sports
ijkl9101	Display Ad	News
mnop1121	Affiliate	Fashion
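
If you would rather not edit the file by hand, the same cleanup can be sketched in base R. This is only a minimal sketch under a few assumptions: the helper name clean_classification_file is hypothetical, the demo file contents are made up, and it assumes the only unwanted rows are the “## SC” rows and blank lines described above.

```r
# Hypothetical helper: drop the "## SC" rows and any blank lines from a
# classification export, then write the remaining lines to a new file.
clean_classification_file = function(in_path, out_path) {
  lines = readLines(in_path, warn = FALSE)
  # Keep a line only if it does not start with "## SC" and is not empty
  keep = !grepl("^## SC", lines) & nzchar(trimws(lines))
  writeLines(lines[keep], out_path)
  invisible(out_path)
}

# Demo on a made-up export written to a temporary file
raw_path = tempfile(fileext = ".tab")
clean_path = tempfile(fileext = ".tab")
writeLines(c(
  "## SC\tmade-up header row 1",
  "## SC\tmade-up header row 2",
  "Key\tClassified Value 1\tClassified Value 2",
  "",
  "abcd1234\tEmail\tSports",
  "ijkl9101\tDisplay Ad\tNews"
), raw_path)

clean_classification_file(raw_path, clean_path)

# Quick local check that the cleaned file parses as a tab-delimited table
lookup = read.delim(clean_path, stringsAsFactors = FALSE, check.names = FALSE)
print(lookup)
```

Writing to a separate output path leaves the original export untouched, so you can always re-run the cleanup if your file turns out to contain other rows you need to handle.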

Next comes the fun part. Fire up sparklyr and load in the data feed files along with the classification files you just downloaded. In my case, I put them into a root directory “/data”:

library(dplyr)
library(sparklyr)
setwd("/data")

#Read Data
sc = spark_connect(master="local", version="2.2.0")
data_feed_local = spark_read_csv(
 sc = sc, 
 name = "data_feed", 
 path = "*-report.suite_2017-08-*.tsv.gz", 
 header = FALSE,
 delimiter="\t"
)
campaign_class_local = spark_read_csv(
 sc = sc, 
 name = "campaign_class", 
 path = "SC_EXPORT_campaigns_classifications_report.suite.tab", 
 header = TRUE, 
 delimiter = "\t"
)

This loads the files into my Spark cluster and I’m ready to merge these two files together. To do this, I’ll first create a table of the data feed files (I’ve only included a few data feed columns for the sake of this example). Perhaps an obvious callout, but you’ll need to include the variable that you’re using classifications on for this to work – in this case, the “post_campaign” variable.

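#data prep: with header = FALSE, sparklyr names the columns V1, V2, ... by position,
#so match each V number to the column order of your own data feed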
data_feed_tbl = data_feed_local %>%
 mutate(
   merged_visitor_ids = paste0(V1,"_",V2),
   fixed_event_list = paste0(",",V5,",")
 ) %>%
 select(
   visitor_id = merged_visitor_ids,
   visit_num = V3,
   hit_time_gmt = V4,
   post_event_list = fixed_event_list,
   post_product_list = V6,
   post_campaign = V7
 )

With that table created, I’m ready to link it up to my classification file. This is pretty easy to do with a left_join:

data_feed_tbl = data_feed_tbl %>%
 left_join(campaign_class_local, by=c("post_campaign"="Key"))

Notice that I’ve specified which columns should be matching for the join – in this case, the “post_campaign” column in the data feed should line up with the “Key” column from the classification file.
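
To see what the left_join does with tracking codes that have no classification, here is a small local sketch. It uses plain data frames rather than Spark tables, the data is made up, and the column name “Channel” is a hypothetical stand-in for whatever your classification columns are called.

```r
library(dplyr)

# Made-up stand-ins for the data feed table and the classification lookup
feed = data.frame(
  visitor_id = c("1_1", "1_1", "2_2", "3_3"),
  post_campaign = c("abcd1234", "ijkl9101", "zzzz0000", NA),
  stringsAsFactors = FALSE
)
campaign_class = data.frame(
  Key = c("abcd1234", "efgh5678", "ijkl9101", "mnop1121"),
  Channel = c("Email", "Email", "Display Ad", "Affiliate"),
  stringsAsFactors = FALSE
)

# Same join as above: every feed row is kept, and rows without a
# matching Key get NA in the classification columns
joined = feed %>%
  left_join(campaign_class, by = c("post_campaign" = "Key"))

# Tracking codes that appear in the feed but are missing from the lookup
unclassified = joined %>%
  filter(!is.na(post_campaign), is.na(Channel)) %>%
  distinct(post_campaign)

# Hits per channel (NA covers rows with no tracking code or no classification)
hits_by_channel = joined %>%
  count(Channel)

print(unclassified)
print(hits_by_channel)
```

A check like the unclassified table is a cheap way to spot tracking codes that were never uploaded to the classification file, or an export whose date range did not go back far enough.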

And with that, you’re ready to go! My data feed table will now have all of the extra columns of classified values I need to do further analysis on. All in all pretty easy, so long as you know where to find the files you need. Best of luck to you!

Trevor Paulsen

Trevor is a group product manager for Adobe's Customer Journey Analytics (CJA). With a background in aerospace engineering and robotics, he has a strong foundation in estimation theory and data mining. Before leading Adobe's data science consulting team, Trevor used these skills to drive innovation in the fields of aerospace and robotics. When he's not working, Trevor enjoys engaging in big data projects and statistical analyses as a hobby. He is also a father of five and enjoys bike rides and music. All views expressed are his own.