Contoso is a gaming company that creates games for multiple platforms: game consoles, handheld devices, and personal computers (PCs). These games produce a large volume of logs, and Contoso's goal is to collect and analyze them to gain insights into customer preferences, demographics, and usage behavior. These insights help Contoso identify up-sell and cross-sell opportunities, develop compelling new features to drive business growth, and provide a better experience to customers.

This sample evaluates the effectiveness of a marketing campaign that Contoso recently launched. It collects sample logs, processes and enriches them with reference data, and transforms the data by using the following three pipelines:

  1. The PartitionGameLogsPipeline reads the raw game events from blob storage and partitions them by year, month, and day.
  2. The EnrichGameLogsPipeline joins the partitioned game events with geo-code reference data and enriches the data by mapping IP addresses to the corresponding geo-locations (a sketch of this pipeline's Hive activity follows the list).
  3. The AnalyzeMarketingCampaignPipeline combines the enriched data with advertising data to produce the final output, which measures the effectiveness of the marketing campaign.
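
For illustration, the EnrichGameLogsPipeline can be sketched as an Azure Data Factory (v1) pipeline JSON that contains a single HDInsightHive activity. The dataset names, linked service names, Hive script path, and active period below are illustrative assumptions, not the exact names and values used by the sample.

```json
{
  "name": "EnrichGameLogsPipeline",
  "properties": {
    "description": "Illustrative sketch: joins partitioned game events with geo-code reference data.",
    "activities": [
      {
        "name": "EnrichGameLogsHiveActivity",
        "type": "HDInsightHive",
        "linkedServiceName": "HDInsightOnDemandLinkedService",
        "inputs": [
          { "name": "PartitionedGameEvents" },
          { "name": "RefGeoCodeDictionary" }
        ],
        "outputs": [
          { "name": "EnrichedGameEvents" }
        ],
        "typeProperties": {
          "scriptPath": "adfsample/scripts/enrichlogs.hql",
          "scriptLinkedService": "StorageLinkedService"
        },
        "scheduler": { "frequency": "Day", "interval": 1 },
        "policy": { "timeout": "01:00:00", "retry": 1 }
      }
    ],
    "start": "2016-01-01T00:00:00Z",
    "end": "2016-01-08T00:00:00Z"
  }
}
```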

The sample showcases how you can use the Azure Data Factory service to compose data integration workflows that copy or move data by using the Copy Activity, and that process data by running Pig or Hive scripts on an Azure HDInsight cluster by using the HDInsight Activity.
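
As a rough sketch of what a Copy Activity looks like in Azure Data Factory (v1) JSON, the following hypothetical activity copies enriched blob data into an Azure SQL table. The activity name, dataset names, and sink settings are assumptions for illustration; the sample's actual definitions may differ.

```json
{
  "name": "CopyEnrichedDataToSql",
  "type": "Copy",
  "inputs": [ { "name": "EnrichedGameEvents" } ],
  "outputs": [ { "name": "MarketingCampaignEffectivenessSqlTable" } ],
  "typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": { "type": "SqlSink", "writeBatchSize": 10000, "writeBatchTimeout": "00:10:00" }
  },
  "scheduler": { "frequency": "Day", "interval": 1 },
  "policy": { "timeout": "01:00:00", "retry": 1 }
}
```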

To deploy the sample:

  • Select the storage account that you want to use with the sample from the drop-down list.
  • Select the database server and the database that you want to use with the sample.
  • Enter the user name and password for the database.
  • Click the Create button.

The deployment process does the following:

  • Uploads sample data to your Azure storage account.
  • Creates a table in the Azure SQL database.
  • Deploys the linked services, tables, and pipelines that run the sample (an illustrative sketch of two of these definitions follows this list).
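
As an example of the kind of definitions the deployment creates, here is a hedged sketch of an Azure Storage linked service and a blob table (dataset) in Azure Data Factory (v1) JSON. The storage account placeholders, folder path, and dataset name are illustrative assumptions rather than the sample's actual values.

```json
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<your storage account>;AccountKey=<your key>"
    }
  }
}
```

```json
{
  "name": "RawGameEvents",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "gamelogs/rawevents/",
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    },
    "external": true,
    "availability": { "frequency": "Day", "interval": 1 }
  }
}
```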

This sample uses an on-demand HDInsight linked service, which creates a one-node HDInsight cluster on demand to run the Pig and Hive scripts and deletes the cluster after processing is complete.
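
A minimal sketch of an on-demand HDInsight linked service in Azure Data Factory (v1) JSON is shown below. The cluster size, time-to-live, and referenced storage linked service name are illustrative assumptions, not the sample's exact settings.

```json
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "clusterSize": 1,
      "timeToLive": "00:30:00",
      "linkedServiceName": "StorageLinkedService"
    }
  }
}
```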

After the deployment is complete, you can monitor the end-to-end data integration workflow in the Diagram view, and use the monitoring features of the Microsoft Azure portal to monitor datasets and pipelines.

NOTE: There are costs associated with transferring the data and with processing it on an on-demand HDInsight cluster. See HDInsight Pricing and Data Transfer Pricing for details.

For more details about this sample, see this tutorial on Azure.com.