Datasets for Planning Online Advertising Using Gini Indices

Please cite the following paper if you use these datasets:

There are two batches of data.  Each batch is contained in a separate zip file.  Within each zip file, you will find multiple CSV files (one per instance used in the paper).

  • GINI_SMALL.zip (81KB) – These files pertain to the data described in Section 5.1 (“Data Instances”) of the paper, and referenced by Table 2.  They were all generated with 20 campaigns and 100 viewer types (hence the common file prefix 20C100V).  Note that campaigns (or viewer types) with identical targeting in the bipartite graph are sometimes merged in a post-processing step, so some instances contain slightly fewer than 20 campaigns or 100 viewer types.  TC0001 through TC0010 refer to the ten test cases that were analyzed for each type of instance (Loose, Globally Tight, Locally Tight) in Sections 5.2 to 5.5 of the paper.
  • GINI_LARGE.zip (216KB) – These files pertain to the data described in Section 6.5 (“Computational Results”) of the paper, and referenced by Tables 4 and 5.  As indicated in Table 5, there are three instance sizes: small is 100C100V, medium is 100C200V, and large is 100C500V, where the code xCyV denotes an instance generated with x campaigns and y viewer types.
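
The xCyV size code can be split into its two counts with a short helper.  The following is a minimal sketch; the function name parse_size_code is hypothetical and not part of the datasets.  Because of the merging step noted above, the counts describe how an instance was generated and may slightly exceed the number of rows actually present in its files.

```python
import re


def parse_size_code(code):
    """Split a size code such as '100C200V' into (campaigns, viewer_types)."""
    match = re.fullmatch(r"(\d+)C(\d+)V", code.strip())
    if match is None:
        raise ValueError(f"not an xCyV size code: {code!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_size_code("100C200V"))
```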

Both batches contain the following three files for each instance:

  • CAMPAIGN.CSV – Contains key campaign-related data.  The fields (in order) are the campaign identifier (paper notation: j), the impression goal (paper notation: demand d_j), and the shortfall penalty (paper notation: p_j).
  • VIEWERTYPE.CSV – Contains key audience segment-related data.  The fields (in order) are the audience segment identifier (paper notation: i), and the supply measured in impressions (paper notation: s_i).
  • TARGETING.CSV – Provides a list of (audience segment, campaign) pairs whose links in the underlying bipartite graph define the targeting of all campaigns in the instance.  The fields (in order) are the audience segment identifier (paper notation: i), the campaign identifier (paper notation: j), and finally a dummy column of all 1’s (to indicate the presence of a link in the bipartite graph).
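
The three files can be read with Python's standard csv module.  The sketch below rests on assumptions that should be checked against the actual files: that they are comma-delimited, that they have no header row, and that identifiers can be kept as plain strings.  All function names are hypothetical.

```python
import csv


def read_rows(lines):
    """Yield stripped, non-empty CSV rows from an iterable of text lines."""
    for row in csv.reader(lines):
        if any(cell.strip() for cell in row):
            yield [cell.strip() for cell in row]


def load_campaigns(lines):
    """CAMPAIGN.CSV: campaign j -> (demand d_j, shortfall penalty p_j)."""
    return {r[0]: (float(r[1]), float(r[2])) for r in read_rows(lines)}


def load_viewer_types(lines):
    """VIEWERTYPE.CSV: audience segment i -> supply s_i in impressions."""
    return {r[0]: float(r[1]) for r in read_rows(lines)}


def load_targeting(lines):
    """TARGETING.CSV: set of (i, j) links; the dummy third column is ignored."""
    return {(r[0], r[1]) for r in read_rows(lines)}


def segments_by_campaign(links):
    """Group the bipartite links into campaign j -> sorted list of segments i."""
    grouped = {}
    for segment, campaign in links:
        grouped.setdefault(campaign, []).append(segment)
    return {campaign: sorted(segs) for campaign, segs in grouped.items()}
```

Each loader accepts any iterable of lines, so an open file works directly, for example `with open(path, newline="") as f: campaigns = load_campaigns(f)`, where path points to one extracted CAMPAIGN.CSV file.  If the files turn out to have a header row, skip the first line before calling the loaders.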

In addition, the small batch used in Section 5.1 of the paper has a fourth file per instance:

  • SLACKPRICE.CSV – Contains additional audience segment-related data used to define the outside option price, as described in Section 5.5 and illustrated in Figure 7.  The fields (in order) are the audience segment identifier (paper notation: i), and the price of selling these impressions to an outside option (paper notation: \tilde{\beta}_i).

Notes:

  • Floating point quantities may be expressed using scientific notation, e.g., 2.80512E3 is equivalent to 2805.12.
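
Most CSV tooling reads this notation without extra work.  As a small illustration in Python (the helper name parse_number is hypothetical), the built-in float accepts it directly, while int does not, so whole-number quantities written this way need to go through float first:

```python
def parse_number(text):
    """Parse a numeric field that may use scientific notation, e.g. '2.80512E3'."""
    return float(text.strip())


print(parse_number("2.80512E3"))   # prints 2805.12
print(int(parse_number("1E3")))    # prints 1000; int("1E3") would raise ValueError
```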