externalDataSources
object[]
{
  externalDataSources: [
    {
      dataSourceId: 'your_data_source_id',
      type: 'googleanalytics'|'csv',
      // if type is "googleanalytics"
      metrics: ['ga:metric'],
      startDate: 'startdate',
      endDate: 'enddate',
      samplingLevel: 'DEFAULT',
      credentials: {
        type: 'service_account',
        client_email: 'client_email',
        private_key: 'privatekey',
        viewIds: ['view_id'],
      },
    },
  ],
}
About this parameter
Defines external data sources you want to retrieve during every crawl and make available to your extractor function.
There are two supported data sources: Google Analytics and CSV files.
Once you set up an externalDataSource, it's exposed in your recordExtractor. You can access it through the dataSources object, which has the following structure:
{
dataSourceId1: { data1: 'val1', data2: 'val2' },
dataSourceId2: { data1: 'val1', data2: 'val2' },
}
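As a rough sketch of how this object might be read inside a recordExtractor, the snippet below calls an extractor with a hand-built dataSources object. The `pageviews` field, the record shape, and the sample URL are hypothetical and chosen only for illustration; the real argument list of recordExtractor may contain more than what is shown here.

```javascript
// Sketch of a recordExtractor that reads external data.
// Assumption: the crawler passes `dataSources` alongside `url`;
// the `pageviews` field and the record shape are hypothetical.
function recordExtractor({ url, dataSources }) {
  // Each key of dataSources is a dataSourceId from externalDataSources.
  const source = dataSources.myPageviews || {};
  // Values may arrive as strings (for example from a CSV), so convert explicitly.
  const pageviews = Number(source.pageviews || 0);
  return [{ objectID: url.href, pageviews }];
}

// Stand-in call with hand-built arguments, mimicking what a crawl might provide.
const records = recordExtractor({
  url: new URL('http://www.example.com/docs'),
  dataSources: { myPageviews: { pageviews: '1200' } },
});
console.log(records);
```

Falling back to an empty object keeps the extractor from throwing when a URL has no entry in a given data source.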
You can add up to 10 sources. Combined, they can provide a maximum of 11 million URLs.
Examples
Adding a CSV to your externalDataSources
externalDataSources: [
  {
    dataSourceId: 'myPageviews',
    type: 'csv',
    url: 'http://www.example.com/pageviews.csv',
  },
  {
    dataSourceId: 'myCSV',
    type: 'csv',
    url: 'http://www.example.com/website-data.csv',
  },
],
Adding Google Analytics to your externalDataSources
{
  externalDataSources: [
    {
      dataSourceId: 'myAnalytics',
      type: 'googleanalytics',
      metrics: ['google_analytics_metric1', 'google_analytics_metric2', ...],
      startDate: 'start_date',
      endDate: 'end_date',
      samplingLevel: 'DEFAULT',
      credentials: {
        type: 'account_type',
        client_email: 'example@my-project.iam.gserviceaccount.com',
        private_key: 'your_google_analytics_private_key',
        viewIds: ['target_view_id1', 'target_view_id2', ...],
      },
    },
  ],
}
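The date placeholders in the example above can be filled with ISO 8601 dates. The following is a minimal sketch, assuming the configuration is assembled in a Node.js script: the helper names, the `GA_CLIENT_EMAIL` and `GA_PRIVATE_KEY` environment variable names, and the `ga:uniquePageviews` metric are illustrative assumptions, not something the crawler prescribes.

```javascript
// Format a Date as an ISO 8601 calendar date (YYYY-MM-DD, in UTC).
function isoDate(date) {
  return date.toISOString().slice(0, 10);
}

// Build a Google Analytics source covering the last `days` days.
// Assumptions: the helper name, the env variable names, and the
// 'ga:uniquePageviews' metric are illustrative only.
function buildAnalyticsSource({ now = new Date(), days = 30, env = process.env } = {}) {
  const start = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return {
    dataSourceId: 'myAnalytics',
    type: 'googleanalytics',
    metrics: ['ga:uniquePageviews'],
    startDate: isoDate(start),
    endDate: isoDate(now),
    samplingLevel: 'DEFAULT',
    credentials: {
      type: 'service_account',
      // Read secrets from the environment rather than hardcoding them.
      client_email: env.GA_CLIENT_EMAIL,
      private_key: env.GA_PRIVATE_KEY,
    },
  };
}

// Fixed inputs make the result reproducible for this demonstration.
const source = buildAnalyticsSource({
  now: new Date('2024-03-31T12:00:00Z'),
  env: {
    GA_CLIENT_EMAIL: 'example@my-project.iam.gserviceaccount.com',
    GA_PRIVATE_KEY: 'your_google_analytics_private_key',
  },
});
console.log(source.startDate, source.endDate);
```

The sketch leaves out viewIds, which is optional; add it to restrict the data to specific views.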
Parameters
externalDataSource
|
An external data source object from the provided list. |
externalDataSource ➔ dataSource
dataSourceId
|
type: string
Required
Each external data source must have a unique identifier |
type
|
type: string
Required
Type of data source. We support the following values: 'googleanalytics' and 'csv'. |
metrics
|
type: string[]
Required if type is 'googleanalytics'
List of metrics to fetch from Google Analytics for each URL. E.g., We make the value of each metric available through the Note: We automatically include the |
startDate
|
type: string
default: 7daysAgo
Optional
The start date of the period from which the analytics should be fetched. Its format should comply with ISO 8601. Google Analytics also supports values like |
endDate
|
type: string
default: today
Optional
The end date of the period for which the analytics should be fetched. Its format should comply with ISO 8601. Google Analytics also supports values like |
samplingLevel
|
type: string|enum
default: DEFAULT
Optional
The Google Analytics sampling level to use for your analytics data. We support the following values: |
credentials
|
type: Object
default: false
if type is `googleanalytics`
Your Google Analytics credentials. |
externalDataSource ➔ credentials ➔ credentials
type
|
type: string
Required
Type of authentication mechanism. So far, |
client_email
|
type: string
Required
Client email provided after creating a service account. It must have been given |
private_key
|
type: string
Required
Private key provided after creating a service account. |
viewIds
|
type: string[]
default: all credential accessible views
Optional
List of Google Analytics view identifiers to fetch data from. |
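The required fields and limits described on this page can be checked before saving a configuration. The following is a minimal, non-exhaustive sketch: `validateExternalDataSources` is a hypothetical helper written for this example, not part of the crawler, and treating an empty metrics array as missing is an assumption.

```javascript
const SUPPORTED_TYPES = ['googleanalytics', 'csv'];
const MAX_SOURCES = 10;

// Return a list of human-readable problems. An empty list means the
// configuration passes these (non-exhaustive) checks.
function validateExternalDataSources(sources) {
  const errors = [];
  if (sources.length > MAX_SOURCES) {
    errors.push(`too many sources: ${sources.length} > ${MAX_SOURCES}`);
  }
  const seen = new Set();
  sources.forEach((source, index) => {
    const label = source.dataSourceId || `#${index}`;
    if (!source.dataSourceId) {
      errors.push(`${label}: dataSourceId is required`);
    } else if (seen.has(source.dataSourceId)) {
      errors.push(`${label}: duplicate dataSourceId`);
    }
    seen.add(source.dataSourceId);
    if (!SUPPORTED_TYPES.includes(source.type)) {
      errors.push(`${label}: unsupported type`);
    }
    if (source.type === 'googleanalytics') {
      // Assumption: an empty metrics array counts as missing.
      if (!Array.isArray(source.metrics) || source.metrics.length === 0) {
        errors.push(`${label}: metrics is required for googleanalytics`);
      }
      const credentials = source.credentials || {};
      for (const key of ['type', 'client_email', 'private_key']) {
        if (!credentials[key]) {
          errors.push(`${label}: credentials.${key} is required`);
        }
      }
    }
  });
  return errors;
}

// Example: the second source reuses an identifier and lacks metrics and credentials.
const problems = validateExternalDataSources([
  { dataSourceId: 'myCSV', type: 'csv', url: 'http://www.example.com/website-data.csv' },
  { dataSourceId: 'myCSV', type: 'googleanalytics' },
]);
console.log(problems);
```

Collecting every problem in a list, instead of throwing on the first one, lets you fix a configuration in a single pass.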