Connecting the Logs API to ClickHouse

ClickHouse allows you to work with non-aggregated statistical data from Yandex.Metrica that you receive via the Logs API. To connect the Logs API to ClickHouse:

  1. Download the Python integration script, for example with the git clone command:

    git clone https://github.com/yndx-metrika/logs_api_integration.git
  2. Edit the config file (config.json) in the configs directory. The // comments below only describe the parameters; JSON does not allow comments, so leave them out of the actual file.

    {
        "token" : "<your_token>", // access token for the Yandex.Metrica API
        "counter_id": "<your_counter_id>", // counter number
        "visits_fields": [ // list of session parameters
            "ym:s:counterID",
            "ym:s:dateTime",
            "ym:s:date",
            "ym:s:firstPartyCookie"
        ],
        "hits_fields": [ // list of hit parameters
            "ym:pv:counterID",
            "ym:pv:dateTime",
            "ym:pv:date",
            "ym:pv:firstPartyCookie"
        ],
        "log_level": "INFO", // logging level
        "retries": 1, // number of attempts to restart the script after an error
        "retries_delay": 60, // interval between attempts
        "clickhouse": {
            "host": "http://localhost:8123", // address of a running instance of ClickHouse
            "user": "", // username for accessing the database
            "password": "", // password for accessing the database
            "visits_table": "visits_all", // name of the table for storing sessions
            "hits_table": "hits_all", // name of the table for storing hits
            "database": "default" // name of the database for tables
        }
    }
  3. Start the script. You must pass the -source option to specify the data source: hits (pageviews) or visits (sessions). Use the -mode option to choose one of the following modes:

    • history — Loads all data from the date when the Yandex.Metrica counter was created until the day before yesterday.
    • regular — Loads data for the day before yesterday (we recommend this mode for regular downloads).
    • regular_early — Loads data for yesterday.

    For example, to load the full history of sessions:

    python metrica_logs_api.py -mode history -source visits

    You can also load data for a specific date range by passing -start_date and -end_date:

    python metrica_logs_api.py -source hits -start_date 2016-10-10 -end_date 2016-10-18
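Before starting the script, it can help to sanity-check config.json. The following is a minimal Python sketch of such a pre-flight check, not part of the integration script: the function names load_config and check_config are hypothetical, treating every key from the sample config in step 2 as required is an assumption, and the prefix rule (ym:s: for session fields, ym:pv: for hit fields) is inferred from the sample.

```python
import json
from typing import List

# Hypothetical rule: every key that appears in the sample config is required.
REQUIRED_KEYS = {"token", "counter_id", "visits_fields", "hits_fields",
                 "log_level", "retries", "retries_delay", "clickhouse"}
REQUIRED_CLICKHOUSE_KEYS = {"host", "user", "password",
                            "visits_table", "hits_table", "database"}


def load_config(path: str = "configs/config.json") -> dict:
    """Read the config file. json.load rejects // comments, so it must be plain JSON."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_config(cfg: dict) -> List[str]:
    """Return a list of problems found in a parsed config (empty if none)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(cfg))]
    clickhouse = cfg.get("clickhouse", {})
    problems += [f"missing key: clickhouse.{k}"
                 for k in sorted(REQUIRED_CLICKHOUSE_KEYS - set(clickhouse))]
    # Prefix rule inferred from the sample: sessions use ym:s:, hits use ym:pv:.
    problems += [f"not a session field: {field}"
                 for field in cfg.get("visits_fields", [])
                 if not field.startswith("ym:s:")]
    problems += [f"not a hit field: {field}"
                 for field in cfg.get("hits_fields", [])
                 if not field.startswith("ym:pv:")]
    return problems
```

A typical call is `for problem in check_config(load_config()): print(problem)`; an empty result means the file passed these (assumed) checks, not that the token or counter ID is valid.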
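Read literally, the three modes differ only in the dates they cover. The Python sketch below computes those date windows relative to a given day and builds the equivalent command with an explicit date range. It is an illustration under stated assumptions, not the script's own logic: the function names are hypothetical, end dates are assumed to be inclusive, and treating a mode as interchangeable with -start_date/-end_date is an assumption.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def mode_date_range(mode: str, today: date,
                    counter_created: Optional[date] = None) -> Tuple[date, date]:
    """Return the (start, end) dates, assumed inclusive, described for a mode."""
    yesterday = today - timedelta(days=1)
    day_before_yesterday = today - timedelta(days=2)
    if mode == "history":
        # From the counter's creation date up to the day before yesterday.
        if counter_created is None:
            raise ValueError("history mode needs the counter creation date")
        return counter_created, day_before_yesterday
    if mode == "regular":
        return day_before_yesterday, day_before_yesterday
    if mode == "regular_early":
        return yesterday, yesterday
    raise ValueError(f"unknown mode: {mode}")


def explicit_range_command(source: str, start: date, end: date) -> List[str]:
    """Build the -start_date/-end_date form of the command for the same dates."""
    return ["python", "metrica_logs_api.py", "-source", source,
            "-start_date", start.isoformat(), "-end_date", end.isoformat()]
```

For example, with today = date(2016, 10, 20) and a counter created on 2016-10-10, history covers 2016-10-10 to 2016-10-18, and explicit_range_command("hits", ...) for that window produces the same argument list as the date-range command above. A list like this can be passed to subprocess.run from the repository directory.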