Advanced HubSpot Blog Export With Custom Fields

Written by BJ Szyjakowski on 29 Jul, 2021

Export any HubSpot blog with all or any of its custom fields as a csv.

I've recently run into a slew of projects requiring me to either convert old blogs into hubdb dynamic pages or consolidate blogs from multiple portals into a single portal. In both cases I found that the native blog export tool never quite did the job I was looking for. It exports all of the blogs (all of the posts across all of the blogs) and it doesnt export custom fields.

This caused me to roll out my own solution for exporting blogs into a CSV format that can pretty easily be utilized on any HubSpot account.

Store Blogs as JSON Objects

This step is pretty straight forward but also is the most time consuming portion of this project. The general idea is that we are going to create a module that loops through all of the blog posts(like any listing page) and stores all of the blog post's data as an array of JSON objects that we can later access via an ajax call.

This module will then be added to the blog's listing page

module.html (Blog Listing JSON Module)

{# 
  Will only render contents of if statement
  if the url has a query parameter of ajax=1
  ex: https://www.example.com/blog?ajax=1
 #}
{% if request.query_dict.ajax == 1 %}

{% require_js%}

{# 
  Tells the browser that this js is really only 
  used as a template and isnt containting code that needs
  to be run. Note the ID -- we will need that later
 #}
<script type="text/template" id="export-data">[

  {# 
    Loop through all of the blog posts
  #}
  {% for post in contents %}
  
  {%- set post_url = post.absolute_url|split('/') -%}
  
  
  {# 
    Start building json. Note that we are using JS and Hubl
    together in order to create our objects.
  #}
  {"id":"{{post.id}}",
  "name": "{{post.title|replace('"', '\\"')}}",
  "title":"{{post.title|replace('"', '\\"')}}",
  "bio": {{post.widgets.body_text.body.html|regex_replace('({{)(.*)(}})', "")|tojson}},
  "job_title": "{{post.widgets.job_title.body.value}}",
  "slug": "{{post_url[-1]}}",
  "industries": "{{post.widgets.industries.body.value|replace(',','')|replace(' | ', ",")|replace('| ', ",")|replace(' |', ",")|replace('|', ",")}}",
  "employee_photo": "{{post.widgets.employee_photo.body.src}}",
  "employee_thumbnail": "{{post.widgets.employee_thumbnail.body.src}}",
  "departments": "{{post.widgets.departments.body.value|replace(' | ', ",")|replace('| ', ",")|replace(' |', ",")|replace('|', ",")}}",
  "sort_order": "{{post.widgets.sort_order.body.value}}",
  "location": "{{post.widgets.location.body.value}}",
  "insurance_lines": "{{post.widgets.insurance_line.body.value|replace(' | ', ",")|replace('| ', ",")|replace(' |', ",")|replace('|', ",")}}",
  "intro_text": {{post.widgets.intro_text.body.html|regex_replace('({{)(.*)(}})', "")|tojson}},
  "office_phone": "{{post.widgets.office_phone.body.value}}",
   "cell_phone": "{{post.widgets.cell_phone.body.value}}",
   "fax": "{{post.widgets.fax.body.value}}",
   "email": "{{post.widgets.email_address.body.value}}",
   "contact_text": {{post.widgets.contact_information.body.html|regex_replace('({{)(.*)(}})', "")|tojson}},
   "author_slug": "{{post.widgets.author_slug.body.value}}",
   "twitter": "{{post.widgets.social_twitter.body.value}}",
   "google": "{{post.widgets.social_google.body.value}}",
   "facebook": "{{post.widgets.social_facebook.body.value}}",
   "linkedIn": "{{post.widgets.social_linkedin.body.value}}",
   "executive_order": "{{post.widgets.executive_order.body.value}}",
   "client_service_order": "{{post.widgets.client_service_order.body.value}}",
   "video_url": "{{post.widgets.video_url.body.value}}",
   "video_link_text": "{{post.widgets.video_link_text.body.value}}"
  }{{',' if not loop.last}}
{% endfor %}
 ]</script>
{% end_require_js %}
{% endif %}

Normally I would slim this example way down but there is a lot of value in seeing some of the filters that I am using for some issues that you may be experiencing. The main point of this step is that we need to convert all of the Hubl data into valid JSON. VALID JSON.

It can be a bit of a beast to get the hubl data, escpecailly from rich text fields, to format the correct way.

|tojson Filter is a pretty huge help when it comes to escaping quotes and special characters so that you can throw html into a value safely but sometimes it seems that the naitive |tojson filter doesnt quite behave as expected. In those instances I find that I need to use |regex_replace() or |replace() to get the job done.

Access the JSON Objects and Convert to CSV

The next step is to create a page that will act as our export hub and make an ajax call from it to our blog listing page with the correct query parameter. Once we have the blog JSON array we will convert it into a csv directly in the browser and download the file.

In order to easily parse the JSON into a CSV format I going to use PapaParse. This is technically an npm package and I dont think it's hosted on a cdn anywhere so you will need to download it and upload it to your portal.

module.html (Blog Conversion Module)

<!-- Change this in order to change which blog is being accessed -->
{% set blog = <blogID> %}
{% set name = "<nameOfFileDownload>" %}
{{require_js(get_asset_url('rel/path/to/papaParse.js'), 'head') }}
<script>

  var this_hub_id = {{ hub_id }};
  var this_blog_url_all = '{{ blog_all_posts_url(blog)|replace('http:',request.scheme+':')  }}';
  
  
$(function() {
    var blogs;
    var url = this_blog_url_all;
    //console.log(url);

     var getBlogs = function() {      
        $('.loading-data').show();
        $.ajax({
            url: url + "?ajax=1",
            success: function (data) {
              //console.log("loaded!");

              
              var html=$('<div>'+data+'</div>');
              var export_data_html = $(html).find('#export-data').html();
              
              //console.log('export-data', export_data_html);
              
              var arr = $.parseJSON(export_data_html);
              var csv;
              blogs = arr;
              console.log('blogs', blogs);
              
              
              
             csv = Papa.unparse(blogs);
              //console.log(csv);

              function downloadCsv(csvString) {
                //console.log('running')
                var blob = new Blob([csvString]);
                if (window.navigator.msSaveOrOpenBlob){
                  window.navigator.msSaveBlob(blob, "filename.csv");
                }
                else {
                  var a = window.document.createElement("a");

                  a.href = window.URL.createObjectURL(blob, {
                    type: "text/plain"
                  });
                  a.download = "{{name}}.csv";
                  document.body.appendChild(a);
                  a.click();
                  document.body.removeChild(a);
                  //console.log('clicked')
                }
              }
              downloadCsv(csv);
            }
        });  
    } // end getblogs
     
     getBlogs();
});
</script>

You don't really need to worry about what is all going on with the JS here but we are basically just making an AJAX call to the blog listing page which now housed our JSON Module.

Our AJAX call looks for that script template (JSON Module created instep one) and parses it into a CSV file that is in turn automatically downlaoded when the page is visted.

The only items in this module that you need to worry about and edit are the two hubl variables set before the script tag. You can use them to set the blog ID and the filename for the CSV download.

As mentioned above all that you need to do is place this module on it's own blank page, publish the page, and then visit the published page and the csv download should happen automatically. Open that bad boy up and you should see all of your blog posts with all of the custom fields that you chose to export from your blog post template.

Troubleshooting

As mentioned earlier, this might not work on it's first go. If it doesnt the likely culprit is the array of JSON objects. The best way to trouble shoot this is to uncomment some of the console logs in the 'Conversion module'(2nd module) and look at the errors that are being logged so you can better find which field is failing and why it might be failing. Remember to make good use of the HubSpot filters in order to massage the fields to your heart's content.

If you need any additional help feel free to reach out on LinkedIn or the HubSpot slack channel!

Happy Exporting!