Pulling data from Facebook (so far)

This will be a post that will (hopefully) evolve over time.  I’ve been trying to find an easy way to pull page level comments, replies, likes, reactions, etc from my company page to put into our database.  I know that Facebook has stats that they are currently sharing with admins for pages, but we’re looking for specific stuff, and honestly, at this point, I’m digging for stuff to see what’s available.  Shout out to Jon Spears at Pragmatic Works for suffering through this with me :)!

PLEASE tell me if I am catastrophically incorrect or misinformed about anything in this article.  Keep in mind the posted date because Facebook stuff changes daily.

Up front info: 

  • I’m talking about a lot of Pragmatic Works and their components – I have never worked for them, I just really like their products.
  • I’m going to tell you up front – this is not fun, nor is it as easy as one would think.
  • Facebook’s Graph API is going to show you more than you might get once you make your call or try to pull the data.  It’s very forgiving.  The biggest issue I had was that I could see the FROM data in the API, but not when I pulled it in SSIS from the REST source.  Found out it might be because my connection manager and Rest Source might not be sending a PAGE level token which is essential to get From data.

Idea: Create an SSIS package that will allow me to pull data related to posts that we make on our facebook page.  I was specifically looking for Comments, their authors, likes/reactions to the post, and comments on the comments – I’ll refer to these as replies for my own smoking brain.

The first rule of Facebook API is, you don’t talk about Facebook API.  No, seriously.  I feel like I’ve jumped down the rabbit hole trying to get a solid answer.  I think this is one of those things that if you’re not using it every single day,  you’re going to miss something.  Kind of like learning a language.

1st Rule: In order to access the data,  you need a facebook app to do the accessing for you.  I’m not going to go into how to do that here, there are plenty of posts you can find on how to set that up, and truthfully, by the time I hit publish, it will probably change.

2nd Rule: Don’t settle for attacking it one way.  I found out quickly that there are certain things you can do and certain pieces of data you can get right off the bat as well as what you can’t get.  Recently, they changed the way you access Page data in that you can’t get FROM data unless you are using a Page token to access it.  The From data is who posted the info, and what is their id.  This is at all levels it seems when you’re talking about a post.

Info to know: Facebook data is in JSON format, meaning that you have nodes of data.  Again, I’ll stick this in here – I’m only pretending to know what I’m talking about or referring to my own understanding.  This rabbit hole is deep and has many tunnels.  Essentially, I’m trying to pull JSON data so that I can parse it and drop it into tables laid out for our own consumption.

Steps:

  1. Starting Point: Create a Facebook developer app, and make sure you’re id is the admin for the page you’re trying to access.  After you have that app in hand, go to the Graph API explorer on the facebook developer site.
  2. Graph API: You’re going to live here if you don’t know JSON well (like me) or you want to try things out to show a client what is available.  Again, I’m sure there are millions of tutorials on how to use it, but I’ll keep my points short.
    1. if you go to the page and hit submit, you’ll see your own information.  It defaults to id and name fields.  The left side allows you to choose more fields if you are looking to do that.
    2. In order to get page level data, you have to change it from you to the page you want to see.  In order to do that, you need to choose your app from the Graph API Explorer button and then choose the Page Access Token from the Get Token button.  If you notice when you hit Submit, it is still showing ID and Name fields, but it’s for the page you’ve connected to.  me?fields=id,name <- the same syntax brings back the same data but pertaining to the object you’re looking at.  In this case, when I picked the Page Access token, I’m now looking at my company’s page data, and not data about me.
    3. Choose the information that you want on the left side.  I chose to view posts, comments on that post, and the author information so the information in the field next to the submit button is – me?fields=posts.limit(1){message,created_time,comments.limit(2)}.  I want 1 post, the message, when it was created, and the latest 2 comments.  When choosing a field, if it has subfields (sub-nodes?) they’ll populate automatically.  If you choose specific fields like I did for posts (message, created_time), you’ll only get those fields.  I didn’t ask for anything specific to comments, so it gave me what it defaults to, which in this case is the From information – who responded, and what is their FB id.
    4. So I have the information that I want laying out nice and pretty, and I want to ‘get’ it.  The easiest way to test if you can get it properly is if you click on the Get Code button at the bottom, choose the cURL syntax and copy it to notepad ++.  Just pull the HTTP along with the long token and plug it into a browser.  Hopefully, you’re seeing a browser page with nicely formatted JSON data.
    5. Getting to this point was like pulling teeth. 🙂  I wish you luck!
  3. Getting the data:  For my initial stab at this, I am using SSIS, specifically the components that come with Pragmatic Works’ Task Factory.
    1. At first, I used a REST Source with a profile file that I got from PW to help me connect.  This is where you’ll use the information from the FB App you’ve already set up.  The connection manager allows you to connect to the API and make a call using GET to pull the data you need.  At first, I was using just a simple statement but realized that with the change FB made to how to access certain levels of data, it’s necessary to drop in the page level token.  I have a feeling this is going to be an issue if I change my FB password or it expires, but I just want to get good data.  I’ll work on the refinements later.  
    2. I’ll add the step by step here later.  I need to get some work done.
    3. I know there’s a way to get the data with Python, but I’m barely past the HelloWorld stage.  From what I learned jumping around looking for info, this might be the easier way to grab it and parse it out into flat files and use SSIS to parse them out into tables.
  4. Learned the hard way: Start small.  When you navigate through so many cobwebs, you tend to get lost.  Start with pulling post information, get that set, and then step down through stuff.  Jon@PW was super helpful in that we discovered it might be easier to use a task to pull comment ids for posts and throw them through a for loop and use a different get command to pull comment specific data.

Questions that need answering:

  1. how are posts and comments connected?
    1. if you look at the comment level id field, it’s a concatenation of postid_commentid.
  2. how are comments and replies connected?
    1. from what I can see, the reply level id is a concatenation of the original post and the replies comment id.  There’s no connection from reply to comment, only back to the original post.  Makes it tough to see what reply is related to what comment.
  3. How can I see the object_id of something deeper than a post?
    1. I’ve tried to delve into object_id for a comment thinking it’s the way to connect to it’s own replies, but can’t see how to get it.  It’s most likely in the syntax of how I’m asking.
  4. How is accessing a node different than accessing an edge?

Leave a Reply

Your email address will not be published. Required fields are marked *