Welcome


Welcome to Understanding Link Analysis. The purpose of my site is to discuss the methods behind leveraging visual analytics to discover answers and patterns buried within data sets.

Visual analytics provides a proactive response to threats and risks by holistically examining information. Unlike traditional data mining, visualizing information lets patterns of activity that run contrary to normal activity surface within very few occurrences.

We can dive into thousands of insurance fraud claims to discover clusters of interrelated parties involved in a staged accident ring.

We can examine months of burglary reports to find a pattern leading back to a suspect.

With the new generation of visualization software our team is developing, we can dive into massive data sets and visually find new trends, patterns and threats that would take hours or days to find using conventional data mining.

The eye processes information much more rapidly when it is presented as images; this has been true since children started learning to read. As our instinct develops over time, so does our ability to process complex concepts through visual identification. This is the power of visual analysis that I focus on in my site.

All information and data used in articles on this site is randomly generated with no relation to actual individuals or companies.

Creating Timelines From RSS News Feeds

Those who analyze worldwide events for threat analysis are constantly plugged in to multiple news and information sources. When a threat occurs somewhere in the world, capturing and analyzing the information being received from your sources can be a daunting task.

One way to organize this information is to set up an automated process for capturing and time lining the relevant data for real time analysis. In this article I am going to discuss one of the methods for capturing information from multiple news sources and setting up an import specification to bring the data into a visualization utilizing link analysis software and Microsoft desktop components.

Capturing Data:

I am going to start with two different news feeds, one from ABS-CBN in Southeast Asia and another from BBC in the United Kingdom. Ultimately you can set up RSS feed captures on as many news sources as you like.

The first step is to capture the XML schema from the news source RSS feeds. RSS (Really Simple Syndication) is a family of web feed formats used to publish frequently updated works, such as blog entries, news headlines, audio, and video, in a standardized format. An RSS document (which is called a "feed", "web feed", or "channel") includes full or summarized text, plus metadata such as publishing dates and authorship. Almost all news sources incorporate RSS feeds in their sites, allowing users to subscribe to the feed.
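
To make the structure of a feed concrete, here is a minimal Python sketch that reads the items out of an RSS 2.0 document using only the standard library. The sample feed, its headlines and the helper name parse_items are made up for illustration; real feeds carry more items and, depending on the publisher, different fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; real feeds carry many more items and fields.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Sample headline one</title>
      <link>http://example.com/story-1</link>
      <pubDate>Mon, 06 Sep 2010 08:30:00 GMT</pubDate>
      <category>Asia</category>
    </item>
    <item>
      <title>Sample headline two</title>
      <link>http://example.com/story-2</link>
      <pubDate>Mon, 06 Sep 2010 09:15:00 GMT</pubDate>
      <category>Europe</category>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item>, keyed by the child tag names."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({child.tag: (child.text or "").strip() for child in item})
    return items


items = parse_items(SAMPLE_RSS)
for row in items:
    print(row["pubDate"], "|", row["title"])
```

Each dictionary corresponds to one row of the spreadsheet that Excel builds from the same XML, with the tag names acting as column headers.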

Let's start by capturing the schema in ABS-CBN and bringing the data into Microsoft Excel. First I am going to locate the RSS page on ABS-CBN's web site. As you see below, the RSS location is found on the RSS logo:










I am going to click on the logo to bring up the RSS page for ABS-CBN. Once I have landed on the ABS-CBN RSS page, located at http://www.abs-cbnnews.com/rss.xml, I view the page source by clicking View on my browser's menu and selecting "View Page Source". This brings up the XML schema for the RSS feed page, which I am going to copy and paste into Microsoft Excel to bring the page's data into a spreadsheet, allowing me to set up an import specification for my visualization software.















The XML schema contains the data source for the news feed along with column and delineation information that Excel can use to organize the information in a spreadsheet. Once I have this page open with the schema in view, I am going to select Save on the XML window and save the schema to my local computer.
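
If you would rather script this save step than do it through the browser, a minimal Python sketch could look like the following. The function name save_feed and the 30 second timeout are illustrative choices of mine, not part of any tool used in this article.

```python
import urllib.request
from pathlib import Path


def save_feed(url, dest_path, timeout=30):
    """Download the raw feed XML at `url` and write it to `dest_path`.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    Path(dest_path).write_bytes(data)
    return len(data)


# Example call (left commented out because it needs network access):
# save_feed("http://www.abs-cbnnews.com/rss.xml", "ABS CBN.xml")
```

The saved file can then be selected as the XML data source in Excel exactly as if it had been saved from the browser.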

















My next step is to open Microsoft Excel and bring the XML schema into my spreadsheet. I accomplish this by going to Excel, selecting the Data tab, and then selecting "Other Data Sources" from the source menu. Then I am going to select "From XML Data Import" from the Other Data Sources sub-menu.














Once the XML data source is selected, Microsoft Excel is going to prompt for an XML file location as the data source. I am going to select the ABS CBN.xml file I saved from my browser. Once I have selected the XML schema file, Excel is going to import the data into a spreadsheet. Keep in mind that every news source has a different RSS schema and displays different information, but for the most part all show the basic information that we are going to utilize for our timeline.

In this particular case, ABS-CBN provides several fields we can utilize: a one-sentence summary of the news article, the publication date and time that the story was posted, and a source, which is going to be our theme line identity for the time line. The only thing lacking from ABS-CBN's RSS feed is a narrative column, which BBC provides. This column is used in the description field of the visual import to provide some additional information aside from the summary.













I put the ABS-CBN XML into tab 1 of my spreadsheet, and now I am going to navigate to BBC and follow the same steps as above, placing the XML into tab 2 of the same workbook. From the example below you will notice that BBC uses a different column setup. It is similar to the ABS-CBN data, but it adds the description information.











We have now captured two different news sources that we are going to import into visualization software to create a time line based on the information being provided. Keep in mind you can create XML feeds in Excel for as many different sources as you like and refresh the entire workbook at one time.

Importing RSS Data Into Visualization Software:

Now that I have set up my RSS feeds in Microsoft Excel, I am going to save my workbook on my local computer and begin setting up an import specification to bring the data into my visualization software, creating a time-bound theme line for the news stories.

In real life, I would be capturing data on a major event, such as a bombing, for specific incident or threat analysis. Regrettably, today everyone is behaving themselves, so we are going to be importing garden-variety news stories into a theme line, but this should give you a good example of the process for leveraging and organizing news information for threat analysis.

I am going to open up i2 and begin building my import specification. This specification is going to differ from the others we have discussed in previous articles, as the layout and format are going to revolve around time-binding events for a theme line.

When I select my RSS feed Excel workbook as the data source for my import, you can see that ABS CBN is located on the first tab and the BBC is located on the second. We are going to use both tabs. However, since every news site has a different XML schema, and thus different columns, the import spec is going to be different for each news source. The good news is, once an import spec is established for a specific news feed, you can reuse that import spec over and over again for new postings and imports.

















Even though the data captured and the columns may differ, the columns used in the import are going to be pretty standard. The first change in this import from the others is that I am selecting the theme line layout for events as my import layout.

















Next I am going to examine my columns and data to determine where to place them in my theme line. For both the ABS-CBN data and the BBC data, the posting date and time is going to be assigned to the date and time in my import specification. Also in both, the source is going to be assigned to the theme line entity identity. This is going to create a new theme line for each news source and link the appropriate stories to the appropriate source.
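
The same assignment can be sketched outside the visualization software in a few lines of Python. This is only an illustration of the idea: the rows, the column names and the function build_theme_lines are hypothetical, and it assumes the feed publishes its dates in the usual RSS style such as "Mon, 06 Sep 2010 08:30:00 GMT".

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime

# Hypothetical rows, shaped like the spreadsheet columns after import.
rows = [
    {"source": "BBC", "title": "Later BBC story",
     "pubDate": "Mon, 06 Sep 2010 09:15:00 GMT"},
    {"source": "ABS-CBN", "title": "ABS-CBN story",
     "pubDate": "Mon, 06 Sep 2010 08:45:00 GMT"},
    {"source": "BBC", "title": "Earlier BBC story",
     "pubDate": "Mon, 06 Sep 2010 08:30:00 GMT"},
]


def build_theme_lines(rows):
    """Group events by source and sort each group by posting time."""
    theme_lines = defaultdict(list)
    for row in rows:
        posted = parsedate_to_datetime(row["pubDate"])  # RSS-style date text
        theme_lines[row["source"]].append((posted, row["title"]))
    for events in theme_lines.values():
        events.sort()  # earliest event first
    return dict(theme_lines)


theme_lines = build_theme_lines(rows)
for source, events in theme_lines.items():
    print(source)
    for posted, title in events:
        print("  ", posted.isoformat(), title)
```

The source column decides which theme line an event lands on, and the parsed date decides where along that line it sits.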

















Next I need to set up the identity of my event frame. Just like every entity, the identity of the event frame needs to be a unique identifier for each story so that each change in the story will create a new event frame. I am going to assign the one line description in the news feed as the identity. As news articles will replicate when imported into Excel, this will ensure that only one event frame will be created for each unique news story.
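
As a rough illustration of why the identity matters, the Python sketch below keeps only the first row seen for each one line description. The function names are hypothetical, and ignoring differences in letter case and spacing is my own assumption about what should count as the same story.

```python
def event_identity(title):
    """Build an identity from the one line description.

    Collapsing whitespace and ignoring case is an assumption made for
    this sketch, so trivial formatting differences do not create duplicates.
    """
    return " ".join(title.split()).casefold()


def unique_events(rows):
    """Keep the first row seen for each identity, preserving order."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(event_identity(row["title"]), row)
    return list(first_seen.values())


# Hypothetical rows; the second is a replicated copy of the first.
rows = [
    {"title": "Storm closes port", "source": "BBC"},
    {"title": "Storm  closes port ", "source": "BBC"},
    {"title": "Election results announced", "source": "BBC"},
]
events = unique_events(rows)
print(len(events), "unique event frames")
```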

For the BBC news feed, the RSS XML includes a story description which contains several lines of information about the story. ABS-CBN also incorporates a description but embeds pictures and video in its feed, which cannot be imported by Excel. So for ABS-CBN I am going to use the hyperlink to the article as the description, which is going to allow me to click on my chart and go to the story to see the full news description.
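
One way to picture the per-source import specifications is as a small mapping from each feed's own columns onto a common set of fields. The Python sketch below is hypothetical throughout: the column names in FIELD_MAPS and the function normalize are placeholders, and you would need to inspect each feed to confirm what it actually provides.

```python
# Hypothetical column names; inspect each feed to confirm what it provides.
FIELD_MAPS = {
    "ABS-CBN": {"identity": "title", "posted": "pubDate", "description": "link"},
    "BBC": {"identity": "title", "posted": "pubDate", "description": "description"},
}


def normalize(source, row):
    """Translate one spreadsheet row into the common event fields."""
    record = {"source": source}
    for common_name, feed_column in FIELD_MAPS[source].items():
        record[common_name] = row.get(feed_column, "")
    return record


abs_row = {"title": "Sample story", "pubDate": "Mon, 06 Sep 2010 08:45:00 GMT",
           "link": "http://example.com/story"}
bbc_row = {"title": "Other story", "pubDate": "Mon, 06 Sep 2010 09:15:00 GMT",
           "description": "Several lines about the story."}

abs_event = normalize("ABS-CBN", abs_row)
bbc_event = normalize("BBC", bbc_row)
print(abs_event["description"])
print(bbc_event["description"])
```

Once a mapping exists for a source it can be reused for every later import from that feed, which mirrors how a saved import specification behaves.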

















Another standard field that is included in most news organization RSS feeds is a category field, which is a very valuable field to bring into my visualization, allowing me to search within my chart for specific categories. This is very important if multiple events are occurring in different parts of the world that you are analyzing. Each site interprets the category differently; in the case of BBC, the category refers to the geographic location of the story, such as Asia, Africa or Europe. I am going to bring in the category field as an attribute to my event frame.
















This is going to create a searchable attribute field on my visualization that I can use to isolate news stories to a certain area or event.
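
The kind of search this attribute enables can be sketched in Python as a simple filter. The events and the function filter_by_category are made up for illustration, and treating the match as case-insensitive is my own assumption.

```python
def filter_by_category(events, category):
    """Return the events whose category matches, ignoring letter case."""
    wanted = category.casefold()
    return [e for e in events if e.get("category", "").casefold() == wanted]


# Hypothetical events; the last one has no category at all.
events = [
    {"title": "Sample story one", "category": "Asia"},
    {"title": "Sample story two", "category": "Europe"},
    {"title": "Sample story three", "category": "asia"},
    {"title": "Sample story four"},
]

asia = filter_by_category(events, "Asia")
for event in asia:
    print(event["title"])
```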

It is now time to run my import specification and see what we have. On a side note, I have been creating import specifications for 15 years and anytime I am creating a new specification from scratch I still rarely get it on the first try. I also constantly revisit old specifications and find new ways to improve on them. Even though you can save and reuse specs in your software, review them from time to time and see if there is any way to improve on them.

I am going to execute both my BBC and ABS-CBN import specifications and bring them into the same chart.















As you can see, each news feed differs slightly in the import specification, but both clearly organize news stories into a time line for analysis. Each theme line is based on the news source, and the category attribute can be used to isolate specific stories from each theme line. This makes the process of news collection for threat analysis much easier and more organized.

At any point I can refresh each theme line in my chart with any new news stories in a two step process.

First I am going to update my Microsoft Excel spreadsheet containing the data by opening the workbook, selecting the Data tab and selecting "Refresh All" from the data sub-menu. This is going to ping the data source in the XML schema and bring in any new data to each of the tabs from the corresponding news sources.













Next I am going to open my theme line chart, open my saved import specifications and run my BBC and ABS-CBN import spec again bringing in the new data to my theme line.
















This is where keeping all of your RSS news feeds in the same workbook comes in handy. In Excel you can refresh every RSS feed at one time, saving time when you need to bring in new data. For those who are good at writing macros in Excel, you can set up an automatic refresh within Excel to pull in new data as well.
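
For readers who prefer a script to a macro, the refresh idea can be sketched in Python as follows. This is not an Excel macro and not part of any product used here: the function refresh, the chart dictionary and the stand-in fetch functions are all hypothetical, and in practice each fetch function would download and parse the live feed.

```python
def refresh(chart, fetchers):
    """Add only the stories not already on each theme line.

    chart:    dict mapping source name -> list of story rows
    fetchers: dict mapping source name -> function returning current rows
    Returns the number of new stories added across all sources.
    """
    added = 0
    for source, fetch in fetchers.items():
        theme_line = chart.setdefault(source, [])
        known = {row["title"] for row in theme_line}
        for row in fetch():
            if row["title"] not in known:
                theme_line.append(row)
                known.add(row["title"])
                added += 1
    return added


# Hypothetical state: one story already charted, one new story in the feed.
chart = {"BBC": [{"title": "Old story"}]}
fetchers = {
    "BBC": lambda: [{"title": "Old story"}, {"title": "New story"}],
    "ABS-CBN": lambda: [{"title": "First ABS-CBN story"}],
}

new_count = refresh(chart, fetchers)
print(new_count, "new stories added")
```

Running the refresh a second time against unchanged feeds adds nothing, which is the behaviour you want from a repeatable import.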

The best way to learn and refine your ability to visualize news stories is to experiment with different news sources and theme line layouts. There really is no one right way of accomplishing this task, but in the end just remember to follow some basic rules for event analysis:

  • Ensure each theme line is assigned to identify the news source, otherwise you are not going to be able to refer back to the source or know where a specific story originated from.
  • Ensure you create a unique identity for each event frame based on the one line description from the feed. Otherwise you may either miss a story or replicate existing stories in your import.
  • Ensure that you use the post date and time as your date and time fields in your import to properly time line events. RSS feeds have several date fields in them, such as publication date or posting date, but you want to ensure you capture the story's event date. Review the data and find the appropriate field which indicates the date and time the story occurred; every RSS feed from news organizations contains one.

As always, if you have any questions regarding this article or any other, please feel free to write me at linkanalysis@gmail.com.