How to compare multiple files?

The context

For a project, we had to create over 100 virtual machines (VM), each one with specific caracteristics: OS, # CPU, RAM, # disks, disk space... About 1/3 VM are Windows, 2/3 Linux.
I maintain a MS Access Database with the requested VM.
The project is near ending; for quality control, I need to validate what is delivered vs. what was requested. My sysadmins went on almost(*) all the VM to get the caracteristics; the result is 2 text files, one for Windows and one for Linux.
*: This is important. I don't have all the VM yet, hence I will have to do this process again; this have to be taken into account in my solution.

How to compare the data?

Method 1: by hand

Boring, tedious and highly error prone. Forgotten.

Method 2: using Excel

I tried that first. But I'm not a king on Excel, and because the caracteristics are quite different between Linux and Windows, is't hard to make comparisons. Abandoned.

Method 3: using Access

This is actually the most logical way, because my initial source of information is already in Access. Moreover, I do SQL for many years, so I feel comfortable to compare the servers with it (despite Access's SQL is somehow different from standard).
However, because the caracteristics are different between each server, a import from the text file did not yield good results.
But wait! Access is able to manage XML... let's try that.

What I've done?

  1. I've merged the 2 text files
  2. I've transform the raw text file into XML with a basic text editor; tedious, and I made a lot of errors, but with the help of online XML syntax checker, I've ended with a clean file.
  3. Import the XML to Access: encountered a first problem
Problem #1: attributes are badly managed
My XML file was similar to this:
<servers>
  <server kernel="..." name="..." os="..." ..>
    <filesystems>
       <fs mount="/dev" name="devtmpfs" size="5.8G" ... >
       ...
    </filesystems>
   </server>
   ...
</servers>
However, using attributes was a bad idea: Access doesn't seem able to manage them and I ended with empty fields in the result table:
To overcome this, I turned the attributes into elements (easy, using the Find & Replace function of Notepad):
<servers>
 <server> <name>...</name> <os>...</os> <ci>...</ci> ...
   <filesystems>
     <fs> <name>devtmpfs</name> <size>12G</size> <mount>/dev</mount> ... </fs>
     ....
   </filesystems>
 </server>
 ....
<servers>
Problem #2: no common key between the generated tables
Access ended with 2 tables: server and filesystems. However, no primary key is defined in server, but worse no foreign key is defined in filesystems!
The solution: transforming the XML during the import. Access allows to specify a XSLT file to transform the XML.
XML source --> XSLT transformation --> Access tables
Here is my XSLT:
<xml version="1.0" encoding="UTF-8"?>
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes" omit-xml-declaration="yes" />
    <xsl:strip-space elements="*" />
    <xsl:template match="node()">
      <xsl:copy>
        <xsl:apply-templates select="node()">
      </xsl:copy>
    </xsl:template>
    <xsl:template match="fs">
      <fs>
        <ci><xsl:apply-templates select="../ci"></ci>
      </fs>
    </xsl:template>
  </xsl:stylesheet >

Update: gathering data with Ansible

I now have access to an Ansible console; this will allow me to gather the data on a regular basis. Sadly, Ansible output format is JSON, and not XML. However, JSON can easily be transformed to XML with online tools such as https://www.freeformatter.com/json-to-xml-converter.html

Comments

Popular Posts