September 2003
  Intelligent Systems
Table of Contents

SOFTWARE

Managing Your Data
Before, During, and After Testing

Setting up your test system is only half the battle—how you store the data and access them after testing is the other half.

Richard House, VI Technology, Inc.

figure

Properly planned for or not, every test system that involves taking measurements has some method for managing the acquired data. All too often, data management plans are an afterthought, resulting in misplaced or lost data files, and difficulties in determining meaningful unique file names. What initially seems like the simplest method of storing data—writing the data to a text file—often becomes the most difficult when later you need to run reports, access the data for analysis, and compare results from different test runs. Create a plan for managing your data before, during, and after execution of your test program, one that meets your needs for data integrity, data access, and report generation at some later date.

Before You Test
Before you’ve designed the measurement system, consider how you want to store your data, the types of data you will store, and the kinds of relationships you want between them.

 Proper Data Storage. Most developers opt to store their data in familiar and easy-to-use text files or spreadsheets—deceptively thought of as the lowest-cost method of storage. Once you’ve executed multiple tests and created many files, the inefficiencies and expenses of such storage methods become clear. Data files stored on the measurement systems hard drive are mistakenly deleted, people forget where a particular file was stored, or more catastrophically, the hard disk fails, taking your data with it. Test operators frequently don’t back up the measurement data, and when they do, they rarely check the backup systems for accurate recovery.

File naming conventions quickly break down as additional demands are placed on the measurement system. For example, the file name may start with the serial number of the device under test. A new demand may be to test each part number more than once, requiring the addition of a date and time stamp to the serial number in the file name to distinguish the files. Changes to the test plan can add and remove measurements, which may require changes in file names or locations to differentiate the new files from the old ones. Each new demand requires users of the file system to agree upon a new file nomenclature system, rename old files saved under the previous system, and change their measurement programs to save files using the new one.

Here’s a typical scenario. The developer writes the software for a simple file system, saving the data to a text file. But then someone else needs to access the data or create reports containing the data, so the developer then writes a second program to move the data from the file-based system to a database. Finally the developer ends up rewriting the original test program to pump future data directly into the database. This inefficient design approach is often more than twice as expensive as designing a database into the system up front, before testing begins.

With today’s off-the-shelf databases and test data management tools, storing data in a database is no more difficult or expensive than storing data in text-based files. Even sophisticated databases from Oracle and Microsoft that meet the needs of measurement applications cost less than the combined cost of off-the-shelf test development software packages and of most PCs used to control your measurement system. (See “Choosing the Right Database for Your Measurement Application” (below) for a summary of available databases and their related costs.)

Choosing the Right Database for Your Measurement Application
Several popular databases are cost-effective options for managing your measurement data. By estimating the number of simultaneous users, amount of data storage, and budget, you can choose the best database for your application.
Database Database Size Number of Users Price
Microsoft Access < 2 GB < 255 $200
Microsoft SQL Server
Desktop Edition
< 2 GB < 5 No Cost*
Microsoft SQL Server > 2 GB Unlimited $300/user**
Oracle > 2 GB Unlimited $300/user**
*License included with several Microsoft products
**SQL Server and Oracle have various versions and the pricing can vary greatly. The pricing listed is typical of the version of database software used in measurement applications.

 Types of Data. Plan in advance for the types of data that you want to store—this will mean fewer (and easier) changes to your database and application later.

There are two broad categories of data types to store. The first, measurement system data, is information about the sensors, measurement instrumentation, automated test station, and the test plan itself (e.g., the station name, the person who maintains the station, the measurement instrumentation brand and model number, the serial number of the measurement instrumentation, the name and type of sensors, the description of the test steps, and calibration data).

The second broad category, results data, is information about the device or substance measured, and measurement results (e.g., device serial number, material lot number, temperature, frequency, strain, waveform data, event count, signal max./min., dynamic range, and pass/fail results).

 Data Relationships. Once you know the types of data, you can then plan for relationships among them. You might want to record information, including calibration data, for several sensors used in the measurement system. You could then relate each sensor that acquired measurements to each device it tested, and vice versa. A relational database lets you can keep sensor information in one location and results data in a second location (see Figure 1); by relating the two, you can access all the information about the sensor, including the calibration data used on each measurement.

figure
Figure 1. By relating data in tables in a database, you can eliminate duplicate entries. In this example, three different devices under test are tested by the same sensor. The data about the tests are in one table and the information about the sensors is in a second table. By relating the two, the sensor information is not duplicated for each test.

Other types of relationships to consider are information about the operator making the measurement, the location of the measurement, and the measurement equipment used for the test. This can help you spot trends. From our previous relation examples, a report could show if there is a trend toward a larger number of failures based on the calibration date of the sensor, the location in which the measurement was made, the test operator, or what equipment was used.

 Database Considerations. Three important considerations when using a database to manage your data are storage location, the amount of data you will store, and the number and type of users who will need access.

You can store your data locally on the automated measurement system or in a centralized server. If the data are kept locally, you will always have access and you won’t have to acquire centralized server space from your IT department. However, if many people in diverse locations will need access, then store your data on a centralized server that is maintained and backed up.

Because different databases have different data storage limits, it is important to approximate how much data you will need to store over the life of the system. Storing small amounts of pass/fail data, for example, would let you use a lower-cost database system. Storing large arrays of waveform data and many measurement results will likely require a more sophisticated database capable of handling hundreds of gigabytes or even terabytes of data. Should your needs change over the life of the system, you can upgrade from a low-cost to a more sophisticated database readily and without data loss.

You should also estimate how many users will need access. To determine the total number of users, be sure to estimate the number of people retrieving data and the number of automated measurement systems storing data. If only a few measurement systems and people will need access, you can store your data in a lower-cost database. If you have multiple measurement systems sending data into a centralized database and potentially multiple people accessing the data, you’ll want a database that can handle simultaneous access from a larger number of users. Depending on the user’s job duties, he or she may need different access privileges, e.g., an operator who needs only to run the measurement system and store data in the database, a manager who needs to store and extract data for reports, and an administrator who needs unrestricted access to all aspects of the database, including the ability to delete data or change the way the database itself is organized.

During Testing
During test execution the main considerations are how well the system performs, data integrity, what format(s) the data will be stored in, and whether the system requires simultaneous operations, such as multiple people accessing data simultaneously while the measurement system is storing them.

 Performance. Your test data management system should not degrade the performance of your automated measurement application. For a nondistributed system, high-speed measurement systems could acquire data at a rate too fast for the database to store. In a distributed ýystem, both the speed of the network and the database have the potential to slow down high-speed measurements. The solution is to buffer the data if the database or network is too slow, thus ensuring that the measurement application runs at full speed.

 Data Integrity. For systems that send data over the network to a centralized database, failsafe data transfers are essential. Technologies such as Microsoft Message Queuing (MSMQ), IBM MQ Series, and Sun ONE Message Queue, as well as test data management software such as VI Technology Arendar can be used in the system to guarantee that data will be transferred from the measurement system to the centralized server. If the network fails, the system acknowledges that the data transfer is not complete and the system will resend the data and confirm their authenticity once the proper network connection has resumed.

Some systems are designed without proven failsafe technology. They store data locally on the measurement system. If the network goes down during the data transfer, these systems don’t have a way to tell the difference between a completely transferred data set and an incomplete one, or a complete but corrupted data set and an unaltered one, resulting in useless data on the centralized server. Simply copying the data across the network or programming the measurement system to automatically copy the data over the network is not sufficient to ensure data integrity. If you want data integrity, you have to develop your own failsafe data transfer technology, use one of the commercially available technologies, or use an integrated data management software package that provides failsafe data transfer.

 Standard Data Transfer Format. For distributed systems, the data need to be in a format that can be sent over the network. Both the measurement system and the server must be consistent in the data format. Today, the standard data format is extensible markup language (XML). Previously, all data formats were proprietary, so applications couldn’t read each other’s formats without some conversion application translating from one to another. A standard format such as XML lets you use a selection of third-party tools to format and analyze the data.

A test data management system using XML will take data acquired by the measurement system, convert them to XML, and send them to the centralized server. The server will convert the value back to its original measurement value and store it in the database. Most commercially available databases today can receive XML data, thereby simplifying the storage of data in the database (see Figure 2).

figure
Figure 2. XML is a standard data format and is used for transferring measurement data from applications to databases. The measurement application acquires the data and sends them to the database in XML format. Industry-leading databases receive XML formatted test data directly.

 Simultaneous Access. Consider how many people and measurement systems will need to simultaneously access the database during test execution. Databases are often sold or configured with a limited amount of user access and will block users in excess of some maximum number. Whële some databases don’t limit the number of simultaneous users, performance may slow considerably if there are too many. Remember that each automated measurement system and each user is considered a unique access (Figure 3).

figure
Figure 3. Measurement data stored in a centralized database can have multiple people and automated measurement systems accessing the data simultaneously. The number of accesses to the database is the sum of all the measurement systems and people accessing the database. The maximum possible number of accesses to the database is dependent on the database software and server hardware.

After Testing
After your measurement system has stored the data, the main considerations are how to extract them from the database and how to combine different data into reports.

 Extracting Data. Database tools and test data management software offer a range of data extraction options. For example, you can pull all the data or only selective data from any measurement into a spreadsheet or other commercially available analysis and presentation software. So if you’d stored pass/fail, serial number, and temperature data, you could query the database on the serial number and import the pass/fail and temperature data into Excel. The most common language used to pull data from the database is Structure Query Language (SQL). SQL is the American National Standards Institute (ANSI) standard language for operating upon relational databases. Alternatively, test data management software provides graphical, menu-driven interfaces for those who want to extract data from their measurements but don’t want to learn SQL.

 Combining Data. In addition to retrieving data from a particular test, a database system will allow you to retrieve and combine data from multiple tests.

The most common data extracted from the data management system are the number of units that failed by product. This helps companies determine where to put their efforts to increase production yield.

Frequently two or more different products are designed with a common component. In a centralized test data management system, the products may be tested on different test stations with each one storing the measurement data in the same centralized database. Even though the products are different, manufacturing and design engineers may want to combine data that extend across all products that use this component and show performance and reliability results for the component.

Plan Ahead for Your Next System
When developing your next measurement system, in addition to standard planning—for the right sensors, instrumentation, and automated test and measurement software—invest some time up front to plan your data management. Doing this before testing begins will save you time and money over the life span of the project, ensure data integrity and performance during test execution, and allow you to analyze and report on your data after test completion without the headaches of trying to piece together disparate text files.


Richard House is President, VI Technology, Inc., Austin, TX; 512-327-3348, x-201, richard.house@vi-tech.com.

MORE!
For further reading on this and related topics, see these Sensors articles.

"Data Acquisition Software—Choices, Choices, Choices", Part 1 and Part 2, September and October 2002
"Intelligent Wireless Condition-Based Maintenance," June 2002
"e-Manufacturing: The Shrinking World of Business," February 2002





 
E-NEWSLETTERS
SUBSCRIBE NOW!
Sensors Weekly
  What's New
  Product Picks





Questex Media
Home | Contact Us | Advertise
© 2009 Questex Media Group, Inc.. All rights reserved.
Reproduction in whole or in part is prohibited.
Please send any technical comments or questions to our webmaster.