Blaming the Data Monster

It all started out so simply. You are a responsible Compliance Officer, a hard-working Operations professional, or a business-savvy IT manager, and you always pitch in. As the firm grew, the data needs grew, and you gladly offered to apply your knowledge to help ensure the firm had the information it needed. The firm was fairly small, and the basic data needed was typed into a spreadsheet.

  • Eventually, the firm grew and added feeds for both equities and fixed income.

  • The firm needed better issuer data, so you added a feed for that.  Then you needed better parent/child information, so that was added too. 

  • The PMs were not happy with the pricing, so the firm now has two different feeds, one from IDC and one from Bloomberg.

  • The firm has two new Socially Responsible Investing funds coming online shortly, so the firm is scrambling to get a feed for that.

  • Security data is coming in from Bloomberg, but the mortgage-backed guys want different yield numbers.

  • And the Muni team wanted the S&P Thomson Reuters MMD feeds.

  • Back to equity—they needed S&P sectors, but now they want the Russell sectors.

  • And the mortgage-backed guys have gone back to their spreadsheets so they can key in their own yields.

Costs keep climbing, so now the firm wants to do a data consolidation project and name a Data Steward. 

Congratulations--this person will report to you and all data issues are now your problem.

Where do you start?  Data consolidation projects typically start with an inventory of every data field that exists anywhere in the firm. Junior people are then assigned to try to make sense of the thousands of fields and to recommend which fields are redundant and which the firm can live without.

Two years later, a new data warehouse emerges--with data that the lines of business reject.

A Practical Approach

Rather than starting with a massive inventory of every data field, start with the data stream for each line of business and how it is used. For this exercise, we borrow the process flow from manufacturing, repackaged by Six Sigma (and probably several other methodologies) as SIPOC:

Supplier --> Input --> Process --> Output --> Consumer

Supplier:  What entity is supplying the data?  Is it a vendor (check your invoices) or is it an internal source?  Internal sources can be groups like Credit, who may produce a list of approved issuers.  Include all Suppliers, even if the information is manually typed in from looking at a Bloomberg display or relayed verbally or via email.

Input:  What is being supplied?  At this point, don’t try to go field by field, just capture the high-level information, such as “equity security set-up information.”  You can capture the individual fields later.

Process:  What happens to the data?  Is it imported into the OMS, into the Accounting System or being used for modeling by the PMs?

Output:  What comes out?  In this case, compliance results, trades and, for new securities, an update downstream to the portfolio management system that houses the positions.

Consumer:  What parties consume the data?  There will be internal parties, like Compliance and Client Reporting departments, and external parties, such as custodians.
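Captured together, the five questions above yield one record per data stream, which is a far smaller inventory than thousands of fields. A minimal sketch in Python (the class name and the example feed values are illustrative, not taken from any real firm):

```python
from dataclasses import dataclass, field

@dataclass
class DataStream:
    """One line-of-business data stream, mapped Supplier through Consumer."""
    supplier: str    # vendor or internal group supplying the data
    supplied: str    # high-level description of the input, not field-by-field
    process: str     # what happens to the data (OMS import, modeling, ...)
    output: str      # what comes out (compliance results, trades, ...)
    consumers: list = field(default_factory=list)  # internal and external parties

# Hypothetical example entry: an equity security master feed
equity_master = DataStream(
    supplier="Bloomberg (vendor)",
    supplied="equity security set-up information",
    process="imported into the OMS",
    output="new-security updates pushed downstream to portfolio management",
    consumers=["Compliance", "Client Reporting", "Custodian"],
)
```

A few dozen records like this, one per stream, can be filled in during the line-of-business interviews; the field-level detail can be attached to each record later.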

Once the major streams are documented, stop in to see your CFO or Controller and find out what the feeds are costing the firm.   The cost is generally higher than most people would guess, and it will give you a framework for what the firm can pay.  Ideally, you will work within this framework to improve the data and stay within, or possibly lower, the overall cost.

Meet with the lines of business and find out the business justification for each of the feeds. Find out whether any of those feeds are supplemented or altered once the data becomes the property of the firm. If they are, try to get documentation for exactly what happens to the data. (This can be difficult if the data is being transformed via a script or a custom stored procedure.) If the data transformation is not transparent, that is a red flag that the process may be out of control.
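When a desk does alter vendor data, the safest pattern is to make the transformation explicit and auditable rather than buried in a script. A minimal Python sketch, with hypothetical function, security, and field names, showing desk overrides applied to a feed with every change logged:

```python
def apply_overrides(vendor_data, overrides, audit_log):
    """Apply desk-supplied field overrides to vendor data, recording each change.

    vendor_data: dict mapping security id -> {field: value} from the feed
    overrides:   dict mapping (security id, field) -> desk-supplied value
    audit_log:   list that receives one record per altered value
    """
    # Work on a copy so the raw vendor feed is preserved untouched.
    result = {sec_id: dict(fields) for sec_id, fields in vendor_data.items()}
    for (sec_id, field_name), desk_value in overrides.items():
        old = result.get(sec_id, {}).get(field_name)
        result.setdefault(sec_id, {})[field_name] = desk_value
        audit_log.append({"security": sec_id, "field": field_name,
                          "vendor_value": old, "desk_value": desk_value})
    return result

# Hypothetical example: the mortgage-backed desk keys in its own yield.
log = []
feed = {"MBS123": {"yield": 4.10, "price": 98.5}}
adjusted = apply_overrides(feed, {("MBS123", "yield"): 4.35}, log)
```

The design choice here is the point: the raw vendor values survive alongside the desk values, and the audit log documents exactly what happened to the data, which is the transparency the red-flag test above is looking for.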