Data owners (Chief Data Officers, Chief Information Officers, Chief Operations Officers, Chief Marketing Officers, or anyone ultimately responsible for “data” in an organization) face many demands and challenges, yet they also have opportunities and options that didn’t exist just a few years ago.
This post refers to all of these data owners collectively as CDOs, recognizing that this is a growing field and the title is not present in all organizations. The CDO role itself is designed to bridge a common gap between technology automation and the business’s ownership of data, and a CDO’s focus can vary widely depending on the data maturity, budgets and resources of the business.
First and foremost, CDOs are responsible for the quality of the data that the business – and its clients – rely upon. This is a hefty task in an industry where complete accuracy is paramount and regulatory oversight continues to tighten.
In some organizations with limited budgets, the CDOs we encounter are just trying to get the basics right. They focus on identifying sources and uses of information, defining roles and responsibilities, and documenting procedures to establish a basic level of governance. In one case, the IT impact was as simple as locking down the hard drive that housed the Excel spreadsheets used for corporate reporting.
As highlighted in PanoVista.co’s report, most firms have established, or are in the process of establishing, more mature structured data cleansing and distribution. This may include centralizing pricing and security reference data, ensuring CRM (Client Relationship Management) databases hold clean information, warehousing core operational data, or making the marketing -> sales -> client onboarding -> client support process seamless. These firms can typically support traditional BI (Business Intelligence) tools or, increasingly, BYOBI (Bring Your Own BI) tools like Tableau or Qlik.
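To make “centralizing security reference data” more concrete, here is a minimal sketch of one common pattern: building a single “golden copy” record by taking each field from the most trusted vendor that supplies a value. The vendor names, field names, sample values and priority order are all hypothetical; real reference data platforms apply far richer validation and exception handling.

```python
from typing import Dict, List, Optional

# Hypothetical field-level vendor precedence: for each field, vendors are
# listed from most to least trusted. Names are illustrative only.
VENDOR_PRIORITY: Dict[str, List[str]] = {
    "name": ["vendor_a", "vendor_b"],
    "currency": ["vendor_b", "vendor_a"],
    "close_price": ["vendor_a", "vendor_b"],
}


def build_golden_record(
    sources: Dict[str, Dict[str, Optional[object]]],
) -> Dict[str, Optional[object]]:
    """Take each field from the highest-priority vendor with a non-null value."""
    golden: Dict[str, Optional[object]] = {}
    for field, vendors in VENDOR_PRIORITY.items():
        golden[field] = None  # stays None if no vendor supplies the field
        for vendor in vendors:
            value = sources.get(vendor, {}).get(field)
            if value is not None:
                golden[field] = value
                break
    return golden


# Sample feeds for one security (hypothetical data).
feeds = {
    "vendor_a": {"name": "Example Corp", "currency": None, "close_price": 101.25},
    "vendor_b": {"name": "EXAMPLE CORP", "currency": "USD", "close_price": 101.30},
}
print(build_golden_record(feeds))
# {'name': 'Example Corp', 'currency': 'USD', 'close_price': 101.25}
```

Field-level precedence (rather than choosing one vendor per security) lets a firm use each feed for what it does best, which is one reason centralizing this data pays off when several systems consume it.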
Firms that already have mature structured data infrastructures are now focusing on governance tools that improve data transparency and workflow among the various data owners in an organization. For example, our partner Data3Sixty offers a cloud-based data governance service that consolidates metadata across diverse applications, provides glossaries drawn from the sources to establish primary enterprise definitions, tracks ongoing data quality metrics, and creates a social network of interested parties who can spot and correct data anomalies quickly.
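As a rough illustration of what “ongoing data quality metrics” can mean, the following sketch computes two widely used measures per field: completeness (the share of records with a value) and validity (the share of populated values that pass a rule). It is not Data3Sixty’s method or API; the records, field names and rules are hypothetical, and the email check is deliberately simplistic.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical client records, e.g. as extracted from a CRM.
records: List[Dict[str, Optional[str]]] = [
    {"client_id": "C001", "country": "US", "email": "a@example.com"},
    {"client_id": "C002", "country": None, "email": "b@example.com"},
    {"client_id": "C003", "country": "GB", "email": "not-an-email"},
    {"client_id": "C004", "country": "XX", "email": "d@example.com"},
]

# Illustrative validity rules; a governance program would define these centrally.
VALID_COUNTRIES = {"US", "GB", "DE", "JP"}
RULES: Dict[str, Callable[[str], bool]] = {
    "country": lambda v: v in VALID_COUNTRIES,
    "email": lambda v: "@" in v,  # deliberately simplistic check
}


def quality_metrics(
    rows: List[Dict[str, Optional[str]]],
    rules: Dict[str, Callable[[str], bool]],
) -> Dict[str, Dict[str, float]]:
    """Per field: completeness = non-null share of all rows;
    validity = share of non-null values that pass the field's rule."""
    metrics: Dict[str, Dict[str, float]] = {}
    for field, is_valid in rules.items():
        present = [row[field] for row in rows if row.get(field) is not None]
        passing = [v for v in present if is_valid(v)]
        metrics[field] = {
            "completeness": len(present) / len(rows) if rows else 0.0,
            "validity": len(passing) / len(present) if present else 0.0,
        }
    return metrics


for field, m in quality_metrics(records, RULES).items():
    print(field, {k: round(v, 2) for k, v in m.items()})
# country {'completeness': 0.75, 'validity': 0.67}
# email {'completeness': 1.0, 'validity': 0.75}
```

Tracked over time, numbers like these let data owners see whether quality is improving and route specific failing records (here, the invalid country code and malformed email) to whoever can correct them.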
It will be interesting to see what impact reference data services such as the recently announced SmartStream SPReD, and emerging services such as Bloomberg PolarLake, Markit EDM Direct, RIMES Managed Data Services and OpenFinance, will have on existing processes at many of these firms.
So far we’ve mostly highlighted the governance and compliance side of the CDO role. At many firms, this role is also responsible for innovation.
As noted, many firms have centralized components of their data with reference data tools and perhaps data warehouses. This is appropriate for data that is expensive to acquire and/or used by multiple systems, processes and reporting. Examples include prices, security reference, client data, positions and transactions, fundamental research data and performance returns.
However, many firms have more than one business line, where it does not make sense to centralize data silos or warehouse all of the data, yet that wealth of information could add significant insight into the health, risks and innovation opportunities of the business.
Using a federated or hybrid (centralized plus federated) model, firms are increasingly either leaving the data in place or copying it as-is into Hadoop or NoSQL databases. With either approach, chosen according to the size, shape and timeliness requirements of the data, the data is blended when needed for reporting, advanced business intelligence, or data scientists’ predictive and prescriptive analytics.
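The following minimal sketch illustrates the “leave the data in place and blend when needed” idea. SQLite’s `ATTACH DATABASE` stands in for a federated query layer: two separate databases, representing hypothetical CRM and positions systems, are joined at query time without copying either into a central warehouse. The table names, column names and sample rows are illustrative assumptions.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


def make_source(path: str, script: str) -> None:
    """Create a standalone database standing in for one business line's system."""
    conn = sqlite3.connect(path)
    try:
        conn.executescript(script)
        conn.commit()
    finally:
        conn.close()


def blended_report(crm_path: str, positions_path: str) -> List[Tuple[str, float]]:
    """Join the two sources in place via ATTACH; nothing is copied centrally."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        conn.execute("ATTACH DATABASE ? AS pos", (positions_path,))
        return conn.execute(
            """
            SELECT c.client_name, SUM(p.market_value) AS total_mv
            FROM crm.clients AS c
            JOIN pos.positions AS p ON p.client_id = c.client_id
            GROUP BY c.client_name
            ORDER BY c.client_name
            """
        ).fetchall()
    finally:
        conn.close()


def demo() -> List[Tuple[str, float]]:
    """Build two hypothetical sources in a temp directory and blend them."""
    with tempfile.TemporaryDirectory() as tmp:
        crm_db = os.path.join(tmp, "crm.db")
        pos_db = os.path.join(tmp, "positions.db")
        make_source(crm_db, """
            CREATE TABLE clients (client_id TEXT PRIMARY KEY, client_name TEXT);
            INSERT INTO clients VALUES ('C1', 'Acme Pension'), ('C2', 'Beta Foundation');
        """)
        make_source(pos_db, """
            CREATE TABLE positions (client_id TEXT, security TEXT, market_value REAL);
            INSERT INTO positions VALUES
                ('C1', 'XYZ', 100.0), ('C1', 'ABC', 50.0), ('C2', 'XYZ', 25.0);
        """)
        return blended_report(crm_db, pos_db)


print(demo())
# [('Acme Pension', 150.0), ('Beta Foundation', 25.0)]
```

At production scale a distributed query engine would typically play the role SQLite plays here; the design point is simply that the join happens at query time, against the sources as they are, rather than after a warehouse load.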
Because its storage layer is a distributed file system, Hadoop can store unstructured data that doesn’t fit a relational model, such as internal research documents, emails and videos, alongside web and third-party data. Leading firms are now incorporating web-sourced text analytics and social predictive analytics into the investment research and trading process to identify alpha-generating signals.
We are also seeing firms improve research database performance using HDFS (Hadoop Distributed File System), in-memory NoSQL (Not Only SQL) databases and MPP (massively parallel processing) on commodity hardware, drastically reducing query times compared with large RDBMS (Relational Database Management System) deployments.
New applications increasingly run in the cloud. For example, the previously mentioned Data3Sixty runs on Azure. Our partner Alpha Vee runs on AWS (Amazon Web Services), where its service accelerates investment product creation with fast global equity research, modeling, portfolio construction and management services. ETF providers in particular are drawn to the ability to create differentiated smart beta and liquid alternative products quickly and cost-effectively.
For retail-oriented firms such as mutual funds, ETF providers, wealth managers, insurers and brokerages, the CDO must support the CMO team in combining internal and external demographic information for segmentation, SEO (Search Engine Optimization) and SMO (Social Media Optimization). SMO in particular requires real-time social feeds into predictive analytics to micro-segment prospects and the messages they receive. Our partner KeyInsite and its Data Scientist On-Demand team are seeing increasing interest in SMO, as well as in web-sourced investment research and sentiment analysis.
With most new applications available as cloud services, the cost of set-up, deployment and ongoing ownership is a fraction of what it was just a few years ago with traditional technologies. Cloud information security models have also matured and can now meet or exceed what many internal systems provide.
Ultimately, each business has its own journey and appetite for building a mature data infrastructure that can support governance, compliance and innovation. Fortunately, there are new resourcing models, open source tools, and deployment options that can help CDOs add real strategic value.