In my view, one of the most difficult and time-consuming parts of making a forecast is, without doubt, the collection of valid and reliable data. Data processing staff are familiar with the expression “garbage in, garbage out” (GIGO). The phrase applies to forecasting as well. When you forecast, you should keep in mind that a forecast cannot be more accurate than the data on which it is based. Even the most sophisticated forecasting model will fail if it is applied to unreliable or unrealistic data.
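To make the GIGO point concrete, here is a minimal Python sketch. It assumes a naive three-month moving-average forecast and a small invented sales series. The function name `moving_average_forecast` and all the figures are hypothetical. A single data-entry error is enough to turn a plausible forecast into a meaningless one.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Hypothetical monthly sales in units; the figures are invented for illustration.
clean = [100, 104, 98, 102, 101, 105]
# The same series with one data-entry error: the last value 105 was typed as 1050.
dirty = clean[:-1] + [1050]

print(f"forecast from clean data: {moving_average_forecast(clean):.1f}")  # 102.7
print(f"forecast from dirty data: {moving_average_forecast(dirty):.1f}")  # 417.7
```

The model is identical in both runs. Only the input changed, yet the second forecast is roughly four times the first.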
Since the advent of computing, an enormous amount of information has been generated every day in all areas of life. The difficult task facing forecasters is finding the data that are relevant to their specific decision-making problems.
To determine if the data will be useful, some authors propose applying four criteria:
1. The data must be reliable and accurate. Adequate care must be taken to collect the data from a reliable source, with due attention to their accuracy.
2. The data must be relevant. They must be representative of the circumstances in which they will be used. For example, data intended to represent economic activity should show ups and downs that correspond to the cyclical fluctuations of the historical past. Free databases exist from which this kind of information can be obtained.
3. The data must be consistent. Care must always be taken to ensure that the data are uniform; where necessary, adjustments must be made to keep historical patterns consistent.
4. The data must be periodic. Data that are collected, compiled, and published at regular intervals are of great value to the forecaster.
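Some of these criteria, especially consistency and periodicity, lend themselves to simple automated checks. The following Python sketch assumes a monthly series stored as (date, value) pairs and flags missing values, skipped or duplicated months, and abrupt jumps that may signal a change of units. The function names, the sample data, and the `max_ratio` threshold of 3.0 are hypothetical choices for illustration; a flagged item is a prompt for investigation, not proof of an error.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month count so gaps between months are easy to measure."""
    return d.year * 12 + (d.month - 1)


def check_monthly_series(observations, max_ratio=3.0):
    """Return a list of messages describing problems in a monthly series.

    observations: (date, value) pairs sorted by date; a value of None means missing.
    max_ratio: hypothetical threshold; a month-to-month change by more than this
    factor is flagged as a possible change of units or definitions.
    """
    problems = []
    # Reliability: every period should carry a value.
    for d, v in observations:
        if v is None:
            problems.append(f"missing value in {d:%Y-%m}")
    # Periodicity and consistency: compare each observation with the previous one.
    for (d_prev, v_prev), (d_cur, v_cur) in zip(observations, observations[1:]):
        step = month_index(d_cur) - month_index(d_prev)
        if step == 0:
            problems.append(f"duplicate period {d_cur:%Y-%m}")
        elif step > 1:
            problems.append(f"{step - 1} missing month(s) before {d_cur:%Y-%m}")
        # Only compare pairs where both values are present and positive.
        if v_prev is not None and v_cur is not None and v_prev > 0 and v_cur > 0:
            ratio = max(v_prev, v_cur) / min(v_prev, v_cur)
            if ratio > max_ratio:
                problems.append(f"jump by a factor of {ratio:.1f} in {d_cur:%Y-%m}")
    return problems


# Hypothetical series with three deliberate defects.
series = [
    (date(2023, 1, 1), 120.0),
    (date(2023, 2, 1), 125.0),
    (date(2023, 4, 1), 131.0),   # March is missing
    (date(2023, 5, 1), None),    # value not reported
    (date(2023, 6, 1), 128.0),
    (date(2023, 7, 1), 0.134),   # reported in thousands by mistake
]
for problem in check_monthly_series(series):
    print(problem)
```

Comparing each month only with its immediate predecessor keeps the sketch short; a production check might instead compare against the last valid value or a seasonal baseline.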
Data sources, as in any field of scientific research, can be classified as primary or secondary. Secondary data are data that have already been published, collected for purposes other than those of the specific forecast or research at hand. Secondary data can in turn be classified as coming from internal sources, originating within the organization, or from external sources, generated outside it. Publications based on censuses are good examples of external secondary sources, while accounting records are frequently used as sources of internal secondary data.
Primary data sources comprise all methods of original data collection. Such data are commonly collected through sampling procedures, panel surveys, or a complete census of the elements of interest. Even more common is the weekly, monthly, quarterly, or annual recording of the company's key variables.
Such time series variables are often the focus of management's attention.
The distinction between primary and secondary sources matters because a primary source is more likely to contain more complete and accurate data than a secondary source. On the other hand, primary data tend to be more expensive to obtain than secondary data.
In recent years, the number of published data sources available to forecasters has increased greatly. The proliferation of computers is partly responsible for this growth in data availability. Another contributing factor is the recognition by business and government that more and better information increases the effectiveness of planning and decision making. Libraries are filled with billions of pieces of historical data on almost any subject imaginable. Government agencies, computerized data services, and non-profit organizations generate huge amounts of statistical data that can serve as inputs to the forecasting process.
The US government is the largest collector and publisher of data in the world. Every large US city has at least one library designated as a government depository. These libraries hold large collections of accumulated statistical data.
The Bureau of Economic Analysis (BEA) is an important source of data for organizations that engage in long-term planning. The agency provides basic information on inflation, economic growth, regional development, and the nation's role in the world economy. The BEA publishes the Survey of Current Business, a monthly journal that provides estimates and analyses of US economic activity. The journal also contains two statistical sections that present a compilation of economic data from public and private sources. The Business Cycle Indicators section consists of tables for around 270 series and charts for approximately 130 series that are widely used in analyzing current cyclical developments. The Current Business Statistics section consists of tables for more than 1,900 series covering business activity in general and specific industries.
Censuses are another accessible source of government data. Each census provides detailed information by geographic area on various demographic characteristics such as sex, age, income, marital status, and schooling. These data can be extremely valuable to companies that forecast time series depending on such demographic characteristics. Census data are also valuable for forecasters who carry out feasibility studies. For example, will a proposed shopping mall succeed in a given geographic area? Would residents of the area around the shopping center be interested in an athletic club? Census data are often used to answer questions of this kind.