There are many sources for datasets; those listed below are just a sample of what can be found online. Most are freely available except where otherwise noted. The sources on this page are intended for students enrolled in applied statistics courses.
Academic Torrents Datasets
Researchers from the University of Massachusetts have launched a torrent site that allows academics to share papers and datasets. The Academic Torrents service is designed to facilitate storage of all the data used in research, including datasets as well as publications. Besides letting a group of editors "seed" their own peer-reviewed published articles, the site supports large dataset delivery: researchers in the field who already have a dataset on their machines can help distribute it, so a popular large dataset doesn't need to be housed centrally. Researchers can each hold part of a dataset they are working on and host it together.
Academics can join the site and start sharing. The site currently indexes over 1.5 petabytes of data, including a recent copy of Wikipedia and NASA's map of Mars.
-posted on February 2, 2014
UC Irvine Machine Learning Repository
UCI Machine Learning Repository of Datasets
Nearly 300 datasets are maintained on the UC Irvine Machine Learning Repository site. Users can view all datasets, the newest datasets, or the most popular datasets. Datasets are organized for browsing alphabetically by subject, but users can also sort using the faceting tools on the left sidebar by attribute type, data type, subject area, number of attributes, number of instances, and format type.
-posted on March 5, 2014
Many text and numeric databases are available from Federal agencies for policy analysis and general research. Data.gov is an excellent resource for connecting to datasets.
Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
World Development Indicators
Use the World Development Indicators database (restricted to current Bentley students, faculty and staff) to create your own datasets.
This database contains information on 208 countries and 18 country groups for the years 1960 to 2006 (where data are available).
Organisation for Economic Co-operation and Development (OECD) Data Warehouse
Use the OECD.Stat Extracts to access their selection of datasets (available without a subscription).
UN Data: Data Access System to UN Databases
The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched a new internet-based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point. Users can now search and download a variety of statistical resources of the UN system.
World Bank Datasets
The Data Catalog provides download access to over 7,000 indicators from World Bank datasets. The World Bank's Open Data initiative is intended to provide all users with access to World Bank data. The data catalog is a listing of available World Bank datasets, including databases, pre-formatted tables and reports.
World Bank Research Data Sets and Analytical Tools
The datasets contained on this page were compiled for World Bank research, and are provided free of cost to foster the creation of new knowledge.
Financial and Economic Datasets
Federal Reserve Economic Data
The Federal Reserve Economic Data (FRED) contains 55,000 economic time series from 44 sources. Download, graph, and track economic data.
Macroeconomics Datasets from the Bureau of Labor Statistics
The Bureau of Labor Statistics (BLS) has data on the following: Inflation & Prices, Employment, Unemployment, Pay & Benefits, Spending & Time Use, Productivity, Workplace Injuries, International, and Employment Projections.
Consumer and Marketing Datasets
Survey of Consumers (Thomson Reuters and University of Michigan)
Tables cover topics including the Index of Consumer Sentiment, Annual Trends in Household Financial Situation, Probability of Personal Income Increase During the Next Year, Change in Likelihood of Comfortable Retirement, Expected Change in Unemployment During the Next Year, Reasons for Opinions for Buying Conditions for Vehicles, and Expected Change in Home Values During the Next Year.
ERIM Database (Kilts Center for Marketing, James M. Kilts Center, University of Chicago Booth School of Business)
The ERIM database contains data collected by the now-defunct ERIM division of A.C. Nielsen on panels of households in two midsized Midwestern cities. Information is available on the purchases of households in a number of product categories along with household demographic information.
Dominick's Database (Kilts Center for Marketing, James M. Kilts Center, University of Chicago Booth School of Business)
From 1989 to 1994, Chicago Booth and Dominick's Finer Foods entered into a partnership for store-level research into shelf management and pricing. Randomized experiments were conducted in more than 25 different categories throughout all stores in this 100-store chain. As a byproduct of this research cooperation, approximately nine years of store-level data on the sales of more than 3,500 UPCs is available in this database. This data is unique for the breadth of its coverage and for the information available on retail margins.
bayesm is a software package for Bayesian analysis of many models of interest to marketers. In addition, bayesm contains a number of interesting datasets, including scanner panel data, key account level data, store level data and various types of survey data. bayesm is an R package which can be downloaded from the CRAN network of mirror sites around the world. Users running R can install bayesm automatically from within R.
Consumer Expenditure Survey
The Consumer Expenditure Survey (CEX) provides information on the buying habits of American consumers, including data on their expenditures, income, and household characteristics. The survey data are collected for the Bureau of Labor Statistics by the U.S. Census Bureau. Free resource.
MPC Data Projects
The MPC is one of the world's leading developers of demographic data resources. Population data includes: international and national harmonized data from 1960 onwards, harmonized data from the Current Population Survey, the North Atlantic Population Project, the National Historical Geographic Inform, and American Time Use Survey-X
Data sets on education are freely available through a number of sites. The National Center for Education Statistics (NCES) collects, analyzes, and makes available data related to education in the U.S. and other nations. Below are links to a few of their data sets.
National Assessment of Adult Literacy (NCES)
The 2003 National Assessment of Adult Literacy is a nationally representative assessment of English literacy among American adults age 16 and older. To access the data sets, click on "Data Files" from the left menu.
Schools and Staffing Survey (NCES)
With its focus on schools and school personnel, the Schools and Staffing Survey (SASS) emphasizes teacher demand and shortage, teacher and administrator characteristics, school programs, and general conditions in schools. SASS also collects data on many other topics, including principals' and teachers' perceptions of school climate and problems in their schools; teacher compensation; district hiring practices and basic characteristics of the student population.
To access data sets, click on the text "Data Products" (in red, located under the title of the site, "Schools and Staffing Survey"): http://nces.ed.gov/surveys/sass/dataproducts.asp
School Survey on Crime and Safety (NCES)
The School Survey on Crime and Safety (SSOCS) is the primary source of school-level data on crime and safety for the U.S. Department of Education, National Center for Education Statistics (NCES). The SSOCS is a nationally representative cross-sectional survey of about 3,500 public elementary and secondary schools.
To access data sets, select "Data Sources" from the blue menu box on the left side of the screen: http://nces.ed.gov/surveys/ssocs/data_products.asp
Journal of Statistics Education Data Archive
Data sets are available through the Journal of Statistics Education (JSE) data archive. The "Datasets and Stories" department of the Journal of Statistics Education provides a forum for exchanging interesting datasets and discussing ways they can be used effectively in teaching statistics.
Other Dataset Links
Inter-University Consortium for Political and Social Research
ICPSR maintains a data archive of more than 500,000 files of research in the social sciences. It hosts 16 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
Pew Research Center Datasets
Pew Research Center makes its data available to the public for secondary analysis. Datasets exist for the following Pew Projects: Pew Research Center for the People & the Press, Pew Research Center’s Journalism Project, Pew Research Center’s Hispanic Trends Project, Pew Research Center’s Global Attitudes Project, Pew Research Center’s Internet & American Life Project, Pew Research Center’s Social & Demographic Trends, and Pew Research Center’s Religion & Public Life Project.