Download all stock price data using Python

If you build models for the financial markets, you first have to download the data you need, and this requires time, money and computation. It is useful to download all the data once and then test our models on the local copy, which saves time.

In this tutorial, you’ll learn:

How to use the yfinance library
How to download the list of tickers of a stock market index
How to download the price data for those tickers
How to open the prices of the stocks you saved previously

To accomplish this task we will use Python's time module and a library called yfinance.

To download all stock data into a folder using Python, you can use the yfinance library to retrieve the stock data and the pandas library to store the data as CSV files in a specified folder. Here is how you can use yfinance to download the AMZN stock data for a given time span:

import yfinance as yf

stock_df = yf.download("AMZN", start="2021-05-01", end="2023-04-01")

As you can see, downloading a stock's price time series with the yfinance library is very simple. This line of code downloads daily prices, but we can use other parameters to download data at a different interval. Check this example:

gold = yf.download(tickers="GC=F", period="5d", interval="1m")

Here we download gold futures data (ticker GC=F) at a one-minute interval over a period of five days.

The stock data is returned as a pandas DataFrame, so we can work with it using the pandas library.
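As a small illustration of working with prices in pandas, the sketch below builds a tiny hand-made DataFrame with a Close column (the numbers are invented, not real quotes) and adds a daily return and a 3-day moving average. The helper name add_basic_features is hypothetical, and depending on your yfinance version the downloaded columns may be laid out differently, so it is worth checking stock_df.columns before applying it to real data.

```python
import pandas as pd


def add_basic_features(prices: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Return a copy of `prices` with a daily return and a moving average.

    Assumes a flat 'Close' column (hypothetical helper, for illustration only).
    """
    out = prices.copy()
    # Percentage change of the close from one row to the next
    out["Return"] = out["Close"].pct_change()
    # Simple moving average of the close over `window` rows
    out["SMA"] = out["Close"].rolling(window).mean()
    return out


# Invented numbers, used only to show the shape of the result
sample = pd.DataFrame(
    {"Close": [100.0, 110.0, 99.0, 121.0]},
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06"]),
)
features = add_basic_features(sample)
print(features)
```

The first rows of Return and SMA are NaN because there is not enough history yet to compute them.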

Now we want the tickers of all the S&P 500 companies, and we will collect them in a list.

A publicly traded company’s stock has a special set of letters called a ticker symbol, also known as a stock symbol. Ticker symbols are used to identify the stock across financial marketplaces and platforms, such as stock exchanges, financial news services, and trading platforms.

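You can use the pandas library in Python to scrape the list of tickers for the S&P 500 from Wikipedia. We don’t need to write any BeautifulSoup code ourselves, because pandas can read tables directly from a web page. Note, however, that with flavor='html5lib' pandas relies on the html5lib and beautifulsoup4 packages behind the scenes, so both must be installed. Here’s the code: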

import pandas as pd

# The URL for the S&P 500 component stocks list
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# Fetch the first table in the page
df = pd.read_html(url, flavor='html5lib')[0]

# Take the Symbol column and convert it to a list
tickers = df['Symbol'].tolist()

# Print the tickers
print(tickers)

This code will scrape the list of S&P 500 component stocks from the Wikipedia page, extract the ticker symbols from the table, and store them in a list called tickers. The resulting list of tickers will be printed to the console.

Note that the Wikipedia page may not always have the most up-to-date information on the S&P 500 component stocks.
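One practical detail, stated here as an assumption to check against your own downloads: Wikipedia writes share classes with a dot (for example BRK.B), while Yahoo Finance symbols generally use a hyphen (BRK-B). A minimal sketch of a clean-up step follows; the function name to_yahoo_symbols is hypothetical.

```python
def to_yahoo_symbols(symbols):
    """Replace dots with hyphens and drop duplicates, keeping the order.

    Assumption: Yahoo Finance expects 'BRK-B' where Wikipedia lists 'BRK.B'.
    """
    seen = set()
    cleaned = []
    for symbol in symbols:
        yahoo_symbol = symbol.strip().upper().replace(".", "-")
        # Skip empty strings and symbols we have already kept
        if yahoo_symbol and yahoo_symbol not in seen:
            seen.add(yahoo_symbol)
            cleaned.append(yahoo_symbol)
    return cleaned


example = to_yahoo_symbols(["AMZN", "BRK.B", "BF.B", "AMZN"])
print(example)
```

You could apply it as tickers = to_yahoo_symbols(tickers) before starting the downloads.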

Now we can download the price data for each ticker, save it as a CSV file, and put the files in a folder for later use.

# Import libraries
import os
import yfinance as yf
import pandas as pd
import time


# The URL for the S&P 500 component stocks list
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# Fetch the first table from the page
df = pd.read_html(url, flavor='html5lib')[0]

# Take the Symbol column and convert it to a list
tickers = df['Symbol'].tolist()


start_ = "2014-01-01"
end_ = "2023-04-01"

# Define the folder path to store the data
folder_path = "stocks_datasets"

# Create the folder if it does not exist
os.makedirs(folder_path, exist_ok=True)

# Loop through each stock symbol and download the data
for symbol in tickers:
    try:
        # Download the data from Yahoo Finance using the yfinance library
        stock_df = yf.download(symbol, start=start_, end=end_)

        # A failed download typically comes back as an empty DataFrame
        if stock_df.empty:
            print(f"{symbol} returned no data")
        else:
            # Store the data as a CSV file in the specified folder using pandas
            file_path = os.path.join(folder_path, f"{symbol}.csv")
            stock_df.to_csv(file_path)

            print(f"{symbol} data downloaded")

        time.sleep(1)  # Wait a second between requests

    except Exception as exc:
        print(f"{symbol} failed to download: {exc}")

Here, we begin by importing the os, yfinance, pandas and time libraries. We then build the list of stock symbols and define the date range and the folder where the data will be stored. If the folder doesn’t already exist, we create it. Next, we loop over each stock symbol, download its data with yfinance and store it as a CSV file using pandas, pausing one second between requests. The os module handles the creation of file paths and directories.

The path to the CSV file where we save the downloaded stock data is stored in the file_path variable. This path is generated using the os.path.join method, which joins the folder path (folder_path) with the filename, which is the stock symbol (symbol) with a .csv file extension. For instance, file_path will be “stocks_datasets/AMZN.csv” if folder_path is set to stocks_datasets and the stock symbol is AMZN.

Once we have the file_path, we can use the to_csv method of the Pandas DataFrame (stock_df) to save the data to a CSV file at the specified path. The to_csv method takes the file path as its first argument and saves the data to the specified file.
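If the loop is interrupted, it can be handy to resume without downloading everything again. The sketch below is one possible way to do that, not part of the original script. It skips symbols whose CSV file already exists and does not save empty results. The names save_symbol and fake_downloader are hypothetical, and the downloader is passed in as a parameter so the example runs without network access.

```python
import os
import tempfile

import pandas as pd


def save_symbol(symbol, folder_path, downloader, overwrite=False):
    """Download one symbol and save it as CSV.

    Returns "skipped", "empty" or "saved".
    `downloader` is any callable that takes a symbol and returns a DataFrame.
    """
    os.makedirs(folder_path, exist_ok=True)
    file_path = os.path.join(folder_path, f"{symbol}.csv")

    # Do not download again if the file is already there
    if os.path.exists(file_path) and not overwrite:
        return "skipped"

    stock_df = downloader(symbol)

    # An empty result is treated as a failed download and is not saved
    if stock_df is None or stock_df.empty:
        return "empty"

    stock_df.to_csv(file_path)
    return "saved"


def fake_downloader(symbol):
    """Stand-in for a real download; returns invented prices."""
    return pd.DataFrame(
        {"Close": [1.0, 2.0]},
        index=pd.to_datetime(["2023-01-03", "2023-01-04"]),
    )


with tempfile.TemporaryDirectory() as demo_folder:
    first = save_symbol("DEMO", demo_folder, fake_downloader)
    second = save_symbol("DEMO", demo_folder, fake_downloader)
    third = save_symbol("NONE", demo_folder, lambda s: pd.DataFrame())

print(first, second, third)
```

With yfinance, the downloader could be something like lambda s: yf.download(s, start=start_, end=end_), called inside the same loop and followed by the one-second pause.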

Take a look inside one of the files; you should see rows of data like the following:

Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02 00:00:00-05:00,40.84406280517578,40.84406280517578,40.164520263671875,40.20743942260742,37.14159393310547,2678848
2014-01-03 00:00:00-05:00,40.33619689941406,41.02288818359375,40.24320602416992,40.715309143066406,37.610748291015625,2609647
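The Date column in these files carries a UTC offset, and over a full year that offset can switch between -05:00 and -04:00 with daylight saving time. Here is a minimal sketch of one way to parse such a column, assuming the layout shown above. The price values are rounded or invented, and the second row is made up to show a different offset.

```python
import io

import pandas as pd

# Two rows in the same layout as the saved files (illustrative values only)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02 00:00:00-05:00,40.8,40.8,40.1,40.2,37.1,2678848
2014-07-01 00:00:00-04:00,41.0,41.5,40.9,41.2,38.0,2500000
"""

df = pd.read_csv(io.StringIO(csv_text))

# utc=True converts every timestamp to UTC, so mixed offsets end up
# in a single timezone-aware datetime column
df["Date"] = pd.to_datetime(df["Date"], utc=True)
df = df.set_index("Date").sort_index()

print(df.index)
```

Converting to UTC is a design choice made for simplicity. If you only need calendar dates, you could instead keep just the date part of each string.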

After we have saved the data, it will be useful to load it into memory to try out some models. How can we do it?

Open all company datasets present in the folder we created

The os and pandas libraries can be used to open all the datasets stored in the folder. You can use the example code shown here:

import os
import pandas as pd

# The folder where the CSV files are stored
folder_path = "stocks_datasets"

# In this empty list we will store the dataframes
stocks_list = []

# Loop through each file in the folder
for file in os.listdir(folder_path):
    # Check if it is a CSV file
    if file.endswith(".csv"):
        # Create the full file path
        file_path = os.path.join(folder_path, file)
        
        try:
            # Read the CSV file into a Pandas DataFrame
            df = pd.read_csv(file_path)
            
            # Append the DataFrame to the list
            stocks_list.append(df)
            
        except Exception as exc:
            print(f"Failed to open {file}: {exc}")

# Here we can put our model!

In this code, we first specify the location of the folder containing the CSV files. The data frames are then stored in an empty list (stocks_list). After that, we employ the os.listdir method to cycle through each file in the folder. The endswith method is used to determine whether the file is a CSV file, and if it is, the os.path.join method is used to build the whole file path. Using the pd.read_csv method, we then read the CSV file into a Pandas DataFrame and append it to the stocks_list list.
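A plain list of DataFrames loses track of which ticker each one belongs to. As a possible variation, the sketch below keeps the symbol taken from the file name and combines the Close columns into a single table with one column per ticker. The function name load_close_prices is hypothetical, and the demo files contain invented numbers.

```python
import os
import tempfile

import pandas as pd


def load_close_prices(folder_path, column="Close"):
    """Read every CSV in `folder_path` and return one DataFrame of closes.

    Each output column is named after its file, i.e. the ticker symbol.
    Assumes the first CSV column holds the dates.
    """
    closes = {}
    for file in sorted(os.listdir(folder_path)):
        if not file.endswith(".csv"):
            continue
        symbol = os.path.splitext(file)[0]
        df = pd.read_csv(os.path.join(folder_path, file), index_col=0)
        if column in df.columns:
            closes[symbol] = df[column]
    # Align the series on their dates; missing dates become NaN
    return pd.DataFrame(closes)


# Demo on two tiny invented files
with tempfile.TemporaryDirectory() as demo_folder:
    pd.DataFrame(
        {"Close": [1.0, 2.0]}, index=["2023-01-03", "2023-01-04"]
    ).to_csv(os.path.join(demo_folder, "AAA.csv"), index_label="Date")
    pd.DataFrame(
        {"Close": [5.0]}, index=["2023-01-04"]
    ).to_csv(os.path.join(demo_folder, "BBB.csv"), index_label="Date")
    close_table = load_close_prices(demo_folder)

print(close_table)
```

A table like this is a convenient starting point for models that compare several stocks over the same dates.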

I hope this post was helpful to you. Keep following me if you want to read articles about quantitative finance and machine learning.