If you build models for the financial markets, the first step is downloading the financial data you need, and this takes time, money, and computation. It is useful to download all the data once, store it locally, and then try out our models on the saved copy; this saves us from downloading the same data again for every experiment.
In this tutorial, you’ll learn:
- How to use the yfinance library
- How to download the list of tickers of a stock market index
- How to download the price data for those tickers
- How to open the prices of the stocks you saved previously
To accomplish this task we will use Python's built-in time module and a library called yfinance.
To download all stock data into a folder using Python, you can use the yfinance library to retrieve the stock data and the pandas library to store the data as CSV files in a specified folder. Here is how you can use yfinance to download the AMZN stock data for a given time span:
import yfinance as yf

stock_df = yf.download("AMZN", start="2021-05-01", end="2023-04-01")
As you can see, downloading a stock's price time series with the yfinance library is very simple. This line of code downloads daily prices, but we can pass other parameters to download data at a different interval. Check this example:
gold = yf.download(tickers="GC=F", period="5d", interval="1m")
Here we download gold futures data (ticker GC=F) at one-minute intervals over a period of five days.
The data is returned as a pandas DataFrame, so we can work with it using the pandas library.
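Because the result is an ordinary DataFrame, the usual pandas tools apply. As a minimal sketch, the example below builds a small hand-made frame (the prices are made up for illustration, not real market data) with a Close column like the one yfinance provides, and computes daily percentage returns with pct_change:

```python
import pandas as pd

# Hypothetical prices for illustration only; a real frame would come from yf.download
stock_df = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0]},
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06"]),
)

# Daily percentage change of the closing price
stock_df["Return"] = stock_df["Close"].pct_change()

# The first row has no previous day, so its return is NaN and is dropped here
returns = stock_df["Return"].dropna()
print(returns)
```

The same two lines should work on a downloaded frame, assuming it exposes a single Close column.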
Now we want the stock tickers of all the S&P 500 companies, and we will collect them in a list.
In the financial world, a publicly traded company's stock has a special set of letters called a ticker symbol, also known as a stock symbol. Ticker symbols are used to identify the stock across financial marketplaces and platforms, such as stock exchanges, financial news services, and trading platforms.
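You can use the pandas library in Python to scrape the list of tickers for the S&P 500 from the Wikipedia website. We don't need to use the BeautifulSoup library directly, because pandas is able to extract tables from a web page on its own (note that flavor='html5lib' still requires the html5lib and beautifulsoup4 packages to be installed). Here's the code: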
import pandas as pd

# The URL for the S&P 500 component stocks list
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# Fetch the first table on the page
df = pd.read_html(url, flavor='html5lib')[0]

# Take the Symbol column and convert it to a list
tickers = df['Symbol'].tolist()

# Print the tickers
print(tickers)
This code will scrape the list of S&P 500 component stocks from the Wikipedia page, extract the ticker symbols from the table, and store them in a list called tickers. The resulting list of tickers will be printed to the console.
Note that the Wikipedia page may not always have the most up-to-date information on the S&P 500 component stocks.
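One practical detail to watch for: the Wikipedia table writes share classes with a dot (for example BRK.B), while Yahoo Finance, as far as I know, expects a hyphen (BRK-B). The sketch below shows one way to convert the symbols before downloading. The helper name to_yahoo_symbol and the sample list are my own, for illustration only:

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Convert a dotted share-class symbol (e.g. BRK.B) to the hyphenated form."""
    return symbol.strip().replace(".", "-")


# Hypothetical sample; in the tutorial this list comes from the Wikipedia table
sample = ["AAPL", "BRK.B", "BF.B"]
yahoo_tickers = [to_yahoo_symbol(s) for s in sample]
print(yahoo_tickers)
```

Symbols without a dot pass through unchanged, so the helper can be applied to the whole list.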
Now we can download the data for each ticker, save it as a CSV file, and put the files in a folder for later use.
# Import libraries
import os
import time

import pandas as pd
import yfinance as yf

# The URL for the S&P 500 component stocks list
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

# Fetch the first table from the page
df = pd.read_html(url, flavor='html5lib')[0]

# Take the Symbol column and convert it to a list
tickers = df['Symbol'].tolist()

start_ = "2014-01-01"
end_ = "2023-04-01"

# Define the folder path to store the data
folder_path = "stocks_datasets"

# Create the folder if it does not exist
if not os.path.exists(folder_path):
    os.makedirs(folder_path)

# Loop through each stock symbol and download the data
for symbol in tickers:
    try:
        # Download the data from Yahoo Finance using the yfinance library
        stock_df = yf.download(symbol, start=start_, end=end_)
        # Store the data as a CSV file in the specified folder using pandas
        file_path = os.path.join(folder_path, f"{symbol}.csv")
        stock_df.to_csv(file_path)
        print(f"{symbol} data downloaded")
        time.sleep(1)  # Wait a second between requests
    except Exception as e:
        print(f"{symbol} failed to download: {e}")
Here, we begin by importing the os, time, pandas, and yfinance libraries. We then define the list of stock symbols and the folder in which we want to store the data. If the folder doesn't already exist, we create it. Next, we loop over each stock symbol, download its data using yfinance, and store it as a CSV file using pandas, pausing for one second between requests. The creation of file paths and directories is handled by the os module.
The path to the CSV file where we save the downloaded stock data is stored in the file_path variable. This path is generated using the os.path.join function, which joins the folder path (folder_path) with the filename, which is the stock symbol (symbol) with a .csv file extension. For instance, file_path will be "stocks_datasets/AMZN.csv" if folder_path is set to stocks_datasets and the stock symbol is AMZN.
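To see the path construction in isolation, here is a small sketch using the same variable names as the download script:

```python
import os

folder_path = "stocks_datasets"
symbol = "AMZN"

# Join the folder and the file name using the separator of the current OS
file_path = os.path.join(folder_path, f"{symbol}.csv")
print(file_path)
```

Using os.path.join rather than gluing strings together with "/" keeps the script portable, since the separator differs between operating systems.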
Once we have the file_path, we can use the to_csv method of the pandas DataFrame (stock_df) to save the data to a CSV file at the specified path. The to_csv method takes the file path as its first argument and saves the data to the specified file.
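One assumption worth guarding against: depending on the yfinance version, a failed download may come back as an empty DataFrame instead of raising an exception, in which case to_csv would happily write a file with no rows. The sketch below checks for that before saving. The helper save_if_not_empty is hypothetical, and the data is made up so the example runs without a network connection:

```python
import os
import tempfile

import pandas as pd


def save_if_not_empty(stock_df: pd.DataFrame, file_path: str) -> bool:
    """Write stock_df to file_path unless it is empty; return True if written."""
    if stock_df.empty:
        return False
    stock_df.to_csv(file_path)
    return True


# Demonstration with made-up data in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    good = pd.DataFrame({"Close": [1.0, 2.0]})
    wrote_good = save_if_not_empty(good, os.path.join(tmp, "GOOD.csv"))
    wrote_empty = save_if_not_empty(pd.DataFrame(), os.path.join(tmp, "EMPTY.csv"))
    saved_files = sorted(os.listdir(tmp))

print(wrote_good, wrote_empty, saved_files)
```

In the download loop, the return value could decide whether to print the success or the failure message.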
Take a look inside one of the files; you should see rows of data like the following:
Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02 00:00:00-05:00,40.84406280517578,40.84406280517578,40.164520263671875,40.20743942260742,37.14159393310547,2678848
2014-01-03 00:00:00-05:00,40.33619689941406,41.02288818359375,40.24320602416992,40.715309143066406,37.610748291015625,2609647
Now that we have saved the data, it will be useful to load it back into memory to try out some models. How can we do it?
Open all company datasets present in the folder we created
The os and pandas libraries can be used to open all the datasets stored in the folder. You can use the example code shown here:
import os

import pandas as pd

# Folder where the CSV files are stored
folder_path = "stocks_datasets"

# In this empty list we will store the DataFrames
stocks_list = []

# Loop through each file in the folder
for file in os.listdir(folder_path):
    # Check if it is a CSV file
    if file.endswith(".csv"):
        # Create the full file path
        file_path = os.path.join(folder_path, file)
        try:
            # Read the CSV file into a pandas DataFrame
            df = pd.read_csv(file_path)
            # Append the DataFrame to the list
            stocks_list.append(df)
        except Exception as e:
            print(f"Failed to open {file}: {e}")

# Here we can put our model!
In this code, we first specify the location of the folder containing the CSV files. We then create an empty list (stocks_list) to hold the DataFrames. After that, we use the os.listdir function to iterate over each file in the folder. The endswith method is used to determine whether the file is a CSV file, and if it is, the os.path.join function is used to build the full file path. Using the pd.read_csv function, we then read the CSV file into a pandas DataFrame and append it to the stocks_list list.
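A plain list loses track of which DataFrame belongs to which company, and read_csv leaves the Date column as text. As a possible variation (not the only way to do it), the sketch below keys each DataFrame by its ticker and parses the dates. The function name load_stocks is hypothetical, and the demonstration writes a tiny file with shortened, illustrative values in the same layout as the saved files:

```python
import os
import tempfile

import pandas as pd


def load_stocks(folder_path: str) -> dict:
    """Read every CSV in folder_path into a dict of DataFrames keyed by symbol."""
    stocks = {}
    for file in sorted(os.listdir(folder_path)):
        if not file.endswith(".csv"):
            continue
        # The file name without its extension is the ticker symbol
        symbol = os.path.splitext(file)[0]
        df = pd.read_csv(os.path.join(folder_path, file))
        # Parse the Date column; utc=True copes with rows carrying different UTC offsets
        df["Date"] = pd.to_datetime(df["Date"], utc=True)
        stocks[symbol] = df.set_index("Date")
    return stocks


# Illustrative rows only, with values shortened for readability
sample = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2014-01-02 00:00:00-05:00,40.84,40.84,40.16,40.21,37.14,2678848\n"
    "2014-01-03 00:00:00-05:00,40.34,41.02,40.24,40.72,37.61,2609647\n"
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "AMZN.csv"), "w") as fh:
        fh.write(sample)
    stocks = load_stocks(tmp)

print(stocks["AMZN"])
```

With the real folder, load_stocks("stocks_datasets") would give one entry per downloaded ticker, assuming every file has a Date column as in the sample above.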
I hope this post was helpful to you. Keep following me if you want to read articles about quantitative finance and machine learning.