Channel: Active questions tagged row - Stack Overflow

Python: deque unable to load a specific number of rows from the bottom of a CSV while keeping the header row of column names


The CSV has 8000 rows. The code needs to pick up "n" rows (for example, 1500) from the bottom of the CSV, along with the header row of column names.
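For reference, one pandas-only way to express this requirement is to count the lines first and then skip every data row except the last n, never skipping row 0 (the header). This is a minimal sketch, assuming one record per physical line (no quoted fields containing newlines); `read_last_rows` is a hypothetical helper name, not part of the original script.

```python
import pandas as pd


def read_last_rows(path, n):
    """Hypothetical helper: header plus the last n data rows, using pandas only.

    Assumes one record per physical line (no quoted fields with embedded newlines).
    """
    with open(path, "r") as f:
        data_rows = sum(1 for _ in f) - 1   # total lines minus the header line
    skip_count = max(data_rows - n, 0)      # how many leading data rows to drop
    # Row index 0 is the header and is always kept; rows 1..skip_count are skipped.
    return pd.read_csv(path, skiprows=lambda i: 0 < i <= skip_count)
```

With 8000 data rows and `n=1500`, this would skip rows 1 through 6500 and keep the header plus the final 1500 rows.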

The data in the CSV looks like this:

```
Ticker,Name,Date,Open,High,Low,Close,Volume
HINDOILEXP,HINDUSTAN OIL EXPLORATION,14-12-94,31.30,31.30,31.30,31.30,920
HINDOILEXP,HINDUSTAN OIL EXPLORATION,15-12-94,31.30,31.52,31.30,31.52,805
HINDOILEXP,HINDUSTAN OIL EXPLORATION,16-12-94,32.61,34.78,32.61,34.78,460
```
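As a quick sanity check of the sample, assuming the file is comma-separated, the rows can be parsed with the same `format='%d-%m-%y'` the script uses. pandas follows the usual `%y` convention (00 to 68 map to 2000 to 2068, 69 to 99 map to 1969 to 1999), so `94` becomes 1994.

```python
import io

import pandas as pd

# The three sample rows from the question, assuming a comma delimiter.
SAMPLE = """Ticker,Name,Date,Open,High,Low,Close,Volume
HINDOILEXP,HINDUSTAN OIL EXPLORATION,14-12-94,31.30,31.30,31.30,31.30,920
HINDOILEXP,HINDUSTAN OIL EXPLORATION,15-12-94,31.30,31.52,31.30,31.52,805
HINDOILEXP,HINDUSTAN OIL EXPLORATION,16-12-94,32.61,34.78,32.61,34.78,460
"""

df = pd.read_csv(io.StringIO(SAMPLE))
# Same date format as the script: day-month-two-digit-year.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%y")
print(df[["Date", "Close", "Volume"]])
```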

The script below uses a deque and prints this message on the console:

```
R:\PyRAM>python myres3.py
Loading data from R:/PyRAM/db as mentioned in inputs.json
Missing columns in file R:/PyRAM/db\A459.csv: Ticker, Name, Date, Open, High, Low, Close, Volume
```
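The fact that every required column is reported missing is consistent with the header line being evicted from the deque. When every line of the file, header included, is appended to a `deque(maxlen=max_rows)`, only the last `max_rows` lines survive, and `header=0` then turns the first surviving data row into the column names. A small reproduction on made-up data (not the question's files):

```python
import io
from collections import deque

import pandas as pd

# Synthetic CSV: a header plus 5 data rows (illustrative data only).
text = "Ticker,Close\n" + "".join(f"T{i},{i}.0\n" for i in range(5))

# Same pattern as the script: every line, header included, goes into the deque.
rows = deque(maxlen=3)
for line in io.StringIO(text):
    rows.append(line)

data = pd.read_csv(io.StringIO("".join(rows)), header=0)
print(list(data.columns))  # the first *retained* data row became the header
print(len(data))           # only maxlen - 1 data rows remain
```

A side effect is that even when the columns happened to line up, the DataFrame would hold `max_rows - 1` rows rather than `max_rows`.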

Below is the code that fails to do the job (loading the last "n" lines together with the header row of column names):

Code:

```python
import pandas as pd
import json
from datetime import datetime
import glob
import os
import concurrent.futures
import sys
import time
import msvcrt
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import Future
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Process
import math
from tqdm import tqdm
from collections import deque
import io

# Define a function to load CSV files and return a list of DataFrames
def load_csv_files(folder_path, required_columns, max_rows):
    csv_files = glob.glob(os.path.join(folder_path, '*.csv'))
    if not csv_files:
        print("No CSV files found in the folder.")
        print("Press ENTER to wait or ESC to exit...")
        while True:
            if msvcrt.kbhit() and msvcrt.getch() == b'\x1b':
                exit(0)
            if os.path.exists(os.path.join(folder_path, '*.csv')):
                break
            continue

    all_data = []
    for file in csv_files:
        try:
            # Use deque to store the last 'max_rows' rows
            rows_to_lift = deque(maxlen=max_rows)
            with open(file, 'r') as csv_file:
                for line in csv_file:
                    rows_to_lift.append(line)

            # Reconstruct the CSV data with the header and lifted rows
            lifted_data = list(rows_to_lift)

            # Create a DataFrame from the lifted data
            data = pd.read_csv(io.StringIO(''.join(lifted_data)), header=0)  # Specify that the first row is the header

            # Check for missing columns
            missing_columns = [col for col in required_columns if col not in data.columns]
            if missing_columns:
                print(f"Missing columns in file {file}: {', '.join(missing_columns)}")
                continue

            data['Date'] = pd.to_datetime(data['Date'], format='%d-%m-%y')
            data.sort_values('Date', inplace=True)
            all_data.append(data)

            excel_file_path = os.path.join(folder_path, f"loaded_data_{len(all_data)}.xlsx")
            data.to_excel(excel_file_path, index=False)
            print(f"DataFrame {len(all_data)} saved to {excel_file_path}")
        except FileNotFoundError:
            print(f"Error: CSV file {file} not found.")
            continue

    return all_data
```
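One possible fix, sketched under the assumption that the first line of each file is the header: read that line with `next()` before building the deque, so only data lines compete for the `max_rows` slots, then pass header plus tail to `pd.read_csv`. `read_tail_with_header` is a hypothetical helper that could stand in for the deque section inside the `try` block. Separately, `os.path.exists` does not expand wildcards, so the waiting loop's `os.path.exists(os.path.join(folder_path, '*.csv'))` looks for a file literally named `*.csv`; re-running `glob.glob` is one way to poll instead, shown here as the equally hypothetical `csv_files_present`.

```python
import glob
import io
import os
from collections import deque

import pandas as pd


def read_tail_with_header(file, max_rows):
    """Hypothetical helper: header row plus the last max_rows data rows as a DataFrame."""
    with open(file, "r") as csv_file:
        header = next(csv_file, "")               # take the header out before the deque
        tail = deque(csv_file, maxlen=max_rows)   # keeps only the last max_rows data lines
    return pd.read_csv(io.StringIO(header + "".join(tail)), header=0)


def csv_files_present(folder_path):
    """Hypothetical helper: True if the folder contains at least one .csv file."""
    # glob expands the '*' wildcard; os.path.exists would treat it literally.
    return bool(glob.glob(os.path.join(folder_path, "*.csv")))
```

Inside the loop, `data = read_tail_with_header(file, max_rows)` should then yield `max_rows` data rows under the file's own column names. For files of around 8000 rows, reading the whole file and calling `.tail(max_rows)` on the DataFrame is a simpler alternative that keeps the header automatically, at the cost of parsing every row.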
