python - Pandas failing in Linux but not in Windows - missing _data attribute

I am running my Python script on RHEL Linux and I get the following error:

Traceback (most recent call last):
  File "main.py", line 162, in <module>
    find_deltas(logging, snapshot_id)
  File "/ariel/python_scripts/ariel_deltas/deltas.py", line 71, in find_deltas
    data = prepare_frames(logging, file_extracts)
  File "/ariel/python_scripts/ariel_deltas/deltas.py", line 606, in prepare_frames
    logging.info("df_old has %d records", len(df_old))
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 1041, in __len__
    return len(self.index)
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas/_libs/properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
  File "/ariel/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'

Effectively I am reading a DataFrame from Oracle, writing it to a pickle file, then reading that pickle file back in along with yesterday's pickle file, and joining the two on the primary keys.

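For context, a minimal sketch of that pipeline might look like the following; the connection string, table name, and join key are placeholders I am assuming for illustration, not names taken from the actual script:

import cx_Oracle
import pandas as pd

# Placeholder connection string, table name, and key column, for illustration only.
conn = cx_Oracle.connect("user/password@host:1521/service_name")

# Pull today's extract from Oracle and snapshot it to a pickle file.
df_today = pd.read_sql("SELECT * FROM some_table", conn)
df_today.to_pickle("snapshots/extract_today.pkl")

# Read today's and yesterday's snapshots back in.
df_new = pd.read_pickle("snapshots/extract_today.pkl")
df_old = pd.read_pickle("snapshots/extract_yesterday.pkl")

# Join the two snapshots on the primary key to find deltas.
deltas = df_new.merge(df_old, on="RECORD_ID", how="outer",
                      suffixes=("_new", "_old"), indicator=True)
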
Why on earth does Linux produce an error about a missing '_data' attribute when the exact same code runs fine on the exact same dataset in Windows?!

Reading the pickle file in Linux, the columns are as expected.

>>> df.columns
Index(['AS_OF_DT', 'VARIATION_REQUEST_ID', 'LU_NUMBER', 'LU_TITLE', 'COUNTRY',
       'ARCHIVED', 'APPLIED', 'LU_DESCRIPTION', 'HA_LU_REF_NO', 'REMARKS',
       'LU_CATEGORY', 'VARIATION_TYPE', 'INSERT_UPDATE_TIME',
       'INSERT_UPDATE_USER', 'MERGED', 'REVISION_NUMBER', 'VERSION_SEQ',
       'RECORD_ID', 'IMPLEMENTED_SEQ', 'RMS_VERSION_SEQ',
       'REASON_FOR_LOCAL_UPDATE', 'C_ECTD_SEQUENCE_NO', 'INSERT_TIME',
       'ARCHIVED_DATE', 'REASON_FOR_MERGE', 'SCRN_NO'],
      dtype='object')
>>>

The function that produces the problem is as follows:

import pandas as pd


def prepare_frames(logging, file_extracts):
    # file_extracts is a tuple of dictionaries
    # old_file
    # new_file
    # file_info

    # file_info is a dict describing the file master record including the join keys
    # {"file_id":file_id, "file_desc": r.FILE_DESC, "file_prefix": r.FILE_PREFIX, "compare_col": r.COMPARE_COL}

    # old_file and new_file dictionaries describes the file name of the older snapshot file to be compared
    # old_file["new_old"] = "old"
    # old_file["extract_id"] = extract_id
    # old_file["file_id"] = file_id
    # old_file["file_name"] = file_name
    # old_file["snapshot_id"] = snapshot_id
    # old_file["num_records"] = num_records

    # Strip columns which we know will be different, to remove false positives such as AS_OF_DT

    logging.info("Start: Reading in DataFrames for analysis from pickle files.")

    data = []

    for extract in file_extracts:
        old_file = extract[0]
        new_file = extract[1]
        file_info = extract[2]  # the dictionary

        old_file_name = old_file["file_name"]
        new_file_name = new_file["file_name"]
        logging.info("Reading in old snapshot from pickle file: %s", old_file_name)
        df_old = pd.read_pickle('snapshots/' + old_file_name)
        logging.info("Reading in new snapshot from pickle file: %s", new_file_name)
        df_new = pd.read_pickle('snapshots/' + new_file_name)

        logging.info("df_old has %d records", len(df_old))
        logging.info("df_new has %d records", len(df_new))




        # before we do any comparisons we need to remove as_of_dt type values as this will produce false deltas
        #if "AS_OF_DT" in df_new.columns:
        #    del df_new["AS_OF_DT"]
        #    del df_old["AS_OF_DT"]

        #if "AS_OF_DATE" in df_new.columns:
        #    del df_new["AS_OF_DATE"]
        #    del df_old["AS_OF_DATE"]

        data.append((df_old, df_new, old_file, new_file, file_info))

    logging.info("End: Reading in DataFrames for analysis from pickle files.")

    return data

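For reference, a call to this function would look roughly like the sketch below; the dictionary values are invented for illustration and only mirror the structure described in the comments above:

file_extracts = [
    (
        # old_file
        {"new_old": "old", "extract_id": 1, "file_id": 10,
         "file_name": "lu_old.pkl", "snapshot_id": 100, "num_records": 5000},
        # new_file
        {"new_old": "new", "extract_id": 2, "file_id": 10,
         "file_name": "lu_new.pkl", "snapshot_id": 101, "num_records": 5100},
        # file_info
        {"file_id": 10, "file_desc": "Local updates", "file_prefix": "lu",
         "compare_col": "RECORD_ID"},
    ),
]
data = prepare_frames(logging, file_extracts)
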
Line 606 is this line:

logging.info("df_old has %d records", len(df_old))

df_old and df_new are basically pickle files read back into DataFrames. I copied the same pickle files over to Windows and had no problem at all.

Update: it looks like this was a logic error and the DataFrame was actually empty!

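One way to surface that kind of problem earlier, sketched here as an assumption rather than taken from the original script, is to validate each frame immediately after loading it:

import pandas as pd

def load_snapshot(path, logging):
    # Read a pickled snapshot and fail fast if it is unexpectedly empty.
    df = pd.read_pickle(path)
    logging.info("%s loaded with shape %s", path, df.shape)
    if df.empty:
        raise ValueError("Snapshot %s contains no rows" % path)
    return df
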
Best Answer

I ran into the same problem. I was using pandas=1.0.4 in a conda environment. Updating pandas to 1.1.0 solved it for me.

Hope this helps.

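A likely explanation for the original error is that the pickle was written by a newer pandas than the one reading it: pandas 1.1 renamed the DataFrame's internal block-manager attribute from _data to _mgr, so a frame pickled under 1.1+ comes back without the _data attribute that 1.0.x expects. A quick first check on both machines is simply to compare versions:

import pandas as pd

# The environment reading the pickle should be at least as new as the one
# that wrote it; a version mismatch is the usual cause of the '_data' error.
print(pd.__version__)

If the versions differ, upgrading the older environment (for example with conda install pandas=1.1.0, as in the answer above) should resolve it.
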
Regarding python - Pandas failing in Linux but not in Windows - missing _data attribute, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63280366/
