python - Save the current state of a program and resume from the last saved point

I have a script that downloads images from links. Suppose the script terminates for some reason; I want to save the point up to which the images have been downloaded and resume from that saved point the next time it runs.

So far I have written the download script and tried to save the program state with pickle:

import pandas as pd
import requests as rq
import os,time,random,pickle
import csv
data=pd.read_csv("consensus_data.csv",usecols=["CaptureEventID","Species"])

z=data.loc[ data.Species.isin(['buffalo']), :]

df1=pd.DataFrame(z)

data_2=pd.read_csv("all_images.csv")

df2=pd.DataFrame(data_2)

df3=pd.merge(df1,df2,on='CaptureEventID')

p=df3.to_csv('animal_img_list.csv',index=False)

# you need to change the location below
data_final = pd.read_csv("animal_img_list.csv")
output=("/home/avnika/data_serengeti/url_op")

mylist = []

for i in range(0,100):
    x = random.randint(1,10)
    mylist.append(x)

print(mylist)

for y in range(len(mylist)):
    d=mylist[y]
    print(d)

file_name = data_final.URL_Info
print(len(file_name))
for file in file_name:
    image_url='https://snapshotserengeti.s3.msi.umn.edu/'+file
    f_name=os.path.split(image_url)[-1]
    print(f_name)
    r=rq.get(image_url)

    with open(output+"/"+f_name, 'wb') as f:
        f.write(r.content)
    time.sleep(d)


with open("/home/avnika/data_serengeti","wb") as fp:
    pickle.dump(r,fp)

with open("/home/avnika/data_serengeti","rb") as fp:
    pic_obj=pickle.load(fp)

Suppose I have to download 4000 images from URLs. I successfully downloaded 1000 of them, but then my script crashed because of a network problem. When the script restarts, I want it to resume from image number 1001; currently it starts over from image number 1. After loading the pickled object, how do I run my loop again from where it left off?

Best Answer

There may be several solutions to this problem, but the first one that comes to mind should get you there.

Approach:

The script restarts downloading from the beginning because it has no memory of the index of the last completed download.

To fix this, we create a text file containing a single integer, 0, representing the index up to which files have been downloaded. When the script runs, it reads the integer from that text file (this acts as the remembered position). Each time a file downloads successfully, the value in the text file is incremented by 1.

Code

A small example to illustrate:

Note: I manually created a text file containing '0' beforehand.

# Open the text file that stores the last saved position
counter = open('counter.txt', "r")

# Get the position to start from. Initially it's 0; later runs pick up the updated value.
start = counter.read()
print("-->  ", start)
counter.close()

for x in range(int(start), 1000):
    print("Processing done up to: ", x)

    # After each iteration, write the *next* position to the file,
    # so a restart resumes at the first unprocessed index.
    writer = open('counter.txt', "w")
    writer.write(str(x + 1))
    writer.close()
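One caveat worth noting: if the script dies in the middle of the `writer.write(...)` call, the counter file can be left empty or truncated, and the next run will fail on `int(start)`. A crash-safer variant (a sketch using only the standard library; the `counter.txt` and `.tmp` file names are illustrative) writes to a temporary file first and then atomically renames it over the real one:

```python
import os

def save_position(position, path="counter.txt"):
    # Write to a temp file first, then atomically replace the real file,
    # so a crash mid-write never leaves a truncated counter behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(position))
    os.replace(tmp_path, path)  # atomic rename on both POSIX and Windows

def load_position(path="counter.txt"):
    # A missing or empty file means "start from the beginning".
    try:
        with open(path) as f:
            text = f.read().strip()
            return int(text) if text else 0
    except FileNotFoundError:
        return 0

save_position(42)
print(load_position())  # -> 42
```

`os.replace` guarantees the counter file is always either the old complete value or the new complete value, never half-written.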

Fixing your code:

Note: manually create a text file named 'counter.txt' and write '0' in it.

import pandas as pd
import requests as rq
import os,time,random,pickle
import csv
data=pd.read_csv("consensus_data.csv",usecols=["CaptureEventID","Species"])

z=data.loc[ data.Species.isin(['buffalo']), :]

df1=pd.DataFrame(z)

data_2=pd.read_csv("all_images.csv")

df2=pd.DataFrame(data_2)

df3=pd.merge(df1,df2,on='CaptureEventID')

p=df3.to_csv('animal_img_list.csv',index=False)

# you need to change the location below
data_final = pd.read_csv("animal_img_list.csv")
output=("/home/avnika/data_serengeti/url_op")

mylist = []

for i in range(0,100):
    x = random.randint(1,10)
    mylist.append(x)

print(mylist)

for y in range(len(mylist)):
    d=mylist[y]
    print(d)

# Open the file you manually created with '0' in it.
counter = open('counter.txt', "r")
start = int(counter.read())  # convert to int so it can be used for slicing and counting
count = start
counter.close()

file_name = data_final.URL_Info
print(len(file_name))

# The position read from the file is used to slice file_name, skipping downloads already completed.
for file in file_name[start:]:
    image_url='https://snapshotserengeti.s3.msi.umn.edu/'+file
    f_name=os.path.split(image_url)[-1]
    print(f_name)
    r=rq.get(image_url)

    with open(output+"/"+f_name, 'wb') as f:
        f.write(r.content)

    # The file has been downloaded; update the counter file with the new position.
    count+=1
    writer = open('counter.txt',"w")
    writer.write(str(count))
    writer.close()

    time.sleep(d)
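An alternative that needs no counter file at all (not part of the original answer, just a sketch): since every image is saved under its own file name, the loop can resume by simply skipping files that already exist in the output directory. The directory and file names below are made up for the demonstration; in the real script the `rq.get(...)` call would replace the simulated write:

```python
import os

output = "url_op_demo"
os.makedirs(output, exist_ok=True)

file_names = ["a.jpg", "b.jpg", "c.jpg"]

# Pretend a.jpg was downloaded on a previous run.
with open(os.path.join(output, "a.jpg"), "wb") as f:
    f.write(b"already here")

downloaded = []
for f_name in file_names:
    target = os.path.join(output, f_name)
    if os.path.exists(target):
        # Already fetched on a previous run -- skip it.
        continue
    # In the real script: r = rq.get(image_url); f.write(r.content)
    with open(target, "wb") as f:
        f.write(b"fake image bytes")
    downloaded.append(f_name)

print(downloaded)  # -> ['b.jpg', 'c.jpg']
```

The trade-off is one `os.path.exists` check per URL instead of one file write per download, and a partially written image left by a crash would be wrongly treated as complete unless combined with the temp-file rename trick above.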

Hope this helps :)

https://stackoverflow.com/questions/56287199/
