Stumbling Toward 'Awesomeness'

A Technical Art Blog

Monday, April 19, 2010

Dealing with File Sequences in Python

I have been parsing through other people’s files a lot lately, and finally took the time to write a little function that gives me general information about a sequence of files. It uses a regex to yank the numeric part out of a filename, figures out the padding from it, and uses glob to count how many files are in the sequence. Here’s the code:

import glob
import os
import re

#returns [base name, padding, filetype, number of files, first file, last file]
def getSeqInfo(path):
	folder, name = os.path.split(path)
	# the last run of digits in the filename is taken as the frame number
	match = list(re.finditer(r'\d+', name))[-1]
	segNum = match.group()
	numPad = len(segNum)
	baseName = name[:match.start()]
	fileType = name.split('.')[-1]
	# one '?' wildcard per padded digit, e.g. Frame??????.jpg
	globString = baseName + '?' * numPad + name[match.end():]
	# glob returns files in arbitrary order, so sort before picking first/last
	theGlob = sorted(glob.glob(os.path.join(folder, globString)))
	numFrames = len(theGlob)
	firstFrame = theGlob[0]
	lastFrame = theGlob[-1]
	return [baseName, numPad, fileType, numFrames, firstFrame, lastFrame]

Here is an example of usage:

print(getSeqInfo('E:\\data\\data\\Games\\Project\\CaptureOutput\\Frame000547.jpg'))
>>['Frame', 6, 'jpg', 994, 'E:\\data\\data\\Games\\Project\\CaptureOutput\\Frame000000.jpg', 'E:\\data\\data\\Games\\Project\\CaptureOutput\\Frame000993.jpg']
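getSeqInfo treats the last run of digits in the filename as the frame number, which matters when a name contains several numbers (shot, version, frame). The splitting step on its own can be sketched like this; `splitFrame` is a hypothetical helper, and the regex is just one way to find the last digit run, not the only sensible rule for every naming convention:

```python
import re

# Hypothetical helper: split a filename around its LAST run of digits,
# assumed to be the frame number. Returns (base, digits, suffix, padding).
FRAME_RE = re.compile(r'^(.*?)(\d+)(\D*)$')

def splitFrame(name):
    match = FRAME_RE.match(name)
    if match is None:
        raise ValueError('no frame number in %r' % name)
    base, digits, suffix = match.groups()
    return base, digits, suffix, len(digits)

print(splitFrame('Frame000547.jpg'))        # ('Frame', '000547', '.jpg', 6)
print(splitFrame('shot010_v002.0547.exr'))  # ('shot010_v002.', '0547', '.exr', 4)
```

One limitation shared with getSeqInfo: an extension that contains digits, such as `clip0001.mp4`, makes the `4` look like the frame number.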

I know this is pretty simple, but I looked around online a bit and didn’t see anything readily available showing how to deal with differently numbered file sets. I have needed something like this for a while, something that works with anything from OBJs sent by external contractors to images from After Effects…

posted by admin at 6:49 PM  


  1. Hi Chris, thanks for sharing! Not sure you’ll be interested, but since I’m also working with a lot of files in Python, I wanted to show how I’d have done it:

    import os
    import re

    def getSeqInfo(fpath):
        folder, fname = os.path.split(fpath)
        match = re.match(r"^(.*?)(\d+)\.(.*?)$", fname)
        if match is None:
            raise RuntimeError("Unable to find sequence number")
        baseName, sequenceNum, fileType = match.groups()
        numPad = len(sequenceNum)
        # escape the literal parts so characters like '.' are not treated as regex syntax
        seqPattern = re.compile(r"^%s\d{%d}\.%s$" % (re.escape(baseName), numPad, re.escape(fileType)))
        # os.listdir returns names in arbitrary order, so sort them
        names = sorted(name for name in os.listdir(folder or os.curdir) if seqPattern.match(name))
        if not names:
            raise RuntimeError("No matching file found")
        numItems = len(names)
        firstItem = names[0]
        lastItem = names[-1]
        return [baseName, numPad, fileType, numItems, firstItem, lastItem]

    May not be perfect, but it’s only another version 🙂

    Comment by rotoglup — 2010/04/21 @ 12:14 AM

  2. Man, you’re awesome! I just bought my little regex cheat book, and this stuff is like Chinese to me. Do you know of a site with regex tutorials based on real-world scenarios? 😀

    Comment by admin — 2010/04/25 @ 4:27 PM

  3. It often looks like Chinese to me too when I come back to a regexp some time after writing it!! Sorry, I have no solid pointer to give… I just work through sweat, pain, general knowledge of regexp principles, and the Python syntax reference in the Python docs. I limit myself to basic uses, since anything fancier can be painful to debug and re-read! Hang in there, you’ll make it 😉

    Comment by rotoglup — 2010/05/03 @ 8:15 PM

  4. Handy little snippet; throwing this into the farm control here 🙂

    Love regex; I’ve got a couple of those generic cheat-sheet PDFs. It takes a bit of getting into, but once it’s in your script you wonder how you managed without it!



    Comment by TxRx — 2010/05/31 @ 12:58 PM

  5. For me, this is a very helpful book:

    Comment by Marin Petrov — 2010/07/27 @ 3:45 AM

  6. Hi, thanks, you just saved me time. Very knowledgeable work, too.

    A few questions:
    1- What would be smart pseudocode steps for finding missing files in a sequence?
    2- Is there any way to detect the number of sequences in a folder? I mean, what if someone puts two file sequences in one folder? Nuke’s file-open dialog shows this very nicely; I always wanted to ask their developers how they solved it. I also found that cgkit (Python) has an interesting module named ‘sequence’.
    But thanks again.

    Comment by lala — 2014/04/27 @ 7:33 PM

  7. I’ve made a widget in PySide that browses the file system and displays file sequences.

    Maybe it could be useful for somebody.

    Comment by zebulon — 2016/03/15 @ 2:38 PM

  8. I’ve tried to use pyseq with directories containing many thousands of files, and I always get “Argument list too large.” errors. Have any of you encountered this error and found a work-around?

    Comment by Len — 2016/04/30 @ 11:27 PM

  9. I know this is an old post, but Len’s comment wasn’t too long ago. Anyway, in my industry I run into this problem all the time, and PySeq never worked for more than a few hundred files. Since the average movie is over 120,000 frames, it was nearly impossible to use in any real-world application.

    I’m currently developing a tool (it’s pretty close to release and quite usable as-is) that solves the efficiency problem and can handle millions of files in just a few minutes on a low-end Mac mini. Feel free to give it a try and let me know if you like it and how it can be improved!

    Comment by Cody Cuellar — 2017/12/21 @ 12:42 AM

  10. Awesome, thanks for sharing!

    Comment by Chris — 2018/01/03 @ 12:01 PM
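The questions raised in the comments (finding missing frames, detecting several sequences in one folder, and coping with very large directories) can be sketched together as a single pass over a directory. This is a minimal sketch, not how Nuke, cgkit or pyseq do it; `groupSequences`, `scanFolder` and `findMissingFrames` are hypothetical names, and the “last run of digits is the frame number” rule is an assumption carried over from getSeqInfo. `os.scandir` (Python 3.6+ for the `with` form) reads the directory from inside Python, so no shell wildcard is expanded onto a command line; if the “Argument list too large” error came from such an expansion, which the comment doesn’t confirm, this sidesteps it.

```python
import os
import re
from collections import defaultdict

# Hypothetical helpers, not from the post or any library. Like getSeqInfo,
# they assume the last run of digits in a filename is its frame number.
FRAME_RE = re.compile(r'^(.*?)(\d+)(\D*)$')

def groupSequences(names):
    """Group filenames into sequences keyed by (prefix, padding, suffix)."""
    sequences = defaultdict(list)
    for name in names:
        match = FRAME_RE.match(name)
        if match is None:
            continue  # no digits at all, so not part of a sequence
        prefix, digits, suffix = match.groups()
        sequences[(prefix, len(digits), suffix)].append(int(digits))
    return {key: sorted(frames) for key, frames in sequences.items()}

def scanFolder(folder):
    """Group the files in folder, streaming directory entries with os.scandir."""
    with os.scandir(folder) as entries:
        return groupSequences(entry.name for entry in entries if entry.is_file())

def findMissingFrames(frames):
    """Return the frame numbers absent between the first and last frame."""
    if not frames:
        return []
    present = set(frames)
    return [f for f in range(min(frames), max(frames) + 1) if f not in present]

# Example with a plain list; scanFolder(path) would produce the same kind of dict
names = ['Frame0001.jpg', 'Frame0002.jpg', 'Frame0005.jpg',
         'plate.1001.exr', 'plate.1002.exr', 'notes.txt']
for (prefix, pad, suffix), frames in sorted(groupSequences(names).items()):
    print(prefix, pad, suffix, frames, findMissingFrames(frames))
# Frame 4 .jpg [1, 2, 5] [3, 4]
# plate. 4 .exr [1001, 1002] []
```

The number of sequences in a folder is then just `len(scanFolder(folder))`. Keying on padding is a design choice with a cost: an unpadded sequence (`f9.jpg`, `f10.jpg`) is split into two groups, so a real tool may want a looser key.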
