Forums

The forums ran from 2008-2020 and are now closed and viewable here as an archive.

How can I tell how long a PHP script can run + how to extend

  • #26600
    mattvot
    Member

    I’ve made this PHP file that could potentially take days to complete.

    Initial tests show it stops working after some time.

    I would like to know what the time limit is for PHP scripts on the server.

    Plus, does anybody know a way to up the limit? It doesn’t necessarily have to be a straightforward config update.

    Thanks
    Matt

    #66051
    Argeaux
    Participant

    check this out for more info:

    http://php.net/manual/en/function.set-time-limit.php
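
    Something like this at the top of your script maybe (just a sketch; note that set_time_limit() has no effect when safe mode is on):

    Code:
    <?php
    // Lift the limit for this run only; 0 means no time limit at all.
    set_time_limit(0);
    // Or the equivalent ini directive, if you can edit php.ini:
    ini_set('max_execution_time', 0);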

    But why on earth do you have a script that takes days to complete? Maybe there is a better way if you explain what you are trying to do…

    #66053
    mattvot
    Member

    Haha,

    Well, my client struck a deal with a massive website and is allowed to be a redistributor of all their data. At first the XML feed was too big to process, so I got them to separate it into countries. Now my script does one feed at a time to keep the run times down.

    Let’s take the USA feed alone for example: there are over 60,000 items, and PHP could import them at about 6,000/hour. But then…

    We needed to create local thumbnails for the images that come along with the feed. That takes a HELL OF A LOT longer, A LOT LONGER. Like from 6,000/hour to about 600/hour, which puts the USA feed alone at over four days.

    So PHP times out, and it’s frustrating because the alternative is to do all this processing on my laptop and then FTP the thumbnails up, but that would put my laptop out of action for who knows how long.
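
    For reference, the thumbnail step is basically this sort of GD loop (a simplified sketch, not my exact code):

    Code:
    <?php
    // Simplified sketch: make a width-constrained JPEG thumbnail with GD
    function makeThumb($src, $dest, $width = 150) {
        $img = imagecreatefromjpeg($src); // assumes JPEG sources
        $height = (int) ($width * imagesy($img) / imagesx($img)); // keep the aspect ratio
        $thumb = imagecreatetruecolor($width, $height);
        imagecopyresampled($thumb, $img, 0, 0, 0, 0, $width, $height, imagesx($img), imagesy($img));
        imagejpeg($thumb, $dest, 85); // quality 85
        imagedestroy($img);
        imagedestroy($thumb);
    }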

    I’ve looked at set_time_limit() before and got nowhere, but I did see this, this time round:

    Quote:
    Note: The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running. This is not true on Windows where the measured time is real.

    I am including function files and using a MySQL database, so that probably messes with the timing, but I’m not sure how to resolve it.

    #66057
    Argeaux
    Participant

    Mhh, I haven’t handled such large XML files before…

    Maybe the way to do it is in chunks of (x) amount.
    Then keep track of where you are in the XML file in a database.

    You make a cronjob which calls the PHP file every 5 minutes to do (x) amount of lines, and then saves the line where it stops so the next cronjob can start there. You will have to find out how many lines the script can do in 5 minutes to fill in the (x) amount.
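
    Something like this maybe (just a rough sketch; I am assuming a little MySQL table called feed_state that remembers the offset):

    Code:
    <?php
    // Rough sketch: process the next chunk of the feed, then remember where we stopped.
    // Hypothetical cron entry: */5 * * * * php /path/to/import.php
    $pdo = new PDO('mysql:host=localhost;dbname=import', 'user', 'pass'); // hypothetical credentials
    $offset = (int) $pdo->query("SELECT last_offset FROM feed_state WHERE feed = 'usa'")->fetchColumn();
    $items = simplexml_load_file('feed-usa.xml')->item; // assumes <item> elements
    $chunk = 100; // tune to however many items fit in 5 minutes
    $stop = min($offset + $chunk, count($items));
    for ($i = $offset; $i < $stop; $i++) {
        processItem($items[$i]); // hypothetical helper: import + thumbnail one item
    }
    $pdo->exec("UPDATE feed_state SET last_offset = $stop WHERE feed = 'usa'");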

    I am not sure how to do this, I am just thinking out loud and maybe it helps you.
    It’s probably a smart idea to find a forum dedicated to PHP for this, because it’s a tough question.

    #66244

    I agree that the way in which you are going about this is not the most efficient approach.

    Firstly, 6,000 transactions/hour = 100 transactions/min ≈ 1.7 transactions/sec = a very slow script! It’s possible that the work is just inherently slow, that PHP isn’t the optimal scripting language, or that it’s just running on an overloaded web server.

    This brings me to my first point: don’t run the script through the web server, where it can time out and where the server limits the amount of RAM and other resources the script can use. Instead, run the script directly from the command line using php myscript.php. This will be much more efficient, although it will still be difficult to manage if it’s going to take several days.
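
    For example, something like this at the top of the script (a rough sketch):

    Code:
    <?php
    // Sketch: refuse to run through the web server; the CLI has no time limit by default.
    if (PHP_SAPI !== 'cli') {
        die("Run me from the command line: php myscript.php\n");
    }
    set_time_limit(0);             // belt and braces -- the CLI default is already 0
    ini_set('memory_limit', '-1'); // lift the memory cap too, if the job needs it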

    If this is quite a processor/memory intensive operation (and it sounds like it is) consider using Amazon EC2. With Amazon EC2 you can get a virtual private server up and running in minutes, and you are only charged by the hour for use. It’s really cheap (and you can of course pass any costs on to your client).

    Something else worth looking into would be Amazon Simple Queue Service (SQS). It’s designed for this sort of scenario where you have a lot of processing to do in batch. It’s a similar idea to breaking the job into chunks and driving it with cron, only more reliable.

    First, you will need to create and fill your queue. Something like this, as a rough sketch (assuming the AWS SDK for PHP):

    Code:
    <?php
    // Producer -- rough sketch, assuming the AWS SDK for PHP is installed via Composer
    require 'vendor/autoload.php';
    $sqs = new Aws\Sqs\SqsClient(['region' => 'us-east-1', 'version' => 'latest']);
    $queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/feed-items'; // hypothetical queue
    foreach (simplexml_load_file('feed.xml')->item as $item) { // assumes <item> elements
        // Optional: upload the item to Amazon S3 first and queue a link to it instead
        $sqs->sendMessage(['QueueUrl' => $queueUrl, 'MessageBody' => $item->asXML()]);
    }

    Then on each of the processing servers (again, just a sketch):

    Code:
    <?php
    // Worker -- rough sketch; run as many copies of this as you like
    require 'vendor/autoload.php';
    $sqs = new Aws\Sqs\SqsClient(['region' => 'us-east-1', 'version' => 'latest']);
    $queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/feed-items'; // hypothetical queue
    while (true) { // rinse and repeat
        $result = $sqs->receiveMessage(['QueueUrl' => $queueUrl, 'WaitTimeSeconds' => 20]);
        foreach ((array) $result['Messages'] as $msg) {
            $item = simplexml_load_string($msg['Body']); // or download it from S3 if you queued a link
            makeThumbnail($item); // hypothetical helper: do whatever processing you need
            $sqs->deleteMessage(['QueueUrl' => $queueUrl, 'ReceiptHandle' => $msg['ReceiptHandle']]);
        }
    }

    The benefit of this strategy is that you can have as many processing servers reading the queue as you like. So if you think it’s going to take 72 hours on one server, just create 7 servers and it should all be done in a little over 10 hours.

    http://aws.amazon.com/ec2/
    http://aws.amazon.com/sqs

    #66353
    mattvot
    Member

    Thanks both of you.

    Yeah, well, the first run-through of the script will take forever, but it’s coded in such a way that when it realises it has already parsed the item it’s looking at, it stops that feed and turns to the next one.
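
    Roughly like this (a simplified sketch, assuming each item carries a unique id and an items table of what has already been imported):

    Code:
    <?php
    // Simplified sketch: stop a feed as soon as we hit an item we have already parsed
    $pdo = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'pass'); // hypothetical credentials
    foreach ($feedUrls as $feedUrl) { // hypothetical list of per-country feeds
        foreach (simplexml_load_file($feedUrl)->item as $item) {
            $stmt = $pdo->prepare('SELECT 1 FROM items WHERE guid = ?');
            $stmt->execute([(string) $item->id]); // assumes each item has an <id>
            if ($stmt->fetchColumn()) {
                continue 2; // already imported, so skip the rest of this feed
            }
            importItem($item); // hypothetical helper: import + thumbnail
        }
    }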

  • The forum ‘Back End’ is closed to new topics and replies.