Cache your api results (a must read)
prosperent brian
06-07-2011, 04:13 PM
I know I have touched on this before, but I wanted to do it once more. Caching your api results is essential, especially going forward as we start to evaluate api limits. Basically, caching stores a copy of our data on your end, which is then served on subsequent page loads instead of hitting our servers on every call. Right now we are handling almost 800,000 BOT requests to the api per hour. 100 percent of these could be cached on your end, reducing our server load and letting us spend the extra cpu cycles on more important things like better algorithms and higher api limits: things that allow you guys to get more data, better data, and make more money.
We can handle the traffic, and can simply scale by adding new servers, but most of the requests are not of benefit to any of us from a money standpoint, thus the cache layer.
As for how long to cache, I would suggest 1 week (7 days) for api results. Most merchants' products don't change more often than that, so you aren't going to have stale data. You can literally just cache the json response we send over to you, or get fancy and use memcache, or even create static files.
For the coupon api, specifically, I would only cache for 24 hours. That's how often we update and clear out any expired coupons, so any more than that would be bad.
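A rough sketch of the "just cache the json response" approach using plain files, for anyone who wants a starting point. The function name cached_fetch, the cache folder and the URLs in the usage note are made up for illustration and are not part of the Prosperent api or MFPMU; the 7 day and 24 hour lifetimes are the ones suggested above.

```php
<?php
// Hypothetical helper (not from the Prosperent api or MFPMU):
// return the cached copy of $url if it is younger than $ttlSeconds,
// otherwise fetch it once and store the raw response on disk.
function cached_fetch($url, $ttlSeconds, $cacheDir, $fetcher = 'file_get_contents')
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0755, true);
    }
    // One file per distinct request URL.
    $cacheFile = rtrim($cacheDir, '/') . '/' . sha1($url) . '.json';

    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return file_get_contents($cacheFile); // cache hit, no api call
    }

    $body = call_user_func($fetcher, $url);
    if ($body === false) {
        // Request failed: serve the stale copy if there is one.
        return is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    }
    file_put_contents($cacheFile, $body, LOCK_EX);
    return $body;
}
```

Usage would look something like cached_fetch($productUrl, 7 * 24 * 3600, './apicache') for product results and cached_fetch($couponUrl, 24 * 3600, './apicache') for coupons, where $productUrl and $couponUrl stand for whatever request URLs you already build.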
makavelimob
06-07-2011, 06:32 PM
For MFPMU users, is it as easy as changing that one line of code? Or is there more that needs to be done?
Zaphod
06-07-2011, 07:07 PM
For MFPMU users, is it as easy as changing that one line of code? Or is there more that needs to be done?
Nope, that's it. Just set $arSiteConfig['prosp feed cache age'] = 7; in inc_config.php :)
charles
06-07-2011, 09:23 PM
perhaps in the next version that should be default. I think Brian would like that.
Zaphod
06-07-2011, 09:44 PM
perhaps in the next version that should be default. I think Brian would like that.
Yeah, that's for sure on the list haha :)
tykoon
06-08-2011, 01:55 AM
What about the coupons?
AcidRaZor
06-08-2011, 01:57 AM
Brian, mind giving us some tutorials on the memcached options? I like being fancy ^^ (and learning new server things)
Also, are the image servers coping with everything?
prosperent brian
06-08-2011, 05:47 AM
Sure, I can do that today. As for images, well, it would be awesome if people cached those as well. We pump out about 80MB a second worth of images currently.
makavelimob
06-08-2011, 10:09 AM
I'm a noob when it comes to this, but how would I go about caching images on my MFPMU setup?
prosperent brian
06-08-2011, 10:13 AM
The second reply :)
Just set $arSiteConfig['prosp feed cache age'] = 7; in inc_config.php
Zaphod
06-08-2011, 10:15 AM
Well caching images would be tough... especially at a large scale. There's another option... not sure if it's a good option, I'm just throwing it out there so we can all talk about it. Back before Prosp did images we used to use some fancy JS/CSS to auto resize images and make them still look good in our pages. We could try that again, using the merchant image URL from the API instead of using Prosp's server. Just an idea.
This would expose our sites to any merchant who cared to look at their image server logs, whereas right now we're all behind the curtain of Prosperent....
monalisa
06-08-2011, 10:15 AM
The second reply :)
Just set $arSiteConfig['prosp feed cache age'] = 7; in inc_config.php
That would cache only the api results and not the images.
prosperent brian
06-08-2011, 10:24 AM
Man, I need to read today lol. Thanks monalisa.
Zaphod, not a huge fan of that idea. Most merchants couldn't handle the volume of requests and would crash or shut you guys down. We have the infrastructure to handle it, just looking toward the future :). Let me brainstorm on that one a bit more.
monalisa
06-08-2011, 10:31 AM
Hey Brian
I am not an expert at squid, but got the idea of using squid to cache images from http://www.squid-cache.org/mail-archive/squid-users/200412/0084.html . Do you think a setup like that could be used to cache images on our servers?
prosperent brian
06-08-2011, 10:39 AM
Yeah, that would probably work, or varnish. They are both basically reverse proxies, so they should both be able to handle the task.
AcidRaZor
06-08-2011, 10:42 AM
You should consider using Akamai as a CDN. Any processed images would be uploaded there and the URLs would then be changed to point to it. That should take a huge load off your servers, right?
prosperent brian
06-08-2011, 10:45 AM
Sure, and only a couple hundred thousand for the volume of traffic and amount of data we push ;)
monalisa
06-08-2011, 10:53 AM
I know you are busy so if time permits, can you provide us a crisp step by step guide for using squid to cache images only.
bman46
06-09-2011, 06:48 AM
I'm sure I'm not the problem since I'm super low volume at this point, but I've changed the setting to a 7 day cache to avoid problems in the future.
Zaphod
06-09-2011, 10:25 AM
MFPmu users, do you use the same API key for an entire install? I didn't at the beginning, but I do now. If we all do, then we can easily improve the efficiency of the feed caching by removing the domain ID from the query. With different API keys for domains you need to cache feeds per domain... if the same API key is being used for all sites then they can be cached for the entire install... much more efficient if you have niche overlap (I do).
monalisa
06-09-2011, 10:29 AM
MFPmu users, do you use the same API key for an entire install? I didn't at the beginning, but I do now. If we all do, then we can easily improve the efficiency of the feed caching by removing the domain ID from the query. With different API keys for domains you need to cache feeds per domain... if the same API key is being used for all sites then they can be cached for the entire install... much more efficient if you have niche overlap (I do).
I use a single api key for all the domains in an install. Do I need to do anything else?
Zaphod
06-09-2011, 10:33 AM
I use a single api key for all the domains in an install. Do I need to do anything else?
well as it stands this won't help... but I could rewrite it to not use domainID and cache feeds install wide... I could use API keys as a db key to cache on, but they're pretty long! haha
makavelimob
06-09-2011, 10:44 AM
I use a new API key for every 200 sites. Just to keep track of batches. They are all on the same server but I use a different IP address for each one... not sure if this matters.
Zaphod
06-09-2011, 10:46 AM
maybe I could grab like the crc of the API key and use that as a key in the db, it's much smaller....
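A tiny sketch of that idea, assuming the goal is just a short, fixed-width stand-in for the long API key. The function name short_key_id and the sample key are hypothetical. PHP's hash() with 'crc32b' gives 8 hex characters. Keep in mind a CRC is only 32 bits, so two different keys can collide; it is fine as a grouping key but not as a secret or a guaranteed-unique id.

```php
<?php
// Hypothetical helper: shrink a long API key to an 8-character hex id
// that can be used as a compact key/column value in the cache table.
function short_key_id($apiKey)
{
    return hash('crc32b', $apiKey);
}

// Example with a made-up key (not a real Prosperent key):
$id = short_key_id('0123456789abcdef0123456789abcdef');
// $id is always 8 hex characters, and the same key always gives the same id.
```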
charles
06-09-2011, 10:50 AM
I have used one key for all so far.
Voodoo
06-09-2011, 10:54 AM
I use one key.
I'm up for anything that will improve the caching. The way it currently works, the hit rate is very very low ;)
Zaphod
06-09-2011, 10:57 AM
hmm... even without the domain id there's also the page number as well as products per page that differs from site to site... in the end, not much of a savings :/
Brian
New to prosperent. Just have the pro plugin and store, no MFPMU yet (soon, I hope). Be glad to help in any way I can. Please explain what I need to do.
Thanks for your time
Joe
toykilla
06-09-2011, 02:01 PM
What is the average API hit per day on accounts?
Jts2005
06-09-2011, 02:04 PM
What is the average API hit per day on accounts?
top accounts with API request:
1. Toykilla
2. ?????
prosperent brian
06-09-2011, 02:07 PM
Average would be difficult to come up with based on the number of api keys, but there are well over 100 people here that send over a million requests per day (some in the 10+ million range). Completely unnecessary. Most are pulling the same data over and over.
prosperent brian
06-09-2011, 02:08 PM
We have one account that sent over 100 million requests in a single day.
Jts2005
06-09-2011, 02:08 PM
we have one account that sent over 100 million requests in a single day.
holy shit!!
prosperent brian
06-09-2011, 02:10 PM
We've had 1 billion queries sent to us in the past 6 days.
toykilla
06-09-2011, 02:18 PM
I am not guilty of that much.. I am caching results.
Zaphod
06-09-2011, 02:23 PM
GeoIP ... You guys stopped caring about region some time ago, right?
toykilla
06-09-2011, 02:33 PM
I don't mess with geoip, but I block bad bots and most foreign countries
prosperent brian
06-09-2011, 02:41 PM
We can't geoip anything on our end because it is up to you guys to send over the ip of the visitor, which most seem to forge anyway. On that note, a significant number of people outside of the U.S. make purchases of US products, so blocking them is a bad bad bad idea. A better idea is to just cache the json responses we give you when you make an api request, then simply check the local cache first before sending the request to our servers. If people did something as simple as that, we could cut back on requests by 70+ percent I bet.
AcidRaZor
06-09-2011, 02:42 PM
We have one account that sent over 100 million requests in a single day.
good lord, I do about 350k a month....how the hell do you get to 100 million in a single day?!?!!?
Zaphod
06-09-2011, 02:49 PM
We can't geoip anything on our end because it is up to you guys to send over the ip of the visitor, which most seem to forge anyway. On that note, a significant number of people outside of the U.S. make purchases of US products, so blocking them is a bad bad bad idea. A better idea is to just cache the json responses we give you when you make an api request, then simply check the local cache first before sending the request to our servers. If people did something as simple as that, we could cut back on requests by 70+ percent I bet.
Cool. I was asking because MFPmu takes region into account. I'll pull that out of the next version... that'll allow more queries to be cached. (Less duplication because of country code.)
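To illustrate why dropping region helps: if the cache key is built from the query parameters, any parameter that varies per visitor splits the cache into more entries. A minimal sketch, assuming the request is held as an array of parameters; the function name feed_cache_key and the parameter names 'country', 'visitor_ip', 'q' and 'page' are made up for the example and are not taken from MFPmu or the api.

```php
<?php
// Hypothetical helper: build a cache key that ignores per-visitor
// parameters, so the same query from different regions shares one entry.
function feed_cache_key(array $params, array $ignore = array('country', 'visitor_ip'))
{
    foreach ($ignore as $name) {
        unset($params[$name]);
    }
    ksort($params); // parameter order should not change the key
    return sha1(http_build_query($params));
}

$us = feed_cache_key(array('q' => 'shoes', 'page' => 1, 'country' => 'US'));
$ca = feed_cache_key(array('country' => 'CA', 'page' => 1, 'q' => 'shoes'));
// $us and $ca are the same key, so both visitors hit the same cached feed.
```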
deafbiz
06-10-2011, 05:53 AM
What about those without MFPmu? I only have a simple template with prosperent tokens only.
AcidRaZor
06-10-2011, 05:55 AM
Me too, and I cache with files. Here's a tutorial: http://www.theukwebdesigncompany.com/articles/php-caching.php
Zaphod
06-10-2011, 11:26 AM
I've been down the file based caching road before... it'll mess up your drive. You either do it all in a big dir and that dir eventually becomes unreadable, or you make an algo for subdirs - this makes the dirs readable, but in the end it's the same. You'll run out of inodes on the drive after a while :(
Brian, how goes the battle? Are API calls down at all?
prosperent brian
06-10-2011, 11:28 AM
Not even a little bit. Our numbers look identical.
charles
06-10-2011, 11:44 AM
are you able to identify the api keys that are high users and contact them?
I wouldn't even know if I'm a high user or not even a drop in the bucket.
prosperent brian
06-10-2011, 11:45 AM
Most of the active people on here are, and they all know who they are. I don't think the caching in mfpmu is working very well at the moment which is one issue.
Zaphod
06-10-2011, 12:05 PM
I don't think the caching in mfpmu is working very well at the moment which is one issue.
It could be greatly improved, with the API performance in mind, at the expense of the client server - I'm not saying that's a bad thing, the health of the API is obviously in all our best interest. Currently MFPmu uses the API pages for site pagination. This is both good and bad for the API. Good when looking at one page because it only pulls 10 products if that's all it needs, bad because the next 5 pages will also make calls of 10 products each. So... I could change it to pull every available product, cache a larger feed, and do internal pagination (easily done). And actually, now that I think about it, this will result in larger cached records, but FAR fewer of them... so better for the client as well. This will also make it far easier to share feeds among sites, since all sites can read from the same cached feed no matter what the number of products per page for each site is.
Also, removing region from the cache will improve things greatly.
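The "internal pagination" part can be sketched in a few lines, assuming the full product list has already been pulled once and cached as an array. The function name paginate_feed and the dummy product list are hypothetical, not MFPmu code.

```php
<?php
// Hypothetical helper: cut one page out of a full, already-cached feed.
// Every site can use its own $perPage against the same cached array.
function paginate_feed(array $products, $page, $perPage)
{
    $page = max(1, (int) $page);
    $offset = ($page - 1) * $perPage;
    return array_slice($products, $offset, $perPage);
}

// Dummy feed of 25 "products" standing in for a cached api response.
$feed = range(1, 25);
$siteA = paginate_feed($feed, 3, 10); // site showing 10 per page
$siteB = paginate_feed($feed, 2, 5);  // site showing 5 per page
```

Both sites read from the same cached feed, so the api is hit once per query instead of once per page per site.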
prosperent brian
06-10-2011, 12:47 PM
Yeah, I think that would make sense. basically treat the api response as a raw response, then reuse it whenever it is needed for that and other sites.
Sounds interesting, how would you implement memcache?
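One possible shape for a memcache-backed version, offered as a sketch rather than an official tutorial. It assumes the PHP Memcached extension, whose get() returns false on a miss and whose set() takes a key, a value and an expiration in seconds. The ArrayCache class below is only a stand-in with the same two methods so the example runs without a memcached server; api_get_cached and the fetcher parameter are hypothetical names.

```php
<?php
// Stand-in cache with the same get()/set() shape as PHP's Memcached class,
// so the sketch runs without a server. Swap in a real Memcached object later.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        if (!isset($this->items[$key]) || $this->items[$key][1] < time()) {
            return false; // miss or expired, same signal Memcached::get() gives
        }
        return $this->items[$key][0];
    }

    public function set($key, $value, $ttl)
    {
        $this->items[$key] = array($value, time() + $ttl);
        return true;
    }
}

// Hypothetical helper: check the cache first, call the api only on a miss.
function api_get_cached($cache, $url, $ttl, $fetcher = 'file_get_contents')
{
    $key = 'api_' . sha1($url); // short key, no spaces or odd characters
    $hit = $cache->get($key);
    if ($hit !== false) {
        return $hit;
    }
    $body = call_user_func($fetcher, $url);
    if ($body !== false) {
        $cache->set($key, $body, $ttl);
    }
    return $body;
}

// With the real extension installed, the cache object would be created like:
// $cache = new Memcached();
// $cache->addServer('127.0.0.1', 11211);
```

The json string is stored as-is, so nothing else in the page code has to change: it still receives the same response body it would have gotten from the api.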
AcidRaZor
06-23-2011, 04:10 AM
Sounds interesting, how would you implement memcache?
Brian's a bit too busy to give us tutorials :D
In any case, once I have my DNS servers sorted (thanks editdns.net for ruining a great service and selling to Dyn!), I'll probably delve more into the memcached bits myself and maybe post something here :)
pascalos
06-23-2011, 06:47 PM
thanks editdns.net for ruining a great service and selling to Dyn!
in the same boat... will lose 8k free subs in August lol...
(got free sub account deleted but the domain remain looool)
pascalos
06-23-2011, 06:48 PM
anyways I found a way to cache the images from the api..... it speeds up page loads a lot... weird code, but if someone is interested I'll post it
Zaphod
06-23-2011, 07:04 PM
anyways i found a way to cache the images from api ..... it accellerate the load of pages a lot ... weird code but if someone is interested ill post it
Are you caching to files?
anyways i found a way to cache the images from api ..... it accellerate the load of pages a lot ... weird code but if someone is interested ill post it
I'm open to seeing whatever you have. If memory serves, you're the dude who scripted a real-time query of the Prosperent API through a search form. I'd LOVE to grab that code.
monalisa
06-23-2011, 09:42 PM
anyways i found a way to cache the images from api ..... it accellerate the load of pages a lot ... weird code but if someone is interested ill post it
Please show us your method.
pascalos
06-26-2011, 11:08 PM
Yes, Zaph, it uses files.
How would you cache images without caching them as files?
Will post the code soon.
pascalos
06-26-2011, 11:09 PM
I'm open to seeing whatever you have. If memory serves, you're the dude who scripted a real-time query of the Prosperent API through a search form. I'd LOVE to grab that code.
yup, good memory lol
pascalos
06-26-2011, 11:54 PM
:cool: MOD THIS AT YOUR OWN RISK. I DON'T SUPPORT IT AND I WILL NOT REPAIR YOUR MFPMU/SERVER INSTALLATION IF YOU BREAK IT! DO IT ONLY IF YOU KNOW WHAT YOU ARE DOING! :cool:
OK, here we go.
0- open inc_functions.php
1- create 2 folders called "img" and "mini" inside your mfpmu folder.
2- chmod these folders 777
3- search for the "// end coupons mods" comment; there are 2 occurrences of it
after the first occurrence
add:
// cache the full-size image locally, in the "img" folder from step 1
$cacheDir = './img/';
$src = $arThisFeed['image_url'];
// anything that is not a letter becomes "_" so the merchant name is safe as a folder name
$merch = preg_replace('/[^a-zA-Z]/', '_', $arThisFeed['merchant']);
if (!is_dir($cacheDir.$merch)) {
    mkdir($cacheDir.$merch, 0777, true);
}
$local_file = $cacheDir.$merch."/".sha1($src).".gif";
$img_cache = $src; // fall back to the remote image if the download fails
if (!file_exists($local_file)) {
    // only download the first time this image is seen
    $data = @file_get_contents($src);
    if ($data !== false) {
        file_put_contents($local_file, $data);
    }
}
if (file_exists($local_file)) {
    $img_cache = "http://".$_SERVER['HTTP_HOST']."/".ltrim($local_file, "./");
}
after the second occurrence:
// same thing for the thumbnail, in the "mini" folder from step 1
$cacheDir = './mini/';
$src = $arThisFeed['image_thumb_url'];
$merch = preg_replace('/[^a-zA-Z]/', '_', $arThisFeed['merchant']);
if (!is_dir($cacheDir.$merch)) {
    mkdir($cacheDir.$merch, 0777, true);
}
$local_file = $cacheDir.$merch."/".sha1($src).".gif";
$img_cache = $src; // fall back to the remote thumbnail if the download fails
if (!file_exists($local_file)) {
    $data = @file_get_contents($src);
    if ($data !== false) {
        file_put_contents($local_file, $data);
    }
}
if (file_exists($local_file)) {
    $img_cache = "http://".$_SERVER['HTTP_HOST']."/".ltrim($local_file, "./");
}
now :
find this :
$arSiteConfig['main content'] .= "<div class='summaryMainPicWrapper'><a href='$strBuyLink' rel='nofollow'><img src='".$arThisFeed['image_url']."' alt='".cleanForHTML($arThisFeed['keyword'])."' /></a></div>";
replace by:
$arSiteConfig['main content'] .= "<div class='summaryMainPicWrapper'><a href='$strBuyLink' rel='nofollow'><img src='$img_cache' alt='".cleanForHTML($arThisFeed['keyword'])."' /></a></div>";
find this :
$arSiteConfig['main content'] .= "<div class='summaryPicWrapper'><a href='$strBuyLink' rel='nofollow'><img src='".$arThisFeed['image_thumb_url']."' alt='".cleanForHTML($arThisFeed['keyword'])."' /></a></div>";
replace by:
$arSiteConfig['main content'] .= "<div class='summaryPicWrapper'><a href='$strBuyLink' rel='nofollow'><img src='$img_cache' alt='".cleanForHTML($arThisFeed['keyword'])."' /></a></div>";
Images and thumbs are now cached on the first request and will not be requested from the api again after that.
Images are cached in a folder per merchant, which makes them easier to delete if needed.
You're all set...
monalisa
06-27-2011, 02:46 AM
@pascalos: one question: Why are you saving all images as gif. The images may be in jpeg format. Won't saving them as gif create any problem?
mferrara
06-27-2011, 03:18 AM
Might want to spread out those cached files across multiple directories? Millions of files in that single directory could get messy.
Example:
rsscache_52903c75ce97d0ce880dc189ad2d9e32
Gets stored in:
temp/52/90/3c/rsscache_52903c75ce97d0ce880dc189ad2d9e32
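The layout above can be sketched as a small function: take the first few pairs of hex characters from the hash and use them as nested folder names. The function name sharded_path and its parameters are hypothetical; the sample values are the ones from the example above.

```php
<?php
// Hypothetical helper: spread cache files over nested folders named after
// the leading hex pairs of the hash (52903c... becomes 52/90/3c/).
function sharded_path($baseDir, $prefix, $hash, $levels = 3)
{
    $parts = array();
    for ($i = 0; $i < $levels; $i++) {
        $parts[] = substr($hash, $i * 2, 2);
    }
    return rtrim($baseDir, '/') . '/' . implode('/', $parts) . '/' . $prefix . $hash;
}

$path = sharded_path('temp', 'rsscache_', '52903c75ce97d0ce880dc189ad2d9e32');
// Create the folders before writing with: mkdir(dirname($path), 0755, true);
```

With 3 levels of 2 hex characters each there are up to 256 folders per level, so no single directory ends up holding millions of entries.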
AcidRaZor
06-27-2011, 04:50 AM
@pascalos: one question: Why are you saving all images as gif. The images may be in jpeg format. Won't saving them as gif create any problem?
Modern browsers don't load images based off of their extensions anymore, they load based on the file header. They'll download/store it as a GIF, but if the file's header says it's PNG or JPEG, it uses the appropriate rendering engine to display it properly.
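If you would rather store the files with a matching extension, the same header check can be done on your side before saving. A minimal sketch that only looks at the first bytes of the downloaded data; the function name image_extension_from_bytes is hypothetical, and only GIF, PNG and JPEG are covered. PHP's built-in getimagesize() can also report the type of a saved file.

```php
<?php
// Hypothetical helper: guess the image type from its leading "magic" bytes
// instead of trusting the file extension. Returns null if unrecognized.
function image_extension_from_bytes($bytes)
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'png';
    }
    if (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0) {
        return 'gif';
    }
    return null;
}
```

In the image caching mod this would be called on the downloaded data, and the result used in place of the hard-coded ".gif" when building the local file name.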
Zaphod
06-27-2011, 08:29 AM
yes,zaph ,it use file
how do you want to cache image without caching them as file ?
will put the code soon .
To the db... caching images to file on a normal server could really crush the inodes IMO :)
AcidRaZor
06-27-2011, 09:25 AM
To the db... caching images to file on a normal server could really crush the inodes IMO :)
it crushes a db faster
/edit
just to add on, accessing a file directly (img src="myfile.jpg") is just as fast even if there were a billion images in 1 directory (if you wanted it that way). Filing them under their respective merchants/sizes would be less hungry when running programs that would actually want to list all files in a particular directory. But if you're not going to list that directory all the time, you wouldn't have to worry about it crushing anything (other than your available disc space)... whereas if you store it in a db, and the db gets corrupted, you're screwed both in data and in images.
pascalos
06-27-2011, 10:48 AM
Might want to spread out those cached files across multiple directories? Millions of files in that single directory could get messy.
Example:
rsscache_52903c75ce97d0ce880dc189ad2d9e32
Gets stored in:
temp/52/90/3c/rsscache_52903c75ce97d0ce880dc189ad2d9e32
Please read the last lines of my post again.
@pascalos: one question: Why are you saving all images as gif. The images may be in jpeg format. Won't saving them as gif create any problem?
see zaph answer
Modern browsers don't load images based off of their extensions anymore, they load based on the file header. They'll download/store it as a GIF, but if the file's header says it's PNG or JPEG, it uses the appropriate rendering engine to display it properly.
thx !
To the db... caching images to file on a normal server could really crush the inodes IMO :)
and the db ....
it crushes a db faster
/edit
just to add on, accessing a file directly (img src="myfile.jpg") is just as fast even if there were a billion images in 1 directory (if you wanted it that way). Filing them under their respective merchants/sizes would be less hungry when running programs that would actually want to list all files in a particular directory.
You too, read the last lines of my post.....
Zaphod
06-27-2011, 11:37 AM
I've made the mistake of large scale file-based caching in the past with an older CMS some of you here may be aware of. It'll destroy the drive if you get enough volume, trust me :)
On your linux server, do a "df -i". That shows you the inodes. Regardless of drive space, if you run out of inodes your drive is full.
prosperent brian
06-27-2011, 12:19 PM
Even with our 50 million images in 5 different sizes, we have only used 18 percent of our inodes on our main image server.
Zaphod
06-27-2011, 05:12 PM
Even with our 50 million images in 5 different sizes, we have only used 18 percent of our inodes on our main image server.
Maybe when I had my inode probs I was dealing with smaller drives than you're using :D
prosperent brian
06-27-2011, 06:53 PM
Definite possibility. this is an 8 drive raid array lol.
Zaphod
06-27-2011, 06:59 PM
Definite possibility. this is an 8 drive raid array lol.
haha yeah... I suspect I was using a single 500 gig drive in that server or something! :D
Again, in response to Acid, disk space in my case wasn't the issue, it was filling a directory with over 12 million files :D ... there was LOADS of drive space left, just no inodes :p
Also, when a directory gets that big you can't easily clean it out. If you break it down by sub dirs based on merchants or some letter algorithm, then each dir uses another inode and you run out of them faster haha :D
AcidRaZor
06-28-2011, 12:32 AM
You can actually get more inodes by creating the filesystem with a lower bytes-per-inode ratio (on ext filesystems the inode count is fixed when the filesystem is created). Available drive space will decrease of course, but you won't be stuck with 12 million files and a ton of drive space left. This will give you more than enough to push in the directories to help organize things a little more. Another type of file system might also benefit you.
Just be careful when doing a database-based image storage system. I've seen that shit blow up in people's faces before.
mister
06-28-2011, 04:16 AM
Just be careful when doing a database-based image storage system. I've seen that shit blow up in people's faces before.
^^^ this. I recently had to clean up one of those messes.