Sunday, July 27, 2014

Hybris Design Considerations

  • Ensure Hybris app servers are not serving static content. (High priority) Use a CDN for static media/page delivery; this may take up to 90% of the load off your infrastructure. If that is not possible, at least make sure the web server layer fronting hybris serves the static media.
  • Ensure hybris servers are running on physical servers instead of virtual machines. VMs are supported, but in our tests hybris performed best on physical hardware. (Low priority)
  • DB Indexing – Hybris out-of-the-box DB indexes are poor, so make sure indexes are created/updated to match your model customizations; otherwise the site soon runs into deadlocks. (High priority) You should also drop unused indexes so that the DB does not waste its time maintaining them. (Medium priority)
  • Perform SSL termination at the load balancer/web server. Don’t trouble hybris app servers with SSL handshakes. (Medium priority)
  • Leverage web server modules for browser caching and compression. Don’t forget to disable unused Apache modules. (Low priority)
  • Ensure only the required extensions are deployed on front-end hybris servers, and avoid deploying any back-office extension on the front end. This is good for both security and performance. (Medium priority)
  • Ensure loggers are not configured at debug level. In production, the front-end boxes should log at error level only. (Low priority)
  • The hybris application is very prone to deadlocks because it keeps the staged and online catalog versions in the same database table. Avoid creating catalog-version-aware types. If you cannot, then avoid making them part of catalog sync. E.g. price: touching the price of every product daily may result in a full catalog sync. Either maintain prices in the online version only, or update both versions at create/update time. This way you can remove the price item type from catalog sync.
  • The OOB hybris catalog sync process is not a great design, and that is rectified in hybris 5.1. In the short term, developers should consider removing all unnecessary root types from the sync process. (High priority)
  • My several years of experience with hybris say that if a site falls over after a few hours of operation, the issue is not with the front-end code, the Tomcat configuration or the infrastructure. The issue is definitely in the hybris application code. So don’t waste much energy optimizing CSS, JS, images etc.; this may help a little but can’t solve the real instability issue. (Low priority)
  • Stock service – Ensure you are not checking stock when loading the product detail page or category listing pages, or on add/remove events on the basket page. Do this check only when an item is added to the basket, on submission of the basket page and just before taking payment (i.e. this is where you need to reserve stock). (High priority)
  • Ensure the hybris stock service is used to check stock status and that no logic is written against the Stock Model directly. (High priority)
  • Ensure the JALO and web app session timeouts are configured to the same value. (High priority)
  • 4 load-balanced hybris app servers should be more than sufficient for a decent load if the solution is designed correctly. Adding any number of servers to the infrastructure will not save you from a site crash. Trust me, you have to fix the core coding issues (not using hybris services correctly) or the DB deadlocks.
  • Set the minimum and maximum heap sizes for the JVM to the same value. 8GB should be more than sufficient. More memory means longer GC pauses, and a GC pause means all threads are halted. So avoid giving more memory; it is not going to solve the problem.
  • TCP/UDP clustering configuration doesn’t matter much for a 4 app server cluster, but prefer sticking with the default UDP settings. Hybris works well in either case unless you have some serious networking issue. Use udpsniff to validate packets. (Low priority)
  • Solr – Ensure the solution is designed properly. You need to be really smart here.
  • Run Solr in standalone mode with one master server. Run delta index updates frequently, every 10-15 minutes. Prefer two-phase mode so that your site does not get stuck while indexing happens behind the scenes. Run a full index update once a week, or only when you make a schema change. (High priority)
  • Use Solr as much as you can, because this saves DB calls. Hybris is very chatty with the DB because of its cache refresh and lazy loading concepts. I really hate this part of hybris, but I have got used to living with it and have identified ways to avoid implementing DB-centric solutions. E.g. you can use Solr to render category listing pages, and you can index price and stock data and use that data as much as you can. The whole objective is to touch the DB only when it is really necessary. (High priority)
  • Disable Quick Search in hmc on the front-end and back-end hybris application servers. (Low priority)
  • If hybris is not your PIM and you have some other system where the merchandising team previews a product before publishing it, then you really don’t need multiple catalog versions in hybris. This saves hybris a lot of overhead. (High priority)
  • Hybris customizations – you should be in a position to justify any customization you suggest. I have seen several implementations where something was custom built although the functionality was available OOB. This happens when Java developers with little hybris knowledge work in an architect role. I have an endless list of such examples:
  • Order invoice generation/re-generation in PDF format was the requirement, and the solution implemented was purely custom, using some open source API. The developers didn’t realize that this is all available in hybris OOB.
  • Another example: purging/archiving items older than 30 days was a requirement, and I have seen developers write lots of Java code, schedulers etc. to achieve this, while it can be done without a single line of code. Hybris OOB lets you configure purging of any item type. You can configure this manually through hmc or write an impex.
  • The key point is to avoid re-inventing the wheel and to try to find the tested wheels already available in the system you have bought. (High priority)
  • Ensure the custom types you define in items.xml persist their data in their own tables rather than piggybacking on the generic table. (High priority)
  • A common mistake is that developers cut-and-paste type and relation definitions in items.xml files, which may result in unintentionally setting relation ends to be ordered. As I said, the hybris application is very chatty with the DB, so you should be very cautious while defining your types and relations. Copy-paste may make it even more DB chatty when you could have avoided this. (High priority)
  • If no deployment table is defined for a many-to-many relation, a generic join table is used to store the relation, which is not optimal for performance. So make sure you define a dedicated deployment table for your custom many-to-many relations.
  • Any relation end that does not need to be ordered should have its ordered attribute set to false.
  • Collection types – Avoid defining collection types. To maintain data integrity, always use a relation.
  • Avoid creating history records if possible, or purge them at regular intervals. Creating history records through the auditing service might look fancy, but it can become a big performance overhead very quickly. Imagine creating an audit record for each stock level change. (Medium priority)
  • Cron job logs. Hybris forgot to add pagination and this kills your hmc when you open a job with thousands of log files. (Medium Priority)
  • Use the hybris WCMS navigation node design to build the mega menu, so that you do not end up preparing the nested category hierarchy on every hit.
  • Ensure the passwords of default users (customer/employee/none) are changed and made complex enough to be hard to guess.
Default users - admin, anonymous, vjdbcReportsUser, csagent, cmsmanager
Note - 1. Hybris creates these users when you perform initialize/update. It only creates a user if it does not already exist in the users table. So if you have changed a password once, hybris won’t override it when the update process is re-run. But if you delete a default user, it will be created again with the default password the next time you run the hybris update with the corresponding extension selected.
2. If you are dropping in a new OOB extension for some new functionality, it is worth checking whether the new extension creates a user record. If it does, it is your responsibility to change the password, in the live environment at least.
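
The stock-check rule from the list above (check only on add to basket, basket submission and just before payment, and reserve only before payment) can be sketched as a small lookup. This is only an illustration in plain Java: the ShopEvent enum and the StockCheckPolicy class are hypothetical names, not part of the hybris API, and in a real project the check itself should still go through the hybris stock service.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical storefront events; the names are illustrative, not hybris API.
enum ShopEvent {
    PRODUCT_DETAIL_VIEW,
    CATEGORY_LISTING_VIEW,
    BASKET_PAGE_ADD_REMOVE,
    ADD_TO_BASKET,
    BASKET_SUBMIT,
    BEFORE_PAYMENT
}

// Hypothetical helper that encodes when a stock lookup is worth its cost.
final class StockCheckPolicy {

    // Events on which stock should be checked, as listed in the post.
    private static final Set<ShopEvent> CHECK_EVENTS = EnumSet.of(
            ShopEvent.ADD_TO_BASKET,
            ShopEvent.BASKET_SUBMIT,
            ShopEvent.BEFORE_PAYMENT);

    private StockCheckPolicy() {
    }

    // True only for the events where a stock check is recommended.
    static boolean requiresStockCheck(ShopEvent event) {
        return CHECK_EVENTS.contains(event);
    }

    // Stock is reserved only at the last step, just before taking payment.
    static boolean requiresReservation(ShopEvent event) {
        return event == ShopEvent.BEFORE_PAYMENT;
    }

    public static void main(String[] args) {
        for (ShopEvent event : ShopEvent.values()) {
            System.out.println(event
                    + " check=" + requiresStockCheck(event)
                    + " reserve=" + requiresReservation(event));
        }
    }
}
```

Keeping the rule in one place like this makes it harder for a page controller to sneak in an extra stock lookup on every page view.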
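
One way to verify the heap advice from the list above (minimum and maximum heap set to the same value) is to inspect the options the JVM was started with. A minimal sketch, assuming the heap is sized with the standard -Xms/-Xmx flags; the HeapSettingsCheck class is a hypothetical helper and not part of hybris.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper; not part of hybris.
final class HeapSettingsCheck {

    private HeapSettingsCheck() {
    }

    // Returns the value of the last JVM option starting with the prefix, or null.
    static String lastValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    // Converts sizes such as "8g", "8192m", "1024k" or a plain byte count to bytes.
    static long toBytes(String size) {
        String s = size.trim().toLowerCase();
        if (s.isEmpty()) {
            throw new NumberFormatException("empty heap size");
        }
        long factor;
        switch (s.charAt(s.length() - 1)) {
            case 'k': factor = 1024L; break;
            case 'm': factor = 1024L * 1024; break;
            case 'g': factor = 1024L * 1024 * 1024; break;
            case 't': factor = 1024L * 1024 * 1024 * 1024; break;
            default: return Long.parseLong(s);
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * factor;
    }

    // True when both flags are present and resolve to the same number of bytes.
    static boolean heapIsFixed(List<String> jvmArgs) {
        String min = lastValue(jvmArgs, "-Xms");
        String max = lastValue(jvmArgs, "-Xmx");
        return min != null && max != null && toBytes(min) == toBytes(max);
    }

    public static void main(String[] args) {
        // Options passed to this JVM at startup, e.g. -Xms8g -Xmx8g.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap min equals max: " + heapIsFixed(jvmArgs));
    }
}
```

Run inside the JVM you care about, this tells you whether the flags really made it to the server rather than what the configuration file claims.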
 

I will share more experience as and when I get time, but post your comments if you need more details in any specific area. Feel free to ask any question related to Hybris, Solr, Endeca or webMethods. For my own learning, I am after questions that I can't answer.
Thanks.


 

7 comments:

blogger said...

Very informative blog. Thank you very much for the valuable post.

yudha said...

Thank you for the informative blog, very useful.

Vinay Chowdary Malempati said...

Its really very helpful, thank you
Vinay

Unknown said...

Great work mate. I would like to know how solr standalone cluster can be implemented. Are there any docs available at the moment?

theroad said...

Thank you, it's really helpful!

Unknown said...

Thank you .. Very helpful

Ramesh Yerramsetti said...

Since "out of the box DB indexes are poor" how does performance work with HANA DB?