IRC log for #asterisk-dev on 20180510

00:20.01*** join/#asterisk-dev infobot (ibot@rikers.org)
00:20.01*** topic/#asterisk-dev is Asterisk Development Discussion -=- http://www.asterisk.org/developers -=- Tier 2 and 3.14159265 support is in #asterisk -=- Check out our blog! blogs.asterisk.org -=- Follow on Twitter at @AsteriskDev
02:25.03*** join/#asterisk-dev snuff-work (~snuff-wor@210.9.148.102)
02:25.03*** mode/#asterisk-dev [+o snuff-work] by ChanServ
02:33.38*** join/#asterisk-dev cresl1n (~Adium@asterisk/libpri-and-libss7-expert/Cresl1n)
02:33.38*** mode/#asterisk-dev [+o cresl1n] by ChanServ
11:38.48*** join/#asterisk-dev jkroon (~jkroon@165.16.204.169)
12:46.06*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
12:47.10*** join/#asterisk-dev coreyfarrell (~coreyfarr@24-177-250-191.dhcp.nwtn.ct.charter.com)
12:47.10*** mode/#asterisk-dev [+o coreyfarrell] by ChanServ
13:30.39*** join/#asterisk-dev bford (uid283514@gateway/web/irccloud.com/x-fqmdopsilmhjmwoc)
13:30.39*** mode/#asterisk-dev [+o bford] by ChanServ
14:13.57*** join/#asterisk-dev kharwell (kharwell@nat/digium/x-tvhzjqoekejxtkdf)
14:13.57*** mode/#asterisk-dev [+o kharwell] by ChanServ
14:42.29coreyfarrellwhen people have time could I get reviews for https://gerrit.asterisk.org/#/q/topic:ASTERISK-27824 ?  This is required for Fedora 28 to compile Asterisk with --enable-dev-mode.
14:42.59*** join/#asterisk-dev cresl1n (Adium@asterisk/libpri-and-libss7-expert/Cresl1n)
14:42.59*** mode/#asterisk-dev [+o cresl1n] by ChanServ
15:17.25seanbrightcoreyfarrell: just a general comment that - i dislike magic numbers in general but when they are powers of two i generally think to myself: "ok, this is a buffer big enough to hold the data and then some"
15:17.36seanbrightwhen they aren't powers of two, then i have to think
15:18.10seanbrightso in the example of going from 512 to 520 - the question i have is "why?"
15:18.36seanbrightso maybe 512 + SOME_IDENTIFIER_THAT_EXPLAINS_THE_ADDITIONAL_PADDING?
15:19.18cresl1nYeah, I hate it when that happens too, but I respect gcc's static analysis prowess
15:19.29cresl1nIt seems like there's really a code problem in that case
15:20.05cresl1nLinus has traditionally not just accepted "make the gcc warning go away" patches that don't look at underlying causes
15:20.22cresl1nNot saying I'm gonna go that route, but it did make me wonder in a few places
15:23.27*** join/#asterisk-dev csavinovich (sid296765@gateway/web/irccloud.com/x-gfutqfahcjadkdmb)
15:23.27*** mode/#asterisk-dev [+o csavinovich] by ChanServ
15:31.09coreyfarrellI did suppress the warning for a few sources where possibility of truncation is pretty much unavoidable (or in the case of test_strings where it is intentional).  I can look into using more calculated sizes like `char buf[512 + sizeof(somefield)];` but I won't be able to update it today.
15:32.49*** join/#asterisk-dev Worldexe (~Worldexe@95-107-33-134.dsl.orel.ru)
15:33.10seanbrightin the snprintf cases, you can elide the warning by checking the return value
15:33.26coreyfarrellin some cases it could be dealt with by switching to 'struct ast_str' but I wanted to avoid changes to logic for the system-wide 'get gcc to compile'.
15:33.48coreyfarrellseanbright: oh so (void)snprintf(...) might allow individual warning to be suppressed?
15:33.59seanbrighttesting
15:34.08seanbright(although it still is a band-air)
15:34.10seanbrightaid*
15:34.46seanbrightno, casting to void does not silence it
15:38.01coreyfarrellhuh.. that seems like maybe a gcc bug / overly aggressive warning?  should be possible to say "I know this call to snprintf can truncate and I don't care".
15:39.47seanbrighti think the logic is "you should care of this truncates because it might affect something else"
15:39.51seanbrightif*
15:40.08gtjosephcoreyfarrell: weren't we going to suppress the test_runner bits from these messages...Test ['./lib/python/asterisk/test_runner.py', 'tests/channels/pjsip/ami/pjsip_qualify'] passe
15:40.17gtjosephi forgot
15:40.57coreyfarrellgtjoseph: my python3-compat patch includes that, switches it to just print the test name.
15:41.06gtjosephah, yeah ok
15:52.16gtjosephcoreyfarrell: without the ['...'] ??
15:52.31coreyfarrellgtjoseph: I think the pretty_print updates are not unneeded to 15.4 / 13.21 since the python3-compat will only go to 13, 14, 15 and master?
15:52.55coreyfarrellgtjoseph: correct.  the ['...'] was because the code was printing cmd (an array of strings).
15:53.01gtjosephactually, i discovered the issue with cert 13.21
15:53.30gtjosephfor some reason when you run custom tests, the second form of the result is used.
15:54.44gtjosephoh it appears that ./self_test is missing from 13.21 and 15.4 branches
15:55.01gtjosephthat needs to be cherry-picked
15:55.48gtjosephor we need to change the groovy to check for its existence.
15:57.17coreyfarrellup to you.  keep in mind the self_test had two commits, second one switched to using '#!/usr/bin/env sh' and 'set -e' instead of '#!/usr/bin/sh -e'
16:00.21gtjosephi'd rather cherry-pick to keep the code bases as consistent as possible
16:01.19coreyfarrellgtjoseph: ok I think I have a few minutes.  you want me to just merge the two commits for the minor branches or cherry-pick both patches in series?
16:01.42gtjosephcherry-pick in sequence i think would be safer.
16:06.36*** join/#asterisk-dev dakudos (~dakudos@c-73-203-6-107.hsd1.co.comcast.net)
16:07.34coreyfarrellgtjoseph: ok all set.  obviously ignore the jenkins error for 8964 as that's what 8965 fixes.
16:07.44gtjosephyep, thx
16:13.01*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
16:38.07*** join/#asterisk-dev Deeewayne (~dwayne@2605:a600:8050:5600:829d:1142:5677:7e57)
16:38.07*** mode/#asterisk-dev [+o Deeewayne] by ChanServ
18:22.19*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
18:27.12*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
18:37.43*** join/#asterisk-dev elguero (~miguel323@74-95-21-41-Connecticut.hfc.comcastbusiness.net)
18:47.18*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
19:11.03*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
19:25.09*** join/#asterisk-dev scgm11_ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
19:31.27seanbrightyou digium folks have any nortel phones laying around that you want to send me?
19:39.57*** join/#asterisk-dev Bhakimi (~textual@208.78.139.170)
19:40.40Bhakimihi guys! we are running asterisk 11 (yes i know it's EOL, but he customized it so we are stuck) and it kills the cpu after about 50k calls processed
19:40.46Bhakimiwe tried to run a profiler on it and couldn't find an issue
19:41.04*** join/#asterisk-dev scgm11__ (~scgm11@r186-49-50-18.dialup.adsl.anteldata.net.uy)
19:41.36seanbrightsorry to hear that
19:41.47seanbrightyou should upgrade. how much was the code customized?
19:45.31*** join/#asterisk-dev CELYA_ (~Thunderbi@LFbn-1-11898-144.w90-93.abo.wanadoo.fr)
19:45.44Bhakimitons of custom modules
19:45.51Bhakimiwe built a whole application around it
19:45.57Bhakimiit would take a few years to upgrade
19:46.09Bhakimiit's basically not an option, which is why i mentioned it :)
19:46.26seanbrighthmm. ok.
19:46.35seanbrightso you are running a highly customized version of asterisk 11
19:46.38seanbrighthow can we help?
19:46.58Bhakimii'm trying to find out what tools we can use to track it down
19:47.21seanbrightcore show threads & gdb
19:47.38*** join/#asterisk-dev jpsharp (~jsharp@45.79.209.207)
19:47.42seanbrightwhen it slows down, use gcore to create a core dump
19:47.51Bhakimilet me bring my dev in, he knows better but i think we did both
19:47.58seanbrightno that's ok
19:48.02jpsharpI'm here.
19:48.09seanbrightok, then read above
19:48.37coreyfarrellalso check memory usage when it slows down, possible you might be using too much memory and thrashing swap. that would slow the whole system down not just asterisk.
19:48.45Bhakimino memory usage
19:48.47Bhakimino swap
19:48.49jpsharpMemory is good. it's not swapping.
19:48.56Bhakimiswap is happy and it doesn't use much memory
19:51.11jpsharpI gotta install gcore/gdb.  It'll take a moment.
19:53.58seanbrightif you run 'htop' are there one or two threads running hot?
19:55.10Bhakimiall threads
19:55.18seanbrighthmm
19:55.31seanbrightdo your custom modules create new threads?
19:55.34Bhakimiit looks like after 50k calls once we push the system it goes cpu crazy
19:56.43jpsharpno, the only fully custom module we have is a CEL module that uses redis.
19:57.25seanbrightit would take a few years to upgrade that, eh?
19:57.36seanbrightcel hasn't changed at all since asterisk 11 that i am aware of
19:57.52Bhakimitesting etc
19:58.00seanbrightok, so that's 3 weeks
19:58.17seanbrightso the other 101 weeks are what?
19:58.19Bhakimiit's a whole call center platform, we also took freepbx and built the ui for it
19:58.26Bhakimithere are tons of areas
19:58.30seanbrightgotcha
19:58.42seanbrightwell, god speed and all that.
19:58.43Bhakimiit's like a five9-type system
19:58.56seanbrightk
19:59.27Bhakimiit's possible that an upgrade would bring new issues
19:59.53seanbrighthow do you resolve the slow-down-after-50k-calls issue?
19:59.54Bhakimiif i knew 100% that 15 won't cause any other issues i would, but it's too much risk to be honest
20:00.07jpsharpWe do a full asterisk restart.
20:00.21Bhakimii hate that we are on 11, and to solve this i could throw a bunch of hardware at it and build systems to restart them, but that's also not a good idea
20:00.32seanbrightwe upgraded from 11 to 15 (also run a hosted contact center platform) and didn't run into any problems
20:00.39Bhakimiwe unloaded all modules and reloaded them and the issue still happens; the only way to fix it is to restart
20:00.47seanbrighthow often does this happen?
20:00.51jpsharpEvery day.
20:00.56seanbrightonce a day?
20:01.09jpsharpyeah.  After about 4-5 hours of calling.
20:01.13Bhakimiyea and then it has to run for about 4 hours or so
20:01.18seanbrightgotcha
20:01.47seanbrightwell with a perfect system like that, there's no reason to upgrade
20:01.50seanbright:D
20:02.37jpsharpIf we wanted that kind of stability, we'd run Windows :)
20:02.46seanbrightyeesh
20:02.52seanbrighti'm tapping out
20:04.31Bhakimithat soon lol
20:04.45seanbrightyou just seem like you are in over your head
20:04.46Bhakimiany suggestions on what tools we can use to track the high cpu usage?
20:04.54seanbrighti already told you
20:04.54Bhakimiat least if we knew where in the code to look
20:05.01seanbrightgcore & gdb
20:05.09jpsharpI'm running a gcore dump right now.
20:05.12seanbrightgcore will let you get a core dump of the running process
20:05.22seanbrightgdb will let you see where the threads are spending their time
20:05.30seanbrightthere is no way that *all* of the threads are 100%
20:05.34Bhakimik 1 sec, may we show you the output?
20:05.36seanbrightit's simply not possible
20:06.04seanbrightnot me
20:06.07seanbrightsounds like we're competitors
20:06.21seanbrightsomeone else though maybe
20:07.38Bhakimilol
20:07.41Bhakimiits for 1 center
20:07.46Bhakimii doubt we are competitors
20:07.59jpsharpgcore failed to create core.
20:09.31seanbrightshrugs
20:12.50jpsharphttps://imgur.com/a/OAiX3tm
20:12.54jpsharpThat's the output of htop
20:12.56Bhakimiwhat call center platform do you work for?
20:13.41seanbright1095% is a lot
20:13.45seanbrightbut i'm no expert
20:14.17jpsharpOn the other instances, none of the channel threads exceed 1-2% cpu
20:15.11seanbrightwhat's the output of:
20:15.21seanbrightfind /proc/<pid of bad asterisk>/fd -type l | wc
20:15.53*** join/#asterisk-dev jkroon_ (~jkroon@165.16.204.166)
20:16.07Bhakimi1 sec let me get it
20:17.30Bhakimi<PROTECTED>
20:18.07seanbrightseems fine
20:18.12seanbrightdo you record?
20:18.21Bhakimino
20:18.43Bhakimiall this server does is place calls and send them to the agent server once the calls connect
20:18.52Bhakimiit executes an agi script for the routing and that's pretty much it
20:20.01seanbrightgotcha
20:20.13seanbrightwithout a core dump i don't know that there is much you can do
20:20.25seanbrightso figure out why gcore isn't working
20:20.30Bhakimithanks for all the help, working on getting it
20:27.12jpsharpOkay, got a core dump.
20:38.39seanbrightok. pastebin it.
20:41.40*** join/#asterisk-dev Tim_Toady (~fuzzy@snf-33276.vm.okeanos.grnet.gr)
20:47.44jpsharpToo big for paste bin, but here it is: http://n5xns.org/gdb.txt
20:50.25seanbrightwhat protocols are you using?
20:50.32Bhakimisip
20:50.32seanbrightsip, iax2, dahdi?
20:50.37seanbrightonly sip?
20:50.40Bhakimiyes
20:50.47seanbrightok, stop loading chan_iax2.so
20:50.55seanbrightstop loading chan_dahdi.so
20:50.56Bhakimiand dahdi
20:51.07Bhakimiyup, but can they still cause issues even if we don't use them?
20:51.22seanbrightof course
20:51.25seanbrightit's running code
20:51.59Bhakimiwhich timing source is the most stable to use?
20:52.26seanbrightres_timing_timerfd
20:52.32seanbrightunless you have dahdi hardware installed
20:52.39seanbrightin which case i would use res_timing_dahdi
20:53.07jpsharpThere's no dahdi hardware installed.
20:53.52coreyfarrellkharwell: have you (or anyone else) done a full testsuite run with the python3-compat (using python2)?
20:55.11kharwellI haven't. can't speak for others, but doubtful on that front as well
20:56.05seanbrightjpsharp: can you run a 'core show locks' also please?
20:56.31jpsharpI don't have locking debugging enabled.
20:56.41seanbrighthmm
20:56.53jpsharpHaving that enabled completely hoses the system
20:57.49coreyfarrellkharwell: I've done some here and there checks but my system does poorly with the full testsuite (even without the patch I get some lock-ups and failures).  as much as I'd love to see the python3 testsuite patch move forward I think +1 might be premature.
20:58.02seanbright27 threads sitting in pthread_mutex_lock()
20:58.47coreyfarrellkharwell: that's one of the comments I posted, I'm hoping someone with better hardware can help in that department.
20:59.45seanbrightjpsharp: is the box usable at all when it gets like this? does it still handle calls?
21:00.42jpsharpYeah, it still processes calls.  Not properly, but it still originates and hangs up calls.  It's not a full deadlock.
21:01.09kharwellcoreyfarrell: aah okay I'd missed that comment. wish there was a way to allow some comments to pin or not auto collapse in gerrit.
21:01.21kharwellcoreyfarrell: I guess you don't want to -1 it so folks will look at it?
21:02.21kharwell(some people might have filters that show only reviews without -1s)
21:02.23seanbrightjpsharp: are manager events getting emitted?
21:02.39coreyfarrellkharwell: exactly.
21:02.57kharwellI'll just remove my +1 for now
21:03.36kharwellthe starpy stuff is minor and should be okay though I'd think
21:03.51jpsharpseanbright: Let me attach to the manager and look.
21:06.08coreyfarrellkharwell: yes, I ran tests/manager/originate under python2 and python3 using the starpy patch, both passed (obviously using my python3-compat testsuite patch).
21:07.34coreyfarrellI know that doesn't give "complete" coverage but it watches for events and sends commands, plus starpy patch doesn't have inter-dependencies like the testsuite python does.
21:08.07jpsharpseanbright: A few events, but not nearly at the same rate as what's actually happening.
21:08.58jpsharpthe account I'm logged into the manager as has a filter for CPD-Result from our AMD system.  I know the system placed about 100 calls, but I only saw 4 events from the manager.
21:09.11seanbrighti'm sorry - i have to leave for the day. there is definitely something going on with locking, i'm just not sure what.
21:09.16seanbright'core show locks' output would help
21:09.19seanbrightgotta go
21:10.05jpsharpI agree with the locking issue.
21:34.10*** join/#asterisk-dev cresl1n (Adium@asterisk/libpri-and-libss7-expert/Cresl1n)
21:34.10*** mode/#asterisk-dev [+o cresl1n] by ChanServ
21:40.12jpsharpIf I do a "manager show eventq", should I get about 2-3 minutes of output?  Why would the eventq be backing up like that?
22:11.23*** part/#asterisk-dev kharwell (kharwell@nat/digium/x-tvhzjqoekejxtkdf)
23:21.56*** join/#asterisk-dev pchero (~pchero@176-23-78-252-cable.dk.customer.tdc.net)
23:25.41jpsharpAnd I think I've found the problem.  Events were backing up via the HTTP manager interface, causing the event queue to become extremely large, to the point where the ao2 iterator takes forever to go through it, resulting in excessively long lock hold times on the list.
23:36.13*** join/#asterisk-dev snuff-work (~snuff-wor@210.9.148.102)
23:36.13*** mode/#asterisk-dev [+o snuff-work] by ChanServ

Generated by irclog2html.pl Modified by Tim Riker to work with infobot.