Discussion:
[Mlt-devel] Optimizing `melt` in a CPU-intensive environment
jeffrey k eliasen
2016-06-16 22:49:03 UTC
I am running the melt command on a 4-core system with no other software running (except system services), and I am only seeing about 30% CPU usage (134% of the theoretical 400% on the box). ffmpeg and other commands are fully utilizing the cores (380-410%), so the power is definitely available. All the files are being read from tmpfs (RAM-based disk), so while there is definitely some latency for file access it should not be significant compared to the same media stored on physical media.

Is melt able to utilize multiple processors? If so, why is it only using about 1.5 CPUs of the available 4? Are there settings I can adjust to better utilize the host machine?

----------
jeffrey k eliasen - technologist, philosopher, agent of change
blog <http://jeff.jke.net/> | linkedin <http://www.linkedin.com/pub/jeffrey-eliasen/3/a83/b76> | google+ <http://plus.google.com/+JeffreyEliasen> | facebook <http://facebook.com/jeffrey.eliasen> | twitter <http://twitter.com/jeffreyeliasen>
Dan Dennedy
2016-06-16 23:30:15 UTC
https://www.mltframework.org/bin/view/MLT/Questions#Does_MLT_take_advantage_of_multi
_______________________________________________
Mlt-devel mailing list
https://lists.sourceforge.net/lists/listinfo/mlt-devel
jeffrey k eliasen
2016-06-16 23:40:33 UTC
OK, so if I'm reading this properly, I should do something like:

melt edl.xml -consumer avformat:out.mp4 real_time:4

... to specify 4 cores with frame-dropping?

Also, what is frame dropping?

Dan Dennedy
2016-06-16 23:57:52 UTC
Post by jeffrey k eliasen
melt edl.xml -consumer avformat:out.mp4 real_time:4
NO! There is one syntax only to set properties. At the very least, you
could have read beyond one answer in the FAQ to see more examples!
Post by jeffrey k eliasen
... to get specify 4 cores with frame-dropping?
Also, what is frame dropping?
Just think a little about what "real time" might mean. It is actually
ironic that you can use a real_time option to make it not operate in real
time (the default for avformat consumer). Does frame-dropping sound at all
like something you want?
Brian Matherly
2016-06-17 03:08:50 UTC
The "real_time" property is documented here: https://www.mltframework.org/bin/view/MLT/ConsumerAvformat#real_time
Syntax:

melt edl.xml -consumer avformat:out.mp4 real_time=-4

The real_time property is really two properties in one. The sign specifies whether frames may be dropped (positive) or not (negative), and the absolute value sets the number of processing threads.

The reason one might drop frames would be in real-time applications where it is more valuable to keep displaying pictures than to actually display every picture. If you are watching a video, for example, it might be less distracting to drop every 5th frame than to have the video constantly pausing.
For file-based processing and any non-real-time applications, you want to use a negative number so that frames are not dropped.
Tip: using more than one processing thread may expose a bug in a filter. If you see strange behavior, try setting real_time to 0 and see if the problem goes away.
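
A small sketch of how a single real_time value carries both meanings described above. The function and type names are hypothetical and are not part of MLT; treating 0 as "no dropping, zero worker threads" is an assumption based on the debugging tip above.

```python
from typing import NamedTuple


class RealTimeSetting(NamedTuple):
    drop_frames: bool         # True when frames may be dropped to keep up
    processing_threads: int   # magnitude of the real_time value


def describe_real_time(value: int) -> RealTimeSetting:
    """Split a real_time value into its sign and magnitude meanings."""
    # Sign: positive allows frame dropping, negative (or zero) does not.
    drop_frames = value > 0
    # Magnitude: the number of processing threads requested.
    processing_threads = abs(value)
    return RealTimeSetting(drop_frames, processing_threads)


# File-based rendering on a 4-core machine: no dropping, 4 threads.
print(describe_real_time(-4))
```

For a file render, the interesting case is the negative one: describe_real_time(-4) reports no frame dropping with 4 processing threads.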

The "threads" property is documented here: https://www.mltframework.org/bin/view/MLT/ConsumerAvformat#threads
Syntax:

melt edl.xml -consumer avformat:out.mp4 threads=4

This will tell the encoder how many threads to use for encoding.
The two properties can be combined:

melt edl.xml -consumer avformat:out.mp4 real_time=-4 threads=4

I typically set the values to be equal to the number of logical cores available. So "-4" and "4" respectively would be a good way to maximize your machine.
~BM
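
The advice above (set both values to the logical core count, negative for real_time) could be scripted roughly as follows. This is only a sketch: build_melt_command is a hypothetical helper, the file names are the ones used in this thread, and the script only assembles the argument list without running melt.

```python
import os
import shlex


def build_melt_command(project, output, cores=None):
    """Assemble a melt command line using name=value consumer properties."""
    if cores is None:
        # os.cpu_count() reports logical cores; fall back to 1 if unknown.
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return [
        "melt", project,
        "-consumer", f"avformat:{output}",
        f"real_time=-{cores}",  # negative: never drop frames
        f"threads={cores}",     # encoder thread count
    ]


cmd = build_melt_command("edl.xml", "out.mp4", cores=4)
print(shlex.join(cmd))
# prints: melt edl.xml -consumer avformat:out.mp4 real_time=-4 threads=4
```

The list form can be passed to subprocess.run as-is; note that each property uses '=' rather than ':'.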


Mateusz Starzak
2016-06-17 12:12:21 UTC
I've seen a performance increase when using melt as a raw video and audio
processing engine through pipes with standalone ffmpeg processes. From my
own experience it's not possible to saturate a high-end CPU with melt alone
without doing heavy frame-based filtering. There's a bottleneck somewhere,
possibly in the avformat producer and/or consumer.

Matt
Brian Matherly
2016-06-17 13:23:39 UTC
Using "real_time" and "threads" options should not change the final result - unless you happen to expose a bug. It should not matter if you use one, the other or in combination.
In my own experience, if I have an EDL that only has a few filters/transitions and I am encoding to H.264, I only need to set "threads=4" and I can pretty much saturate my CPU. Basically, one core ends up doing all the decoding and MLT processing and the rest of the cores get used up for encoding.

~Brian

From: jeffrey k eliasen <***@jke.net>
To: Brian Matherly <***@brianmatherly.com>
Cc: "mlt-***@lists.sourceforge.net" <mlt-***@lists.sourceforge.net>
Sent: Friday, June 17, 2016 3:28 AM
Subject: Re: [Mlt-devel] Optimizing `melt` in a CPU-intensive environment

OK, that's exactly what I was looking for, thanks!
Looks like I also had a typo in an earlier reply, using ':' instead of '=' to denote a property value, that took longer than it should have to recognize.
Finally, you mention the real_time and threads options can be combined, but does this change the final result in any way vs. using just one or the other (assuming I'm not dropping frames)?
Dan Dennedy
2016-06-17 17:02:03 UTC
Let me add to the thread count info. Many encoders such as x264 and x265
configure themselves by logical CPU count if you do not specify "threads"
or set it to 0. They often actually use more threads than the CPU count;
they know what they are doing, but it is based on the assumption that they
need the vast majority of CPU utilization. I do not yet have a comprehensive
accounting of which codecs require an explicit threads property to use more
than one thread and which ones cannot use more than 1, but there is a start
at capturing this knowledge in the Shotcut source code here:
https://github.com/mltframework/shotcut/blob/master/src/docks/encodedock.cpp#L565

The performance gain of MLT's parallel image processing (abs(real_time) >
1) varies considerably depending on the composition and services involved. It
is well short of linear scaling. Some effects are not parallel-safe and
block major portions from concurrent access, and that creates bottlenecks.
Slowly over time this situation is improving in MLT and frei0r. Based on my
testing, as a general rule, I see little gain from setting abs(real_time)
above 4 on a system with 8 logical processors. Sure, I can create some
scenarios where going over that still shows some benefit, but I am speaking
of a general rule. In Shotcut, I made a heuristic to set this:
https://github.com/mltframework/shotcut/blob/master/src/mltcontroller.cpp#L718
So, on a dual core it will still be 1, on a quad core it will be 3, and on
anything more than that only 4. The rest can go to decoders and encoders,
many of which are multi-threaded now.

MLT's parallel image processing (also known as frame-threading) removes the
need for each effect to be touched to use SIMD assembler or OpenMP or even
to be slice-friendly. That makes it easy to get some parallelism going for
nearly all effects, but it is the least performant approach because it is
not friendly to the CPU caches and has overhead for locks.
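
The heuristic described above (1 on a dual core, 3 on a quad core, 4 on anything larger) can be approximated as "one less than the logical core count, clamped to the range 1 to 4". The sketch below is not copied from the Shotcut source; frame_threads_for is a hypothetical name, and the result for core counts the post does not mention (for example 3) is an assumption of this formula.

```python
def frame_threads_for(logical_cores: int) -> int:
    """Suggest abs(real_time) for a machine with the given logical core count."""
    # Leave one core for decoding/encoding, never go below 1,
    # and cap at 4 because gains beyond that were reported as small.
    return max(1, min(4, logical_cores - 1))


for cores in (2, 4, 8):
    # Negative sign: parallel processing without frame dropping.
    print(f"{cores} cores -> real_time=-{frame_threads_for(cores)}")
```

The sign is chosen separately from the magnitude: negative for file rendering, positive where frame dropping is acceptable.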
jeffrey k eliasen
2016-06-17 23:14:17 UTC
Cool, thanks, this is all very helpful.

I'm seeing about a 25% performance gain by adding threads=4 compared to the default. And I'm using v0.9.0 as installed using apt-get install melt on an Ubuntu 14.04.3 OS... not sure how much it's come along in the last couple months, but I'm certain I'm behind the curve a little here.
