Classification Report for Qwen2.5 72B

Invalid Model Output Count

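| Test | Invalid Count |
|---|---|
| 0%_2048 | 3 |
| 0%_8192 | 2 |
| 0%_16384 | 3 |
| 0%_32768 | 4 |
| 25%_64 | 1 |
| 25%_1024 | 1 |
| 25%_4096 | 1 |
| 25%_8192 | 2 |
| 25%_16384 | 4 |
| 25%_32768 | 6 |
| 50%_64 | 1 |
| 50%_8192 | 1 |
| 50%_16384 | 2 |
| 50%_32768 | 1 |
| 75%_64 | 2 |
| 75%_2048 | 8 |
| 75%_4096 | 14 |
| 75%_8192 | 1 |
| 75%_16384 | 3 |
| 75%_32768 | 1 |
| 100%_16384 | 3 |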
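
The invalid counts above line up with the Total Samples column of the consolidated report: every test with invalid outputs has a total that falls short of 411 by exactly that count. This suggests (an inference from the numbers, not something the report states) that invalid outputs are dropped before scoring. A minimal sketch of that check, using three rows copied from the two tables; the names `FULL_SET_SIZE` and `scored_samples` are hypothetical:

```python
# Size of the full test set, read from rows with no invalid outputs (e.g. "base").
FULL_SET_SIZE = 411

# (test, invalid count, Total Samples reported in the consolidated report)
ROWS = [
    ("0%_2048", 3, 408),
    ("25%_32768", 6, 405),
    ("75%_4096", 14, 397),
]


def scored_samples(invalid_count, full_size=FULL_SET_SIZE):
    """Samples left for scoring, assuming invalid outputs are simply excluded."""
    return full_size - invalid_count


for test, invalid, reported_total in ROWS:
    # Each line shows the expected total next to the reported one.
    print(test, scored_samples(invalid), reported_total)
```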

Consolidated Classification Report

| Test | Total Samples | True Violations | Predicted Violations | Accuracy | Precision (macro) | Recall (macro) | F1-score (macro) | Precision (weighted) | Recall (weighted) | F1-score (weighted) |
|---|---|---|---|---|---|---|---|---|---|---|
| base | 411 | 248 | 282 | 0.83 | 0.838 | 0.804 | 0.814 | 0.833 | 0.83 | 0.825 |
| 0%_64 | 411 | 248 | 279 | 0.832 | 0.838 | 0.808 | 0.818 | 0.835 | 0.832 | 0.828 |
| 0%_128 | 411 | 248 | 281 | 0.827 | 0.834 | 0.802 | 0.812 | 0.83 | 0.827 | 0.823 |
| 0%_256 | 411 | 248 | 281 | 0.818 | 0.823 | 0.792 | 0.801 | 0.82 | 0.818 | 0.813 |
| 0%_512 | 411 | 248 | 291 | 0.793 | 0.803 | 0.761 | 0.771 | 0.798 | 0.793 | 0.786 |
| 0%_1024 | 411 | 248 | 283 | 0.803 | 0.808 | 0.776 | 0.785 | 0.805 | 0.803 | 0.798 |
| 0%_2048 | 408 | 246 | 274 | 0.824 | 0.827 | 0.801 | 0.809 | 0.825 | 0.824 | 0.82 |
| 0%_4096 | 411 | 248 | 262 | 0.83 | 0.826 | 0.815 | 0.819 | 0.829 | 0.83 | 0.828 |
| 0%_8192 | 409 | 247 | 304 | 0.758 | 0.772 | 0.717 | 0.725 | 0.766 | 0.758 | 0.745 |
| 0%_16384 | 408 | 246 | 336 | 0.701 | 0.731 | 0.64 | 0.635 | 0.722 | 0.701 | 0.667 |
| 0%_32768 | 407 | 247 | 315 | 0.764 | 0.794 | 0.715 | 0.724 | 0.782 | 0.764 | 0.747 |
| 25%_64 | 410 | 248 | 295 | 0.793 | 0.806 | 0.758 | 0.768 | 0.799 | 0.793 | 0.784 |
| 25%_128 | 411 | 248 | 301 | 0.783 | 0.8 | 0.746 | 0.756 | 0.793 | 0.783 | 0.773 |
| 25%_256 | 411 | 248 | 299 | 0.793 | 0.81 | 0.757 | 0.768 | 0.803 | 0.793 | 0.784 |
| 25%_512 | 411 | 248 | 309 | 0.774 | 0.797 | 0.732 | 0.741 | 0.787 | 0.774 | 0.76 |
| 25%_1024 | 410 | 247 | 324 | 0.744 | 0.778 | 0.693 | 0.697 | 0.766 | 0.744 | 0.722 |
| 25%_2048 | 411 | 248 | 333 | 0.73 | 0.77 | 0.673 | 0.674 | 0.756 | 0.73 | 0.702 |
| 25%_4096 | 410 | 247 | 347 | 0.707 | 0.762 | 0.642 | 0.634 | 0.746 | 0.707 | 0.667 |
| 25%_8192 | 409 | 247 | 366 | 0.675 | 0.746 | 0.597 | 0.567 | 0.728 | 0.675 | 0.612 |
| 25%_16384 | 407 | 245 | 370 | 0.658 | 0.727 | 0.578 | 0.538 | 0.71 | 0.658 | 0.586 |
| 25%_32768 | 405 | 244 | 376 | 0.635 | 0.676 | 0.549 | 0.491 | 0.666 | 0.635 | 0.547 |
| 50%_64 | 410 | 247 | 298 | 0.798 | 0.816 | 0.762 | 0.773 | 0.808 | 0.798 | 0.788 |
| 50%_128 | 411 | 248 | 299 | 0.783 | 0.798 | 0.747 | 0.757 | 0.791 | 0.783 | 0.773 |
| 50%_256 | 411 | 248 | 307 | 0.783 | 0.807 | 0.743 | 0.753 | 0.797 | 0.783 | 0.771 |
| 50%_512 | 411 | 248 | 320 | 0.757 | 0.789 | 0.708 | 0.715 | 0.777 | 0.757 | 0.738 |
| 50%_1024 | 411 | 248 | 336 | 0.732 | 0.779 | 0.674 | 0.675 | 0.764 | 0.732 | 0.703 |
| 50%_2048 | 411 | 248 | 348 | 0.698 | 0.744 | 0.632 | 0.622 | 0.73 | 0.698 | 0.657 |
| 50%_4096 | 411 | 248 | 356 | 0.689 | 0.743 | 0.618 | 0.6 | 0.728 | 0.689 | 0.639 |
| 50%_8192 | 410 | 247 | 364 | 0.69 | 0.778 | 0.616 | 0.592 | 0.755 | 0.69 | 0.633 |
| 50%_16384 | 409 | 246 | 374 | 0.667 | 0.766 | 0.587 | 0.547 | 0.742 | 0.667 | 0.594 |
| 50%_32768 | 410 | 247 | 374 | 0.666 | 0.754 | 0.585 | 0.545 | 0.732 | 0.666 | 0.593 |
| 75%_64 | 409 | 246 | 296 | 0.8 | 0.818 | 0.765 | 0.776 | 0.809 | 0.8 | 0.791 |
| 75%_128 | 411 | 248 | 301 | 0.788 | 0.806 | 0.751 | 0.761 | 0.798 | 0.788 | 0.778 |
| 75%_256 | 411 | 248 | 310 | 0.776 | 0.802 | 0.734 | 0.743 | 0.791 | 0.776 | 0.762 |
| 75%_512 | 411 | 248 | 332 | 0.727 | 0.764 | 0.671 | 0.672 | 0.752 | 0.727 | 0.7 |
| 75%_1024 | 411 | 248 | 347 | 0.715 | 0.774 | 0.651 | 0.644 | 0.756 | 0.715 | 0.677 |
| 75%_2048 | 403 | 245 | 333 | 0.712 | 0.747 | 0.649 | 0.645 | 0.735 | 0.712 | 0.679 |
| 75%_4096 | 397 | 238 | 329 | 0.736 | 0.8 | 0.677 | 0.676 | 0.78 | 0.736 | 0.704 |
| 75%_8192 | 410 | 247 | 366 | 0.685 | 0.774 | 0.61 | 0.583 | 0.751 | 0.685 | 0.625 |
| 75%_16384 | 408 | 246 | 365 | 0.674 | 0.746 | 0.597 | 0.567 | 0.727 | 0.674 | 0.611 |
| 75%_32768 | 410 | 247 | 374 | 0.646 | 0.693 | 0.565 | 0.519 | 0.682 | 0.646 | 0.57 |
| 100%_64 | 411 | 248 | 299 | 0.788 | 0.804 | 0.752 | 0.762 | 0.797 | 0.788 | 0.779 |
| 100%_128 | 411 | 248 | 299 | 0.793 | 0.81 | 0.757 | 0.768 | 0.803 | 0.793 | 0.784 |
| 100%_256 | 411 | 248 | 306 | 0.781 | 0.803 | 0.741 | 0.751 | 0.794 | 0.781 | 0.769 |
| 100%_512 | 411 | 248 | 321 | 0.74 | 0.765 | 0.69 | 0.695 | 0.756 | 0.74 | 0.719 |
| 100%_1024 | 411 | 248 | 340 | 0.708 | 0.746 | 0.647 | 0.642 | 0.734 | 0.708 | 0.673 |
| 100%_2048 | 411 | 248 | 348 | 0.708 | 0.763 | 0.642 | 0.634 | 0.746 | 0.708 | 0.668 |
| 100%_4096 | 411 | 248 | 332 | 0.723 | 0.756 | 0.666 | 0.666 | 0.745 | 0.723 | 0.695 |
| 100%_8192 | 411 | 248 | 360 | 0.693 | 0.766 | 0.621 | 0.602 | 0.746 | 0.693 | 0.641 |
| 100%_16384 | 408 | 247 | 358 | 0.674 | 0.72 | 0.599 | 0.575 | 0.707 | 0.674 | 0.618 |
| 100%_32768 | 411 | 248 | 364 | 0.684 | 0.757 | 0.609 | 0.584 | 0.737 | 0.684 | 0.626 |
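
To make the macro and weighted columns easier to interpret, here is a minimal sketch of how they can be derived from a row's counts. It assumes a binary task (violation vs. non-violation), which the column names suggest but the report does not state. It rebuilds the confusion matrix from Total Samples, True Violations, Predicted Violations, and Accuracy. Because accuracy is rounded in the table, the rebuilt counts are an estimate. All function names are hypothetical.

```python
def confusion_from_row(total, true_viol, pred_viol, accuracy):
    """Estimate (tp, fp, fn, tn) for the violation class from one table row."""
    correct = round(accuracy * total)  # tp + tn, up to rounding of the accuracy
    # tp + fp = pred_viol and tp + fn = true_viol, so
    # tn = total - pred_viol - true_viol + tp and
    # correct = 2 * tp + total - pred_viol - true_viol.
    tp = (correct - total + pred_viol + true_viol) // 2
    fp = pred_viol - tp
    fn = true_viol - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall_f1(hit, false_alarm, miss):
    """Per-class precision, recall, and F1 from raw counts."""
    precision = hit / (hit + false_alarm)
    recall = hit / (hit + miss)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_and_weighted(tp, fp, fn, tn):
    """Return (macro, weighted) triples of (precision, recall, f1)."""
    violation = precision_recall_f1(tp, fp, fn)
    # For the non-violation class the roles flip: tn are hits,
    # fn are false alarms, and fp are misses.
    non_violation = precision_recall_f1(tn, fn, fp)
    support_v, support_n = tp + fn, tn + fp
    total = support_v + support_n
    # Macro: plain mean over the two classes.
    macro = tuple((v + n) / 2 for v, n in zip(violation, non_violation))
    # Weighted: mean over classes, weighted by each class's true sample count.
    weighted = tuple(
        (support_v * v + support_n * n) / total
        for v, n in zip(violation, non_violation)
    )
    return macro, weighted


# The "base" row: 411 samples, 248 true violations, 282 predicted, accuracy 0.83.
counts = confusion_from_row(411, 248, 282, 0.83)
macro, weighted = macro_and_weighted(*counts)
print(counts)
print([round(x, 3) for x in macro])
print([round(x, 3) for x in weighted])
```

For the base row, the rounded results match the table's macro and weighted columns. Since True Violations is larger than the non-violation count in every row, the weighted averages lean toward the violation class, while the macro averages treat both classes equally.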