Add files via upload
parent d74f2d442c
commit 69ca87ab65
|
|
@ -0,0 +1,572 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c60c52da-31c9-46ff-ab3b-6bc851886f32",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Import Libraries & Initialization\n",
"\n",
"The first step sets the number of samples and fixes the random seed so that the random draws are reproducible."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "8bd2bcdb-740c-4e36-8771-b6627052ba6a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"np.random.seed(42)\n",
|
||||
"\n",
|
||||
"n_samples = 2000\n",
|
||||
"data = []"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "60f026f9-5ae5-4157-ad15-64df947842d6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"###### pandas → data manipulation\n",
"###### numpy → random data generation\n",
"###### seed(42) → consistent (reproducible) random results\n",
"\n",
"Generates 2000 synthetic records.\n",
"The records are collected in a list before being converted to a DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3a7788f3-c270-48a2-9442-41244e3864b7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Sleep Duration, Mood Based on Sleep, and Study Duration Based on Mood\n",
"\n",
"Sleep duration is drawn from a normal distribution with a mean of 6.5 hours, then clipped to a realistic range (3–9 hours).\n",
"\n",
"Mood is sampled with probabilities that depend on sleep duration, reflecting the link between physical and emotional condition.\n",
"\n",
"Study duration is determined by mood, on the assumption that emotional state affects the ability to focus on studying."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "368ae584-e1cc-4b69-ac92-0c14cafab3e5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for i in range(1, n_samples + 1):\n",
"    \n",
"    sleep = np.clip(np.random.normal(6.5, 1.5), 3, 9)\n",
"    \n",
"    if sleep >= 7:\n",
"        mood = np.random.choice(\n",
"            ['Bagus', 'Lumayan', 'Biasa Saja'],\n",
"            p=[0.5, 0.3, 0.2]\n",
"        )\n",
"    elif sleep >= 5:\n",
"        mood = np.random.choice(\n",
"            ['Lumayan', 'Biasa Saja', 'Cukup Jenuh'],\n",
"            p=[0.4, 0.4, 0.2]\n",
"        )\n",
"    else:\n",
"        mood = np.random.choice(\n",
"            ['Biasa Saja', 'Cukup Jenuh', 'Jenuh'],\n",
"            p=[0.3, 0.4, 0.3]\n",
"        )\n",
"\n",
"    if mood == 'Bagus':\n",
"        duration = np.random.normal(95, 20)\n",
"    elif mood == 'Lumayan':\n",
"        duration = np.random.normal(75, 20)\n",
"    elif mood == 'Biasa Saja':\n",
"        duration = np.random.normal(55, 15)\n",
"    elif mood == 'Cukup Jenuh':\n",
"        duration = np.random.normal(35, 10)\n",
"    else:  # Jenuh\n",
"        duration = np.random.normal(25, 10)\n",
"    "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "396c21e7-aab8-4af0-915f-e595cadac737",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Sleep duration is generated from a normal distribution with:\n",
"<p> * a mean of <b>6.5 hours</b>\n",
"<p> * a standard deviation of 1.5 hours\n",
"\n",
"Mood is determined from sleep duration using weighted probabilities.\n",
"<p> * More sleep → higher probability of a good mood.\n",
"<p> * Less sleep → higher probability of a burned-out (Jenuh) mood.\n",
"\n",
"Study duration is derived from the user's mood:\n",
"<p> * Good mood → study duration tends to be longer.\n",
"<p> * Burned-out mood → study duration tends to be shorter.\n",
"\n",
"This step keeps the relationships between variables closer to real-world conditions instead of generating them fully at random."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "f1e66dd5-8ed5-4a62-bd9e-9353769dd643",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"duration = int(np.clip(duration, 10, 180))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "efb60353-bf12-4c71-b3ec-43d4a6933c0e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Study duration is clipped to a minimum of 10 minutes and a maximum of 180 minutes.\n",
"\n",
"This prevents unrealistic values from appearing."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1c793019-5f88-4a70-9464-462cc3a3155c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Building the Dataset\n",
"\n",
"The generated data is assembled into a DataFrame and saved as a CSV file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "0b2ea9bb-fda1-45a8-ab80-27ebb48072ed",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>no</th>\n",
|
||||
" <th>mood</th>\n",
|
||||
" <th>durasi_belajar</th>\n",
|
||||
" <th>durasi_tidur</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>2000</td>\n",
|
||||
" <td>Bagus</td>\n",
|
||||
" <td>95</td>\n",
|
||||
" <td>7.7</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" no mood durasi_belajar durasi_tidur\n",
|
||||
"0 2000 Bagus 95 7.7"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Runs once, outside the loop: appends only the last iteration's values\n",
"data.append([\n",
"    i,\n",
"    mood,\n",
"    duration,\n",
"    round(sleep, 1)\n",
"])\n",
"# Build the DataFrame\n",
"df = pd.DataFrame(data, columns=[\n",
"    'no',\n",
"    'mood',\n",
"    'durasi_belajar',\n",
"    'durasi_tidur'\n",
"])\n",
"\n",
"# Save to CSV\n",
"df.to_csv('data_sintetis_2000_tanpa_label.csv', index=False)\n",
"\n",
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "aa08129e-b6ea-40c6-8546-04f6f0730b01",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Label Construction (Pseudo-Labeling)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "2863896b-4c95-4aae-a963-d064aa16402a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>no</th>\n",
|
||||
" <th>mood</th>\n",
|
||||
" <th>durasi_belajar</th>\n",
|
||||
" <th>durasi_tidur</th>\n",
|
||||
" <th>mood_score</th>\n",
|
||||
" <th>sleep_score</th>\n",
|
||||
" <th>duration_score</th>\n",
|
||||
" <th>total_score</th>\n",
|
||||
" <th>label</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>Lumayan</td>\n",
|
||||
" <td>72</td>\n",
|
||||
" <td>7.2</td>\n",
|
||||
" <td>1.5</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>5.5</td>\n",
|
||||
" <td>Intensif</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>Biasa Saja</td>\n",
|
||||
" <td>59</td>\n",
|
||||
" <td>4.8</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>2.0</td>\n",
|
||||
" <td>Ringan</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>3</td>\n",
|
||||
" <td>Bagus</td>\n",
|
||||
" <td>110</td>\n",
|
||||
" <td>8.9</td>\n",
|
||||
" <td>2.0</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>6.0</td>\n",
|
||||
" <td>Intensif</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>4</td>\n",
|
||||
" <td>Lumayan</td>\n",
|
||||
" <td>64</td>\n",
|
||||
" <td>5.6</td>\n",
|
||||
" <td>1.5</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>2</td>\n",
|
||||
" <td>4.5</td>\n",
|
||||
" <td>Sedang</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>5</td>\n",
|
||||
" <td>Biasa Saja</td>\n",
|
||||
" <td>26</td>\n",
|
||||
" <td>6.9</td>\n",
|
||||
" <td>1.0</td>\n",
|
||||
" <td>1</td>\n",
|
||||
" <td>0</td>\n",
|
||||
" <td>2.0</td>\n",
|
||||
" <td>Ringan</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" no mood durasi_belajar durasi_tidur mood_score sleep_score \\\n",
|
||||
"0 1 Lumayan 72 7.2 1.5 2 \n",
|
||||
"1 2 Biasa Saja 59 4.8 1.0 0 \n",
|
||||
"2 3 Bagus 110 8.9 2.0 2 \n",
|
||||
"3 4 Lumayan 64 5.6 1.5 1 \n",
|
||||
"4 5 Biasa Saja 26 6.9 1.0 1 \n",
|
||||
"\n",
|
||||
" duration_score total_score label \n",
|
||||
"0 2 5.5 Intensif \n",
|
||||
"1 1 2.0 Ringan \n",
|
||||
"2 2 6.0 Intensif \n",
|
||||
"3 2 4.5 Sedang \n",
|
||||
"4 0 2.0 Ringan "
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df_labeled = df.copy()\n",
"\n",
"# 1️⃣ Map each mood to a score\n",
"mood_scores = {\n",
"    'Bagus': 2,\n",
"    'Lumayan': 1.5,\n",
"    'Biasa Saja': 1,\n",
"    'Cukup Jenuh': 0.5,\n",
"    'Jenuh': 0\n",
"}\n",
"\n",
"df_labeled['mood_score'] = df_labeled['mood'].map(mood_scores)\n",
"\n",
"# 2️⃣ Sleep duration score\n",
"def sleep_score(hours):\n",
"    if hours > 7:\n",
"        return 2\n",
"    elif hours >= 5:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['sleep_score'] = df_labeled['durasi_tidur'].apply(sleep_score)\n",
"\n",
"# 3️⃣ Study duration score\n",
"def duration_score(minutes):\n",
"    if minutes > 60:\n",
"        return 2\n",
"    elif minutes > 30:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['duration_score'] = df_labeled['durasi_belajar'].apply(duration_score)\n",
"\n",
"# 4️⃣ Total score\n",
"df_labeled['total_score'] = (\n",
"    df_labeled['mood_score'] +\n",
"    df_labeled['sleep_score'] +\n",
"    df_labeled['duration_score']\n",
")\n",
"\n",
"# 5️⃣ Assign the pseudo-label from the total score\n",
"def categorize(score):\n",
"    if score <= 3:\n",
"        return 'Ringan'\n",
"    elif score <= 4.5:\n",
"        return 'Sedang'\n",
"    else:\n",
"        return 'Intensif'\n",
"\n",
"df_labeled['label'] = df_labeled['total_score'].apply(categorize)\n",
"\n",
"df_labeled.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "7f81bc75-ac2a-403d-937f-f091e384a858",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"label\n",
|
||||
"Ringan 780\n",
|
||||
"Sedang 675\n",
|
||||
"Intensif 545\n",
|
||||
"Name: count, dtype: int64"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df_labeled['label'].value_counts()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "232c5b2d-532e-43cf-8808-4abcc89ab7c0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df_labeled.to_csv('data_sintetis_2000_dengan_pseudo_label.csv', index=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "4da2a7fa-38fe-485d-9738-e54b75b1fc3a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>mood</th>\n",
|
||||
" <th>durasi_belajar</th>\n",
|
||||
" <th>durasi_tidur</th>\n",
|
||||
" <th>label</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Lumayan</td>\n",
|
||||
" <td>72</td>\n",
|
||||
" <td>7.2</td>\n",
|
||||
" <td>Intensif</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>Biasa Saja</td>\n",
|
||||
" <td>59</td>\n",
|
||||
" <td>4.8</td>\n",
|
||||
" <td>Ringan</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>Bagus</td>\n",
|
||||
" <td>110</td>\n",
|
||||
" <td>8.9</td>\n",
|
||||
" <td>Intensif</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>Lumayan</td>\n",
|
||||
" <td>64</td>\n",
|
||||
" <td>5.6</td>\n",
|
||||
" <td>Sedang</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>Biasa Saja</td>\n",
|
||||
" <td>26</td>\n",
|
||||
" <td>6.9</td>\n",
|
||||
" <td>Ringan</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" mood durasi_belajar durasi_tidur label\n",
|
||||
"0 Lumayan 72 7.2 Intensif\n",
|
||||
"1 Biasa Saja 59 4.8 Ringan\n",
|
||||
"2 Bagus 110 8.9 Intensif\n",
|
||||
"3 Lumayan 64 5.6 Sedang\n",
|
||||
"4 Biasa Saja 26 6.9 Ringan"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Keep only the required columns\n",
|
||||
"df_final = df_labeled[['mood', 'durasi_belajar', 'durasi_tidur', 'label']]\n",
|
||||
"\n",
|
||||
"df_final.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "3f6058f0-ba14-4ec2-ab41-2bdd124203e6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df_final.to_csv('data_final_2000_dengan_label.csv', index=False)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.12",
|
||||
"language": "python",
|
||||
"name": "python312"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
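
Review note on the notebook above: the generation loop, the clipping step, and `data.append` sit in separate cells, so a top-to-bottom run appears to append only the final iteration (the recorded output shows a single row with `no` = 2000). The sketch below keeps every step inside one loop, using the same distributions and thresholds as the notebook. The helper name `generate_rows`, the two lookup tables, and the use of `np.random.default_rng` in place of `np.random.seed` are assumptions made for illustration, so the drawn values will not match the notebook's.

```python
import numpy as np
import pandas as pd

# (lower bound of sleep band, mood options, weights), copied from the notebook.
MOOD_BY_SLEEP = [
    (7.0, ['Bagus', 'Lumayan', 'Biasa Saja'], [0.5, 0.3, 0.2]),
    (5.0, ['Lumayan', 'Biasa Saja', 'Cukup Jenuh'], [0.4, 0.4, 0.2]),
    (-np.inf, ['Biasa Saja', 'Cukup Jenuh', 'Jenuh'], [0.3, 0.4, 0.3]),
]

# (mean, standard deviation) of study minutes per mood, copied from the notebook.
DURATION_BY_MOOD = {
    'Bagus': (95, 20),
    'Lumayan': (75, 20),
    'Biasa Saja': (55, 15),
    'Cukup Jenuh': (35, 10),
    'Jenuh': (25, 10),
}


def generate_rows(n_samples, seed=42):
    """Hypothetical helper: builds one row per iteration inside the loop."""
    # Assumption: Generator API instead of the notebook's np.random.seed(42).
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(1, n_samples + 1):
        # Sleep hours: normal(6.5, 1.5), clipped to the 3-9 hour range.
        sleep = float(np.clip(rng.normal(6.5, 1.5), 3, 9))
        # Use the first band whose lower bound the sleep value reaches.
        moods, weights = next(
            (m, w) for lower, m, w in MOOD_BY_SLEEP if sleep >= lower
        )
        mood = str(rng.choice(moods, p=weights))
        # Study minutes: mood-dependent normal, clipped to 10-180 and truncated.
        mean, std = DURATION_BY_MOOD[mood]
        duration = int(np.clip(rng.normal(mean, std), 10, 180))
        rows.append([i, mood, duration, round(sleep, 1)])
    return pd.DataFrame(
        rows, columns=['no', 'mood', 'durasi_belajar', 'durasi_tidur']
    )


df = generate_rows(2000)
print(len(df))  # one row per sample
```

Keeping the band and mood parameters in lookup tables is only one possible layout; the `if`/`elif` chain from the notebook works equally well as long as the clipping and the append stay inside the loop.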
|
||||
File diff suppressed because one or more lines are too long
|
|
@ -0,0 +1,151 @@
|
|||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "90ab8483-90bf-43b3-83d2-56d0978b1a33",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"np.random.seed(42)\n",
"\n",
"n_samples = 2000\n",
"data = []\n",
"\n",
"for i in range(1, n_samples + 1):\n",
"    \n",
"    sleep = np.clip(np.random.normal(6.5, 1.5), 3, 9)\n",
"    \n",
"    if sleep >= 7:\n",
"        mood = np.random.choice(\n",
"            ['Bagus', 'Lumayan', 'Biasa Saja'],\n",
"            p=[0.5, 0.3, 0.2]\n",
"        )\n",
"    elif sleep >= 5:\n",
"        mood = np.random.choice(\n",
"            ['Lumayan', 'Biasa Saja', 'Cukup Jenuh'],\n",
"            p=[0.4, 0.4, 0.2]\n",
"        )\n",
"    else:\n",
"        mood = np.random.choice(\n",
"            ['Biasa Saja', 'Cukup Jenuh', 'Jenuh'],\n",
"            p=[0.3, 0.4, 0.3]\n",
"        )\n",
"\n",
"    if mood == 'Bagus':\n",
"        duration = np.random.normal(95, 20)\n",
"    elif mood == 'Lumayan':\n",
"        duration = np.random.normal(75, 20)\n",
"    elif mood == 'Biasa Saja':\n",
"        duration = np.random.normal(55, 15)\n",
"    elif mood == 'Cukup Jenuh':\n",
"        duration = np.random.normal(35, 10)\n",
"    else:  # Jenuh\n",
"        duration = np.random.normal(25, 10)\n",
"    \n",
"    duration = int(np.clip(duration, 10, 180))\n",
"\n",
"    data.append([\n",
"        i,\n",
"        mood,\n",
"        duration,\n",
"        round(sleep, 1)\n",
"    ])\n",
"\n",
"# Build the DataFrame\n",
"df = pd.DataFrame(data, columns=[\n",
"    'no',\n",
"    'mood',\n",
"    'durasi_belajar',\n",
"    'durasi_tidur'\n",
"])\n",
"\n",
"# Save to CSV\n",
"df.to_csv('data_sintetis_2000_tanpa_label.csv', index=False)\n",
"\n",
"df.head()\n",
"\n",
"df_labeled = df.copy()\n",
"\n",
"# 1️⃣ Map each mood to a score\n",
"mood_scores = {\n",
"    'Bagus': 2,\n",
"    'Lumayan': 1.5,\n",
"    'Biasa Saja': 1,\n",
"    'Cukup Jenuh': 0.5,\n",
"    'Jenuh': 0\n",
"}\n",
"\n",
"df_labeled['mood_score'] = df_labeled['mood'].map(mood_scores)\n",
"\n",
"# 2️⃣ Sleep duration score\n",
"def sleep_score(hours):\n",
"    if hours > 7:\n",
"        return 2\n",
"    elif hours >= 5:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['sleep_score'] = df_labeled['durasi_tidur'].apply(sleep_score)\n",
"\n",
"# 3️⃣ Study duration score\n",
"def duration_score(minutes):\n",
"    if minutes > 60:\n",
"        return 2\n",
"    elif minutes > 30:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['duration_score'] = df_labeled['durasi_belajar'].apply(duration_score)\n",
"\n",
"# 4️⃣ Total score\n",
"df_labeled['total_score'] = (\n",
"    df_labeled['mood_score'] +\n",
"    df_labeled['sleep_score'] +\n",
"    df_labeled['duration_score']\n",
")\n",
"\n",
"# 5️⃣ Assign the pseudo-label from the total score\n",
"def categorize(score):\n",
"    if score <= 3:\n",
"        return 'Ringan'\n",
"    elif score <= 4.5:\n",
"        return 'Sedang'\n",
"    else:\n",
"        return 'Intensif'\n",
"\n",
"df_labeled['label'] = df_labeled['total_score'].apply(categorize)\n",
"\n",
"df_labeled.head()\n",
"\n",
"df_labeled.to_csv('data_sintetis_2000_dengan_pseudo_label.csv', index=False)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.12",
|
||||
"language": "python",
|
||||
"name": "python312"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
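
Review note on the pseudo-labeling step in the notebook above: the label is a sum of three scores (mood, sleep, study duration) followed by two thresholds. The sketch below restates that rule as one pure function so it can be checked by hand. The thresholds and score values are copied from the notebook; the function name `pseudo_label` and its argument names are hypothetical.

```python
# Mood-to-score mapping, copied from the notebook.
MOOD_SCORES = {
    'Bagus': 2,
    'Lumayan': 1.5,
    'Biasa Saja': 1,
    'Cukup Jenuh': 0.5,
    'Jenuh': 0,
}


def pseudo_label(mood, study_minutes, sleep_hours):
    """Hypothetical helper: returns (total_score, label) for one record."""
    # Sleep: more than 7 h scores 2, 5 h up to and including 7 h scores 1.
    sleep_pts = 2 if sleep_hours > 7 else 1 if sleep_hours >= 5 else 0
    # Study: more than 60 min scores 2, more than 30 min scores 1.
    study_pts = 2 if study_minutes > 60 else 1 if study_minutes > 30 else 0
    total = MOOD_SCORES[mood] + sleep_pts + study_pts
    if total <= 3:
        label = 'Ringan'
    elif total <= 4.5:
        label = 'Sedang'
    else:
        label = 'Intensif'
    return total, label


# Rows taken from the df_labeled.head() output shown in the first notebook.
print(pseudo_label('Lumayan', 72, 7.2))     # (5.5, 'Intensif')
print(pseudo_label('Biasa Saja', 59, 4.8))  # (2, 'Ringan')
print(pseudo_label('Lumayan', 64, 5.6))     # (4.5, 'Sedang')
```

Note that the sleep score uses a strict `> 7` while the mood sampling in the generator uses `>= 7`, so a record with exactly 7.0 hours falls in the top band for mood but the middle band for the sleep score.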
|
||||
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
Binary file not shown.
Binary file not shown.
Binary file not shown.